Pre.cluster removed all sequences

Hi everybody

I work with mothur 1.48.0. When I run the pre.cluster command, all my sequences are removed.

My script:

make.contigs(file=fastq.files.table, oligos=miseq.LBE.16S.515_928.oligos.table, pdiffs=0, maxambig=0, maxhomop=8, processors=20) ## 220620 rename=T no function mothur 148 , moved up maxhomop=8 from screen.seqs below for consistency
rename.seqs(fasta=current)## 220620 removed group=current,
summary.seqs(fasta=current)

unique.seqs(fasta=current)
count.seqs(count=current)
summary.seqs(count=current)

align.seqs(fasta=current, reference=generic.LBE.16S.515_928.database.align)
summary.seqs(fasta=current, count=current)

screen.seqs(fasta=current, count=current, summary=current, start=180, end=13978)
summary.seqs(fasta=current, count=current)

filter.seqs(fasta=current, vertical=T, trump=.)
unique.seqs(fasta=current, count=current)

pre.cluster(fasta=fastq.files.trim.contigs.renamed.unique.good.filter.unique.fasta, count=fastq.files.trim.contigs.renamed.unique.good.filter.count_table, diffs=4, processors=1)

Logfile:

mothur > make.contigs(file=fastq.files.table, oligos=miseq.LBE.16S.515_928.oligos.table, pdiffs=0,  maxambig=0, maxhomop=8, processors=20) ## 220620 rename=T no function mothur 148 , moved up maxhomop=8 from screen.seqs below for consistency

Using 20 processors.

>>>>>	Processing file pair ABwet_EN_1_24_05_OJ_ErT_GG_V4&GGACGG&L001&R1.fastq - ABwet_EN_1_24_05_OJ_ErT_GG_V4&GGACGG&L001&R2.fastq (files 1 of 2)	<<<<<
[WARNING]: your oligos file does not contain any group names.  mothur will not create a groupfile.
Making contigs...
Done.

It took 4 secs to assemble 50815 reads.


>>>>>	Processing file pair ABwet_En2_24_05_OJ_ErT_GG_V4&AGAGGG&L001&R1.fastq - ABwet_En2_24_05_OJ_ErT_GG_V4&AGAGGG&L001&R2.fastq (files 2 of 2)	<<<<<
[WARNING]: your oligos file does not contain any group names.  mothur will not create a groupfile.
Making contigs...
Done.

It took 4 secs to assemble 44531 reads.


Group count: 
ABwet_EN_1_24_05_OJ_ErT_GG_V4	39119
ABwet_En2_24_05_OJ_ErT_GG_V4	33549

Total of all groups is 72668

It took 8 secs to process 95346 sequences.

Output File Names: 
fastq.files.trim.contigs.fasta
fastq.files.scrap.contigs.fasta
fastq.files.contigs_report
fastq.files.contigs.count_table


mothur > rename.seqs(fasta=current)## 220620 removed group=current,
Using fastq.files.trim.contigs.fasta as input file for the fasta parameter.

Output File Names:
fastq.files.trim.contigs.renamed.fasta
fastq.files.trim.contigs.renamed_map


mothur > summary.seqs(fasta=current)
Using fastq.files.trim.contigs.renamed.fasta as input file for the fasta parameter.

Using 20 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	308	308	0	3	1
2.5%-tile:	1	373	373	0	4	1817
25%-tile:	1	375	375	0	4	18168
Median: 	1	376	376	0	5	36335
75%-tile:	1	377	377	0	5	54502
97.5%-tile:	1	377	377	0	6	70852
Maximum:	1	461	461	0	8	72668
Mean:	1	375	375	0	4
# of Seqs:	72668

It took 1 secs to summarize 72668 sequences.

Output File Names:
fastq.files.trim.contigs.renamed.summary


mothur > unique.seqs(fasta=current)
Using fastq.files.trim.contigs.renamed.fasta as input file for the fasta parameter.
72668	25475

Output File Names: 
fastq.files.trim.contigs.renamed.unique.fasta
fastq.files.trim.contigs.renamed.count_table


mothur > count.seqs(count=current) ##220620 removed name=current
Using fastq.files.trim.contigs.renamed.count_table as input file for the count parameter.

Output File Names: 
fastq.files.trim.contigs.renamed.sparse.count_table


mothur > summary.seqs(count=current)
Using fastq.files.trim.contigs.renamed.sparse.count_table as input file for the count parameter.
Using fastq.files.trim.contigs.renamed.unique.fasta as input file for the fasta parameter.

Using 20 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	308	308	0	3	1
2.5%-tile:	1	373	373	0	4	1817
25%-tile:	1	375	375	0	4	18168
Median: 	1	376	376	0	5	36335
75%-tile:	1	377	377	0	5	54502
97.5%-tile:	1	377	377	0	6	70852
Maximum:	1	461	461	0	8	72668
Mean:	1	375	375	0	4
# of unique seqs:	25475
total # of seqs:	72668

It took 0 secs to summarize 72668 sequences.

Output File Names:
fastq.files.trim.contigs.renamed.unique.summary


mothur > align.seqs(fasta=current, reference=generic.LBE.16S.515_928.database.align)
Using fastq.files.trim.contigs.renamed.unique.fasta as input file for the fasta parameter.
Unable to open generic.LBE.16S.515_928.database.align. Trying MOTHUR_FILES directory /mnt/sequencing/mothurdb/generic.LBE.16S.515_928.database.align.

Using 20 processors.

Reading in the /mnt/sequencing/mothurdb/generic.LBE.16S.515_928.database.align template sequences...	DONE.
It took 48 to read  213119 sequences.

Aligning sequences from fastq.files.trim.contigs.renamed.unique.fasta ...
It took 40 secs to align 25475 sequences.


It took 41 seconds to align 25475 sequences.

Output File Names: 
fastq.files.trim.contigs.renamed.unique.align
fastq.files.trim.contigs.renamed.unique.align_report


mothur > summary.seqs(fasta=current, count=current)
Using fastq.files.trim.contigs.renamed.sparse.count_table as input file for the count parameter.
Using fastq.files.trim.contigs.renamed.unique.align as input file for the fasta parameter.

Using 20 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	173	5790	231	0	3	1
2.5%-tile:	180	13978	373	0	4	1817
25%-tile:	180	13978	375	0	4	18168
Median: 	180	13978	376	0	5	36335
75%-tile:	180	13978	377	0	5	54502
97.5%-tile:	180	13978	377	0	6	70852
Maximum:	8110	14744	459	0	8	72668
Mean:	180	13976	375	0	4
# of unique seqs:	25475
total # of seqs:	72668

It took 0 secs to summarize 72668 sequences.

Output File Names:
fastq.files.trim.contigs.renamed.unique.summary


mothur > screen.seqs(fasta=current, count=current, summary=current, start=180, end=13978)
Using fastq.files.trim.contigs.renamed.sparse.count_table as input file for the count parameter.
Using fastq.files.trim.contigs.renamed.unique.align as input file for the fasta parameter.
Using fastq.files.trim.contigs.renamed.unique.summary as input file for the summary parameter.

Using 20 processors.

It took 2 secs to screen 25475 sequences, removed 92.

/******************************************/
Running command: remove.seqs(accnos=fastq.files.trim.contigs.renamed.unique.bad.accnos.temp, count=fastq.files.trim.contigs.renamed.sparse.count_table)
Removed 118 sequences from fastq.files.trim.contigs.renamed.sparse.count_table.

Output File Names:
fastq.files.trim.contigs.renamed.sparse.pick.count_table

/******************************************/

Output File Names:
fastq.files.trim.contigs.renamed.unique.good.summary
fastq.files.trim.contigs.renamed.unique.good.align
fastq.files.trim.contigs.renamed.unique.bad.accnos
fastq.files.trim.contigs.renamed.sparse.good.count_table


It took 2 secs to screen 25475 sequences.

mothur > summary.seqs(fasta=current, count=current)
Using fastq.files.trim.contigs.renamed.sparse.good.count_table as input file for the count parameter.
Using fastq.files.trim.contigs.renamed.unique.good.align as input file for the fasta parameter.

Using 20 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	173	13978	362	0	3	1
2.5%-tile:	180	13978	373	0	4	1814
25%-tile:	180	13978	375	0	4	18138
Median: 	180	13978	376	0	5	36276
75%-tile:	180	13978	377	0	5	54413
97.5%-tile:	180	13978	377	0	6	70737
Maximum:	180	14744	381	0	8	72550
Mean:	179	13978	375	0	4
# of unique seqs:	25383
total # of seqs:	72550

It took 1 secs to summarize 72550 sequences.

Output File Names:
fastq.files.trim.contigs.renamed.unique.good.summary


mothur > filter.seqs(fasta=current, vertical=T, trump=.)
Using fastq.files.trim.contigs.renamed.unique.good.align as input file for the fasta parameter.

Using 20 processors.
Creating Filter...
It took 1 secs to create filter for 25383 sequences.


Running Filter...
It took 0 secs to filter 25383 sequences.



Length of filtered alignment: 542
Number of columns removed: 14202
Length of the original alignment: 14744
Number of sequences used to construct filter: 25383

Output File Names: 
fastq.filter
fastq.files.trim.contigs.renamed.unique.good.filter.fasta


mothur > unique.seqs(fasta=current, count=current)
Using fastq.files.trim.contigs.renamed.sparse.good.count_table as input file for the count parameter.
Using fastq.files.trim.contigs.renamed.unique.good.filter.fasta as input file for the fasta parameter.
25383	25383

Output File Names: 
fastq.files.trim.contigs.renamed.unique.good.filter.unique.fasta
fastq.files.trim.contigs.renamed.unique.good.filter.count_table


mothur > pre.cluster(fasta=fastq.files.trim.contigs.renamed.unique.good.filter.unique.fasta, count=fastq.files.trim.contigs.renamed.unique.good.filter.count_table, diffs=4, processors=1)

Using 1 processors.
25383	3027	22356
Total number of sequences before precluster was 25383.
pre.cluster removed 22356 sequences.

/******************************************/
[WARNING]: fastq.files.trim.contigs.renamed.unique.good.filter.unique.fasta does not contain any sequence from the .accnos file.
Selected 0 sequences from fastq.files.trim.contigs.renamed.unique.good.filter.unique.fasta.

Output File Names:
fastq.files.trim.contigs.renamed.unique.good.filter.unique.precluster.fasta

I don’t understand why all the sequences are removed.

Thank you for your help

I’m not 100% sure what’s going on, but your script is leaving out the count file in several places where it needs to be included. I’m surprised you aren’t getting other error messages earlier in the pipeline. 1.48.0 outputs a count file from make.contigs that you need in order to track which sequences belong to each group. You are leaving it out of your rename.seqs and unique.seqs commands. Maybe give this a try and see if it resolves the issue…
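One plausible way this plays out (a minimal sketch in plain Python, with made-up sequence names, not mothur internals): rename.seqs rewrites the fasta headers, but the count table you carry forward still holds the original read names, so by the time pre.cluster cross-checks the two files they share no names at all and everything gets selected out — consistent with the "Selected 0 sequences" warning in your log.

```python
# Hypothetical illustration of a fasta/count_table name mismatch.
# The fasta headers were renamed, but the count table kept the
# original read IDs, so the two files no longer overlap.

fasta_names = {"seq_1", "seq_2", "seq_3"}          # renamed headers
count_names = {"M00001_12:1:1101:1", "M00001_12:1:1101:2"}  # original IDs

# A command that keeps only sequences present in both files
# ends up keeping nothing, because the intersection is empty.
kept = fasta_names & count_names
print(len(kept))  # 0 -> every sequence is removed
```

Passing count=current through rename.seqs keeps both files renamed in lockstep, so the names always match.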

make.contigs(file=fastq.files.table, oligos=miseq.LBE.16S.515_928.oligos.table, pdiffs=0, maxambig=0, maxhomop=8, processors=20)
rename.seqs(fasta=current, count=current)
summary.seqs(fasta=current, count=current)

unique.seqs(fasta=current, count=current)
summary.seqs(count=current)

align.seqs(fasta=current, reference=generic.LBE.16S.515_928.database.align)
summary.seqs(fasta=current, count=current)

screen.seqs(fasta=current, count=current, summary=current, start=180, end=13978)
summary.seqs(fasta=current, count=current)

filter.seqs(fasta=current, vertical=T, trump=.)
unique.seqs(fasta=current, count=current)

pre.cluster(fasta=fastq.files.trim.contigs.renamed.unique.good.filter.unique.fasta, count=fastq.files.trim.contigs.renamed.unique.good.filter.count_table, diffs=4, processors=1)

Thank you for your help. The problem is solved.