mothur

Mothur removes all sequences from all groups

#1

I am not sure whether this is a bug, an error in my commands, or a problem with the sequences, but when I run screen.seqs after the alignment step, mothur ends up removing all the sequences from all my groups. I am pasting the log file output below. What am I doing wrong? I am by no means an expert on this, so any help will be much appreciated! (This was done on version 1.40.5.)

Windows version

Running 64Bit Version

mothur v.1.40.5
Last updated: 06/19/2018
by
Patrick D. Schloss

Department of Microbiology & Immunology

University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

For questions and analysis support, please visit our forum at https://www.mothur.org/forum

Type 'quit()' to exit program

Interactive Mode



mothur > 
set.current(processors=2)

Using 2 processors.

Current files saved by mothur:
processors=2

Output File Names: 
current_files.summary


mothur > 
trim.seqs(fasta=022018JLR799F_full.fasta, qfile=022018JLR799F_full.qual, qaverage=25, oligos=022018JLR799F.oligos, pdiffs=2, bdiffs=2) #remove bad quality sequences

Using 2 processors.
It took 1037 secs to trim 3172122 sequences.

Group count: 
FY1	158765
FY10	54654
FY11	42572
FY12	55120
FY13	80105
FY14	76652
FY15	92484
FY16	119454
FY17	91767
FY18	102996
FY19	115765
FY2	136254
FY20	120967
FY21	93268
FY22	78479
FY23	49173
FY24	68671
FY25	77354
FY26	120844
FY27	77314
FY28	81887
FY29	75021
FY3	118030
FY30	85722
FY31	51704
FY32	42009
FY33	44813
FY34	146708
FY35	112787
FY36	105011
FY4	109096
FY5	87765
FY6	78356
FY7	100061
FY8	90265
FY9	28040
Total of all groups is 3169933

Output File Names: 
022018JLR799F_full.trim.fasta
022018JLR799F_full.scrap.fasta
022018JLR799F_full.trim.qual
022018JLR799F_full.scrap.qual
022018JLR799F_full.groups


mothur > 
summary.seqs(fasta=current)
Using 022018JLR799F_full.trim.fasta as input file for the fasta parameter.

Using 2 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	8	8	0	2	1
2.5%-tile:	1	375	375	0	4	79249
25%-tile:	1	390	390	0	5	792484
Median: 	1	391	391	0	5	1584967
75%-tile:	1	393	393	0	5	2377450
97.5%-tile:	1	400	400	0	6	3090685
Maximum:	1	543	543	31	142	3169933
Mean:	1	389	389	0	5
# of Seqs:	3169933

It took 133 secs to summarize 3169933 sequences.

Output File Names:
 022018JLR799F_full.trim.summary


mothur > 
screen.seqs(fasta=current, group=current, maxambig=0, optimize=minlength-maxlength, maxhomop=6, criteria=90) #remove too long/short sequences
Using 022018JLR799F_full.trim.fasta as input file for the fasta parameter.
Using 022018JLR799F_full.groups as input file for the group parameter.

Using 2 processors.
Optimizing minlength to 386.
Optimizing maxlength to 394.

It took 44 secs to screen 3169933 sequences, removed 523707.

/******************************************/
Running command: remove.seqs(accnos=022018JLR799F_full.trim.bad.accnos, group=022018JLR799F_full.groups)
Removed 523707 sequences from your group file.

Output File Names: 
022018JLR799F_full.pick.groups

/******************************************/

Output File Names: 
022018JLR799F_full.trim.good.fasta
022018JLR799F_full.trim.bad.accnos
022018JLR799F_full.good.groups


It took 128 secs to screen 3169933 sequences.

mothur > 
summary.seqs(fasta=current)
Using 022018JLR799F_full.trim.good.fasta as input file for the fasta parameter.

Using 2 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	386	386	0	3	1
2.5%-tile:	1	386	386	0	4	66156
25%-tile:	1	390	390	0	5	661557
Median: 	1	391	391	0	5	1323114
75%-tile:	1	392	392	0	5	1984670
97.5%-tile:	1	394	394	0	6	2580071
Maximum:	1	394	394	0	6	2646226
Mean:	1	390	390	0	5
# of Seqs:	2646226

It took 117 secs to summarize 2646226 sequences.

Output File Names:
 022018JLR799F_full.trim.good.summary


mothur > 
unique.seqs(fasta=current) #makes operation faster, since it ignores duplicate/identical sequences for calculation purposes
Using 022018JLR799F_full.trim.good.fasta as input file for the fasta parameter.
2646226	1678409

Output File Names: 
022018JLR799F_full.trim.good.names
022018JLR799F_full.trim.good.unique.fasta


mothur > 
count.seqs(name=current, group=current)
Using 022018JLR799F_full.good.groups as input file for the group parameter.
Using 022018JLR799F_full.trim.good.names as input file for the name parameter.

It took 51 secs to create a table for 2646226 sequences.

Total number of sequences: 2646226

Output File Names: 
022018JLR799F_full.trim.good.count_table


mothur > 
summary.seqs(count=current)
Using 022018JLR799F_full.trim.good.count_table as input file for the count parameter.
Using 022018JLR799F_full.trim.good.unique.fasta as input file for the fasta parameter.

Using 2 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	386	386	0	3	1
2.5%-tile:	1	386	386	0	4	66156
25%-tile:	1	390	390	0	5	661557
Median: 	1	391	391	0	5	1323114
75%-tile:	1	392	392	0	5	1984670
97.5%-tile:	1	394	394	0	6	2580071
Maximum:	1	394	394	0	6	2646226
Mean:	1	390	390	0	5
# of unique seqs:	1678409
total # of seqs:	2646226

It took 51 secs to summarize 2646226 sequences.

Output File Names:
 022018JLR799F_full.trim.good.unique.summary


mothur > 
align.seqs(fasta=current, reference=silva.bacteria.fasta) #align sequences to the reference database, but use the correct one.
Using 022018JLR799F_full.trim.good.unique.fasta as input file for the fasta parameter.

Using 2 processors.

Reading in the silva.bacteria.fasta template sequences...	DONE.
It took 24 to read  14956 sequences.
Aligning sequences from 022018JLR799F_full.trim.good.unique.fasta ...
It took 8643 secs to align 1678409 sequences.

[WARNING]: 33 of your sequences generated alignments that eliminated too many bases, a list is provided in 022018JLR799F_full.trim.good.unique.flip.accnos.
[NOTE]: 31 of your sequences were reversed to produce a better alignment.

Output File Names: 
022018JLR799F_full.trim.good.unique.align
022018JLR799F_full.trim.good.unique.align.report
022018JLR799F_full.trim.good.unique.flip.accnos


mothur > 
summary.seqs(fasta=current, count=current)
Using 022018JLR799F_full.trim.good.count_table as input file for the count parameter.
Using 022018JLR799F_full.trim.good.unique.align as input file for the fasta parameter.

Using 2 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1044	1058	7	0	3	1
2.5%-tile:	25292	37810	386	0	4	66156
25%-tile:	25292	37810	390	0	5	661557
Median: 	25292	37811	391	0	5	1323114
75%-tile:	25292	37811	392	0	5	1984670
97.5%-tile:	25293	37811	394	0	6	2580071
Maximum:	25503	39358	394	0	6	2646226
Mean:	25291	37810	390	0	5
# of unique seqs:	1678409
total # of seqs:	2646226

It took 4653 secs to summarize 2646226 sequences.

Output File Names:
 022018JLR799F_full.trim.good.unique.summary


mothur > 
screen.seqs(fasta=current, count=current, summary=current, optimize=start-end-minlength)
Using 022018JLR799F_full.trim.good.count_table as input file for the count parameter.
Using 022018JLR799F_full.trim.good.unique.align as input file for the fasta parameter.
Using 022018JLR799F_full.trim.good.unique.summary as input file for the summary parameter.

Using 2 processors.
Optimizing start to 25292.
Optimizing end to 37810.
Optimizing minlength to 388.

It took 5554 secs to screen 1678409 sequences, removed 236446.

/******************************************/
Running command: remove.seqs(accnos=022018JLR799F_full.trim.good.unique.bad.accnos, count=022018JLR799F_full.trim.good.count_table)

Removing group: FY10 because all sequences have been removed.

Removing group: FY11 because all sequences have been removed.

Removing group: FY12 because all sequences have been removed.

Removing group: FY13 because all sequences have been removed.

Removing group: FY14 because all sequences have been removed.

Removing group: FY15 because all sequences have been removed.

Removing group: FY16 because all sequences have been removed.

Removing group: FY17 because all sequences have been removed.

Removing group: FY18 because all sequences have been removed.

Removing group: FY19 because all sequences have been removed.

Removing group: FY2 because all sequences have been removed.

Removing group: FY20 because all sequences have been removed.

Removing group: FY21 because all sequences have been removed.

Removing group: FY22 because all sequences have been removed.

Removing group: FY23 because all sequences have been removed.

Removing group: FY24 because all sequences have been removed.

Removing group: FY25 because all sequences have been removed.

Removing group: FY26 because all sequences have been removed.

Removing group: FY27 because all sequences have been removed.

Removing group: FY28 because all sequences have been removed.

Removing group: FY29 because all sequences have been removed.

Removing group: FY3 because all sequences have been removed.

Removing group: FY31 because all sequences have been removed.

Removing group: FY32 because all sequences have been removed.

Removing group: FY33 because all sequences have been removed.

Removing group: FY34 because all sequences have been removed.

Removing group: FY35 because all sequences have been removed.

Removing group: FY36 because all sequences have been removed.

Removing group: FY4 because all sequences have been removed.

Removing group: FY5 because all sequences have been removed.

Removing group: FY6 because all sequences have been removed.

Removing group: FY7 because all sequences have been removed.

Removing group: FY8 because all sequences have been removed.

Removing group: FY9 because all sequences have been removed.
Removed 263754 sequences from your count file.

Output File Names: 
022018JLR799F_full.trim.good.pick.count_table

/******************************************/

Output File Names: 
022018JLR799F_full.trim.good.unique.good.summary
022018JLR799F_full.trim.good.unique.good.align
022018JLR799F_full.trim.good.unique.bad.accnos
022018JLR799F_full.trim.good.good.count_table


It took 6717 secs to screen 1678409 sequences.

mothur > 
summary.seqs(fasta=current, count=current) 
Using 022018JLR799F_full.trim.good.good.count_table as input file for the count parameter.
Using 022018JLR799F_full.trim.good.unique.good.align as input file for the fasta parameter.

Using 2 processors.
[ERROR]: 'M02542_54_000000000-BL95D_1_1104_26786_19902' is not in your name or count file, please correct.
[ERROR]: 'M02542_54_000000000-BL95D_1_2115_18215_15631' is not in your name or count file, please correct.
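A note on what `optimize` and `criteria` do in the screen.seqs calls above: the screen.seqs documentation describes `criteria` (default 90) as the percentage of sequences to retain, with each optimized parameter set to the value that keeps that share (for example, start becomes the position that 90% of sequences start at or before). The Python sketch below imitates that idea with a nearest-rank percentile over one value per read. It is an illustration under that assumption, not mothur's actual code; its rounding may differ from mothur's, and the helper names are hypothetical. Because each threshold is picked on its own, screening on several optimized parameters at once can discard more than 10% of reads in total.

```python
from typing import Dict, Sequence


def _rank(n: int, pct: int) -> int:
    """1-based nearest rank covering at least pct% of n items (integer math)."""
    return max(1, (pct * n + 99) // 100)


def keep_at_most(values: Sequence[int], pct: int) -> int:
    """Smallest v such that at least pct% of values are <= v."""
    ordered = sorted(values)
    return ordered[_rank(len(ordered), pct) - 1]


def keep_at_least(values: Sequence[int], pct: int) -> int:
    """Largest v such that at least pct% of values are >= v."""
    ordered = sorted(values, reverse=True)
    return ordered[_rank(len(ordered), pct) - 1]


def optimize_thresholds(starts: Sequence[int], ends: Sequence[int],
                        lengths: Sequence[int], criteria: int = 90) -> Dict[str, int]:
    """Hypothetical approximation of screen.seqs optimize=..., criteria=...

    Each list holds one value per read. Every threshold is chosen
    independently, so each one alone keeps about criteria% of reads.
    """
    return {
        "start": keep_at_most(starts, criteria),       # reads must start at or before
        "end": keep_at_least(ends, criteria),          # reads must end at or after
        "minlength": keep_at_least(lengths, criteria),
        "maxlength": keep_at_most(lengths, criteria),
    }
```

For example, `keep_at_most([25292] * 8 + [25293] * 2, 90)` returns 25293, while at criteria 80 it returns 25292, which would drop the 20% of toy reads that start at 25293.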
#2

You are probably trying to optimize too many things in screen.seqs. I’d suggest setting start and end explicitly instead:

screen.seqs(fasta=current, count=current, summary=current, start=25293, end=37810)
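Setting start and end explicitly applies a fixed rule to every sequence. As the screen.seqs documentation describes it, a sequence is removed if it starts after `start` or ends before `end`. The minimal Python sketch below applies that rule to (start, end, read count) rows, such as could be assembled from the start/end columns of a summary file plus per-sequence counts. The row layout, function names, and toy values are hypothetical, chosen only to show how one position shift changes how many reads survive.

```python
from typing import Iterable, Tuple

# One row per unique sequence: (start, end, number_of_reads). Hypothetical layout.
Row = Tuple[int, int, int]


def passes(start: int, end: int, max_start: int, min_end: int) -> bool:
    """True if a sequence starts at/before max_start and ends at/after min_end."""
    return start <= max_start and end >= min_end


def reads_retained(rows: Iterable[Row], max_start: int, min_end: int) -> Tuple[int, int]:
    """Return (kept_reads, total_reads) for a fixed start/end screen."""
    kept = total = 0
    for start, end, count in rows:
        total += count
        if passes(start, end, max_start, min_end):
            kept += count
    return kept, total


# Toy rows (made-up counts), not data from the log.
toy_rows = [(25292, 37810, 50), (25293, 37811, 30), (25293, 37809, 15), (1044, 1058, 5)]
print(reads_retained(toy_rows, 25293, 37810))  # (80, 100)
print(reads_retained(toy_rows, 25292, 37810))  # (50, 100)
```

Moving start from 25293 to 25292 drops every toy read that begins one column later, which is why a start value sitting exactly on a crowded alignment position is worth checking before a multi-hour run.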
#3

My initial attempts used “start=25293, end=37811”, and they still ended up removing everything; only after that did I try to optimize. So I will try your suggestion and see what happens.

#4

Okay, I tried that suggestion and the screening step worked, but now I run into a new problem: certain sequences are not in my count file, and mothur bombs out again at the next summary.seqs:

mothur >
screen.seqs(fasta=current, count=current, summary=current, start=25293, end=37810)
Using 022018JLR799F_full.trim.good.count_table as input file for the count parameter.
Using 022018JLR799F_full.trim.good.unique.align as input file for the fasta parameter.
Using 022018JLR799F_full.trim.good.unique.summary as input file for the summary parameter.

Using 2 processors.

It took 4803 secs to screen 1678409 sequences, removed 36715.

/******************************************/
Running command: remove.seqs(accnos=022018JLR799F_full.trim.good.unique.bad.accnos, count=022018JLR799F_full.trim.good.count_table)
Removed 24621 sequences from your count file.

Output File Names:
022018JLR799F_full.trim.good.pick.count_table

/******************************************/

Output File Names:
022018JLR799F_full.trim.good.unique.good.summary
022018JLR799F_full.trim.good.unique.good.align
022018JLR799F_full.trim.good.unique.bad.accnos
022018JLR799F_full.trim.good.good.count_table

It took 5950 secs to screen 1678409 sequences.

mothur >
summary.seqs(fasta=current, count=current)
Using 022018JLR799F_full.trim.good.good.count_table as input file for the count parameter.
Using 022018JLR799F_full.trim.good.unique.good.align as input file for the fasta parameter.

Using 2 processors.
[ERROR]: 'M02542_54_000000000-BL95D_1_2119_28862_15219' is not in your name or count file, please correct.
[ERROR]: 'M02542_54_000000000-BL95D_1_1107_14029_9583' is not in your name or count file, please correct.

#5

I’d suggest going back a few steps and running them again. The files and commands seem to be out of sync.

Pat
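The `[ERROR]: '...' is not in your name or count file` messages report sequence names that exist in the current fasta file but not in the current count table. One way to check for that mismatch before rerunning steps is a small script like the Python sketch below. It assumes an uncompressed count table whose first column is the sequence name, with one header row and possibly `#`-prefixed comment lines, and FASTA headers of the form `>name` followed by optional whitespace-separated text. The function names are hypothetical, and this is not a mothur command.

```python
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Sequence names from FASTA header lines ('>name optional text')."""
    names: List[str] = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.append(header[0])
    return names


def count_table_names(lines: Iterable[str]) -> Set[str]:
    """First-column names from a count table, skipping '#' lines and the header row."""
    names: Set[str] = set()
    seen_header = False
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        if not seen_header:  # assumed header row: Representative_Sequence, total, groups...
            seen_header = True
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def missing_from_count(fasta_lines: Iterable[str], count_lines: Iterable[str]) -> List[str]:
    """Names present in the fasta but absent from the count table."""
    counted = count_table_names(count_lines)
    return [name for name in fasta_names(fasta_lines) if name not in counted]


# Example with the file names from the log (paths relative to the mothur run folder):
# with open("022018JLR799F_full.trim.good.unique.good.align") as fa, \
#      open("022018JLR799F_full.trim.good.good.count_table") as ct:
#     print(missing_from_count(fa, ct)[:10])
```

An empty result suggests the two files agree on names; a non-empty one points to the step where they diverged and should be rerun.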
