Losing sequences from names file with remove.groups

Hello,

I’m trying to split my sequences into two groups (from two experiments) using the remove.groups command. Here’s my input:

remove.groups(fasta=L3.final.fasta, group=L3.final.groups, name=L3.final.names, groups=Sample01-Sample02-Sample03-Sample04-Sample05-Sample06-Sample07-Sample08-Sample09-Sample10-Sample11-Sample12-Sample13-Sample14-Sample15-Sample16-Sample17-Sample18-Sample19-Sample20-Sample21-Sample22)

And mothur’s output:
Removed 32037317 sequences from your name file.
Removed 199005 sequences from your fasta file.
Removed 32037317 sequences from your group file.

Running summary.seqs and count.groups, however, reveals a problem:

# of unique seqs: 64513

total # of seqs: 8378246

mothur > count.groups(group=L3.final.pick.groups)
Sample70 contains 390385.
Sample71 contains 153851.
Sample72 contains 404284.
Sample73 contains 505864.
Sample74 contains 2379091.
Sample75 contains 533678.
Sample76 contains 1120613.
Sample77 contains 1393437.
Sample78 contains 646127.
Sample79 contains 2775750.
Sample80 contains 723047.

Adding up the counts from count.groups gives 11026127, about 2.6 million more sequences than summary.seqs reports for the names file (8378246). Furthermore, the count.groups numbers match the counts for these groups before running remove.groups, so it seems the names file is wrong after running remove.groups, as if it removed quite a few sequences that should have been left in. If I run remove.groups and remove Sample70 through Sample80 instead, the same thing happens: the groups file has the right number of sequences, but the names file has too few. (If I do both removals and run summary.seqs on the two separated sets and on the original fasta, I get the right number of uniques, so the fasta seems to be okay.) Before this step, the number of sequences in the names file matches the total in the groups file, so everything seems okay until I run remove.groups.
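The arithmetic can be double-checked with a short Python sketch. It assumes the count.groups output has been pasted into a string in the "<group> contains <count>." form shown above; the helper name `parse_count_groups` is hypothetical and not part of mothur.

```python
import re

# count.groups output for L3.final.pick.groups, pasted verbatim from the post.
COUNT_GROUPS_OUTPUT = """\
Sample70 contains 390385.
Sample71 contains 153851.
Sample72 contains 404284.
Sample73 contains 505864.
Sample74 contains 2379091.
Sample75 contains 533678.
Sample76 contains 1120613.
Sample77 contains 1393437.
Sample78 contains 646127.
Sample79 contains 2775750.
Sample80 contains 723047.
"""

# "total # of seqs" reported by summary.seqs for the picked names file.
SUMMARY_TOTAL = 8378246


def parse_count_groups(text):
    """Return {group: count} from lines like 'Sample70 contains 390385.'"""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+) contains (\d+)\.$", line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_count_groups(COUNT_GROUPS_OUTPUT)
group_total = sum(counts.values())
print("total from count.groups:", group_total)
print("missing from names file:", group_total - SUMMARY_TOTAL)
```

With the numbers above this gives a groups total of 11026127 and a shortfall of 2647881 sequences in the names file.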

I’ve also tried running get.groups instead, and get the same results. Any idea what could be causing this, or how to fix it? Thank you very much!
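One way to narrow down where the sequences go missing is to compare the names and groups files directly. Below is a minimal sketch in Python, assuming the usual mothur layouts: each names-file line holds a representative name followed by a comma-separated list of every name it represents, and each groups-file line holds a sequence name followed by its group. The function names are hypothetical helpers, not mothur commands, and the example data is made up.

```python
def names_in_names_file(lines):
    """Collect every sequence name listed in the second column of a names file."""
    names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _representative, members = line.split()
        names.update(members.split(","))
    return names


def names_in_groups_file(lines):
    """Collect every sequence name listed in the first column of a groups file."""
    names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _group = line.split()
        names.add(name)
    return names


def compare(names_lines, groups_lines):
    """Return (names only in the groups file, names only in the names file)."""
    in_names = names_in_names_file(names_lines)
    in_groups = names_in_groups_file(groups_lines)
    return sorted(in_groups - in_names), sorted(in_names - in_groups)


# Tiny made-up example: seqD has a group assignment but no names-file entry.
example_names = ["seqA\tseqA,seqB", "seqC\tseqC"]
example_groups = [
    "seqA\tSample70",
    "seqB\tSample70",
    "seqC\tSample71",
    "seqD\tSample71",
]
missing_from_names, missing_from_groups = compare(example_names, example_groups)
print("in groups but not in names:", missing_from_names)
print("in names but not in groups:", missing_from_groups)
```

For real data, open file handles can be passed in place of the lists, for example the picked groups file and its matching names file (the exact names-file name is an assumption here). Both sets are held in memory, which may be slow for files with millions of sequences.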

I am not able to reproduce the problem you are having with the SOP files using our current version. What version are you using? If you are using 1.24.0, and want me to try and troubleshoot the issue for you, you can send your files to mothur.bugs@gmail.com.

mothur > get.groups(outputdir=…/getgroupsbug, fasta=final.fasta, group=final.groups, name=final.names, groups=F003D000-F003D002-F003D004-F003D006-F003D008)
Setting output directory to: /Users/SarahsWork/Desktop/getgroupsbug/
Unable to open final.fasta. Trying default /Users/SarahsWork/Desktop/release/final.fasta
Unable to open final.names. Trying default /Users/SarahsWork/Desktop/release/final.names
Unable to open final.groups. Trying default /Users/SarahsWork/Desktop/release/final.groups
Selected 25304 sequences from your name file.
Selected 1425 sequences from your fasta file.
Selected 25304 sequences from your group file.

Output File names:
/Users/SarahsWork/Desktop/getgroupsbug/final.pick.names
/Users/SarahsWork/Desktop/getgroupsbug/final.pick.fasta
/Users/SarahsWork/Desktop/getgroupsbug/final.pick.groups


mothur > count.groups()
Using /Users/SarahsWork/Desktop/getgroupsbug/final.pick.groups as input file for the group parameter.
F003D000 contains 5124.
F003D002 contains 4745.
F003D004 contains 4601.
F003D006 contains 5778.
F003D008 contains 5056.

mothur > summary.seqs(name=current)
Using /Users/SarahsWork/Desktop/getgroupsbug/final.pick.names as input file for the name parameter.
Using /Users/SarahsWork/Desktop/getgroupsbug/final.pick.fasta as input file for the fasta parameter.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 445 241 0 4 1
2.5%-tile: 1 445 247 0 4 633
25%-tile: 1 445 249 0 4 6327
Median: 1 445 251 0 4 12653
75%-tile: 1 445 251 0 4 18979
97.5%-tile: 1 445 251 0 6 24672
Maximum: 3 445 257 0 7 25304
Mean: 1.00079 445 250.029 0 4.32714

# of unique seqs: 1425

total # of seqs: 25304

Output File Name:
/Users/SarahsWork/Desktop/getgroupsbug/final.pick.fasta.summary


mothur > remove.groups(outputdir=../getgroupsbug, fasta=final.fasta, group=final.groups, name=final.names, groups=F003D000-F003D002-F003D004-F003D006-F003D008)
Setting output directory to: /Users/SarahsWork/Desktop/getgroupsbug/
Unable to open final.fasta. Trying default /Users/SarahsWork/Desktop/release/final.fasta
Unable to open final.names. Trying default /Users/SarahsWork/Desktop/release/final.names
Unable to open final.groups. Trying default /Users/SarahsWork/Desktop/release/final.groups
Removed 25304 sequences from your name file.
Removed 1185 sequences from your fasta file.
Removed 25304 sequences from your group file.

Output File names:
/Users/SarahsWork/Desktop/getgroupsbug/final.pick.names
/Users/SarahsWork/Desktop/getgroupsbug/final.pick.fasta
/Users/SarahsWork/Desktop/getgroupsbug/final.pick.groups


mothur > count.groups()
Using /Users/SarahsWork/Desktop/getgroupsbug/final.pick.groups as input file for the group parameter.
F003D142 contains 5941.
F003D144 contains 4841.
F003D146 contains 5708.
F003D148 contains 4795.
F003D150 contains 5896.
MOCK.GQY1XT001 contains 6077.

mothur > summary.seqs(name=current)
Using /Users/SarahsWork/Desktop/getgroupsbug/final.pick.names as input file for the name parameter.
Using /Users/SarahsWork/Desktop/getgroupsbug/final.pick.fasta as input file for the fasta parameter.

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 445 241 0 3 1
2.5%-tile: 1 445 247 0 4 832
25%-tile: 1 445 250 0 4 8315
Median: 1 445 251 0 4 16630
75%-tile: 1 445 251 0 4 24944
97.5%-tile: 1 445 251 0 6 32427
Maximum: 3 445 257 0 8 33258
Mean: 1.00108 445 250.142 0 4.34286

# of unique seqs: 1407

total # of seqs: 33258

Output File Name:
/Users/SarahsWork/Desktop/getgroupsbug/final.pick.fasta.summary


mothur > quit()

I’m sorry, what I wrote was unclear. I’m working with 16S sequences from an Illumina run, not files from the SOP. However, I may be using an older version of mothur - was this a problem that was fixed in 1.24?

Thank you very much!

We fixed a similar bug in version 1.23.0. Are you using an older version than that?

Yes, I believe I’m running 1.22. Sounds like that might be the problem! Thank you very much!