get.groups

Hi,

I used the get.groups command to pull out a subset of my samples. It reports that 51970 sequences were selected from both the names file and the group file. That is true for the group file: I opened it in a text editor and checked the line count. Unfortunately, the summary.seqs command shows that only 51312 sequences were actually selected from the names file. This mismatch is causing me trouble in the downstream analysis.
I am running the 64-bit Linux version 1.22.2. I also tried 1.21.1 under Windows, with the same result. Any ideas?

Cheers,

mothur > get.groups(fasta=joint.unique.precluster.pick.filter.good.good.pick.fasta,group=joint.pick.good.good.pick.groups,name=joint.unique.precluster.pick.good.good.pick.names,groups=m007-s007-mby-sby-m010-s010-m014-s014-mple-sple-mpea-spea-mco-sco)

Selected 51970 sequences from your name file.
Selected 5989 sequences from your fasta file.

Selected 51970 sequences from your group file.

Output File names: 
joint.unique.precluster.pick.good.good.pick.pick.names
joint.unique.precluster.pick.filter.good.good.pick.pick.fasta
joint.pick.good.good.pick.pick.groups

mothur > summary.seqs(fasta=current,name=current)

Using joint.unique.precluster.pick.filter.good.good.pick.pick.fasta as input file for the fasta parameter.
Using joint.unique.precluster.pick.good.good.pick.pick.names as input file for the name parameter.

  Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 868 241 0 3 1
2.5%-tile: 1 868 280 0 4 1283
25%-tile: 1 868 280 0 5 12829
Median:  1 868 280 0 5 25657
75%-tile: 1 868 304 0 5 38485
97.5%-tile: 1 868 328 0 5 50030
Maximum: 1 869 346 0 6 51312
Mean: 1 868 289.366 0 4.97418
# of unique seqs: 5989
total # of seqs: 51312
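
One way to cross-check the two counts outside mothur is to tally the files directly. The sketch below is only an illustration and assumes the usual mothur layouts: a names file with two tab-separated columns (a representative name, then a comma-separated list of every name it stands for) and a group file with one sequence name and its group per line. The function names are made up for this example and are not part of mothur.

```python
def count_names_file(lines):
    """Return (unique, total) sequence counts for a mothur-style names file.

    Assumed layout per line: representative<TAB>name1,name2,...
    """
    unique = 0
    total = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"unexpected names line: {line!r}")
        unique += 1
        # Every non-empty entry in the second column is one sequence.
        total += sum(1 for name in fields[1].split(",") if name)
    return unique, total


def count_group_file(lines):
    """Return the number of sequences listed in a mothur-style group file."""
    return sum(1 for line in lines if line.strip())
```

Both functions accept any iterable of lines, so an open file handle works. For the files above, the second value returned for joint.unique.precluster.pick.good.good.pick.pick.names is the figure to compare with the line count of joint.pick.good.good.pick.pick.groups.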

Could you send your fasta, name and group files to mothur.bugs@gmail.com?

Hi,
I had exactly the same problem. Do you have any ideas about what can be done, or any alternatives?
Much appreciated,
Cheers

Just to follow up on my previous email: I'm using mothur version 1.22.0.

As Fabio mentioned, I'm also trying to pull a set of groups out of my big group file:
mothur > get.groups(group=all_sample.groups, fasta=all_sample.fasta, groups=SEC1-SEC2-SEC3-SEC4-SEC5-SEL1, name=all-sample.names)

Selected 377439 sequences from your name file.
Selected 251691 sequences from your fasta file.
Selected 377439 sequences from your group file.

Output files:
final.soils.fasta
final.soils.groups
final.soils.names

mothur > summary.seqs(fasta=final.soils.fasta, name=final.soils.names)

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 150 150 0 4 1
2.5%-tile: 1 198 198 0 4 9059
25%-tile: 1 354 354 0 4 90582
Median: 1 359 359 0 4 181163
75%-tile: 1 361 361 0 4 271744
97.5%-tile: 1 380 380 0 5 353266
Maximum: 1 398 398 0 6 362324
Mean: 1 345.802 345.802 0 4.25125

# of unique seqs: 251691
total # of seqs: 362324

So clearly the numbers don't add up for the names file: get.groups reports 377439 sequences selected, but summary.seqs counts only 362324.

I also tried using list.seqs to create an accnos file (a list of sequence names, soil.accnos) from my subsampled group file (final.soils.groups). When I check that list on its own, it contains the correct number of sequences. However, when I then run get.seqs with that accnos file and the big names file (all_sample.names) to retrieve the names for the subsample, I again get the wrong numbers.

mothur > list.seqs(group=final.soils.groups)
Output File Name:
soil.accnos
mothur > get.seqs(accnos=soil.accnos, name=all_samples.names)
Selected 295870 sequences from your name file.
Output File Names:
all_sample.pick.names

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 150 150 0 4 1
2.5%-tile: 1 198 198 0 4 9086
25%-tile: 1 354 354 0 4 90853
Median: 1 359 359 0 4 181705
75%-tile: 1 361 361 0 4 272557
97.5%-tile: 1 380 380 0 5 354323
Maximum: 1 398 398 0 6 363408
Mean: 1 345.827 345.827 0 4.25089
# of unique seqs: 251691
total # of seqs: 363408
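
To see which sequence names account for a gap like this, the name sets of the two files can be compared directly. This is a rough sketch, not a mothur feature: it assumes the standard two-column names file (representative, then a comma-separated list of names) and the standard two-column group file (sequence name, then group), and the helper names are hypothetical.

```python
def names_listed(names_lines):
    """Collect every sequence name from the second column of a names file."""
    names = set()
    for line in names_lines:
        fields = line.split()
        if len(fields) == 2:
            names.update(n for n in fields[1].split(",") if n)
    return names


def names_grouped(group_lines):
    """Collect every sequence name from the first column of a group file."""
    return {line.split()[0] for line in group_lines if line.strip()}


def compare_files(names_lines, group_lines):
    """Return (only_in_names_file, only_in_group_file) as sorted lists."""
    in_names = names_listed(names_lines)
    in_groups = names_grouped(group_lines)
    return sorted(in_names - in_groups), sorted(in_groups - in_names)
```

Run on final.soils.names and final.soils.groups, the second list would hold the sequences that are present in the group file but missing from the names file.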

cheers

Thanks for bringing this to our attention. It will be fixed in 1.23.0.

Hi,
I have run into the same problem. Have you solved it yet? If so, could you tell me how? Thank you! Please contact me at 348543076@qq.com.
Alice

What version are you using?