I applied the get.groups command to get a subset of my samples. It reports that 51970 sequences were selected from both the names file and the group file. That is indeed true for the group file: I opened it in a text editor and checked the line count. Unfortunately, the summary.seqs command reveals that only 51312 sequences were picked from the names file. This mismatch is causing me trouble in the downstream analysis.
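To double-check both numbers outside mothur, here is a minimal sketch in Python. It assumes the usual whitespace-separated layouts: a group file has one `<name> <group>` pair per line, and a names file has `<representative> <name1>,<name2>,...` per line. The function names are hypothetical. The key point is that a plain line count works for the group file but not for the names file, whose total is the sum of the comma-separated lists.

```python
# Minimal sketch: count the sequences represented by a mothur group file
# and by a mothur names file. Assumes the usual whitespace-separated layouts:
#   group file: <sequence name> <group>
#   names file: <representative name> <name1>,<name2>,...


def count_group_file(lines):
    # One sequence per non-blank line, so this equals a plain line count.
    return sum(1 for line in lines if line.strip())


def count_names_file(lines):
    # Each line stands for every name in its comma-separated second column,
    # so the total is the sum of those list lengths, not the number of lines.
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            total += len(parts[1].split(","))
    return total


if __name__ == "__main__":
    group_lines = ["seqA\tm007\n", "seqB\tm007\n", "seqC\ts007\n"]
    names_lines = ["seqA\tseqA,seqB\n", "seqC\tseqC\n"]
    print(count_group_file(group_lines), count_names_file(names_lines))  # 3 3
```

With real data, pass an open file handle (for example `open("joint.pick.good.good.pick.groups")`) instead of the toy lists.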
I am running the 64-bit Linux version 1.22.2. I also tried 1.21.1 under Windows, with the same result. Any ideas?
mothur > get.groups(fasta=joint.unique.precluster.pick.filter.good.good.pick.fasta,group=joint.pick.good.good.pick.groups,name=joint.unique.precluster.pick.good.good.pick.names,groups=m007-s007-mby-sby-m010-s010-m014-s014-mple-sple-mpea-spea-mco-sco)
Selected 51970 sequences from your name file.
Selected 5989 sequences from your fasta file.
Selected 51970 sequences from your group file.
Output File names:
mothur > summary.seqs(fasta=current,name=current)
Using joint.unique.precluster.pick.filter.good.good.pick.pick.fasta as input file for the fasta parameter.
Using joint.unique.precluster.pick.good.good.pick.pick.names as input file for the name parameter.
            Start  End  NBases   Ambigs  Polymer  NumSeqs
Minimum:    1      868  241      0       3        1
2.5%-tile:  1      868  280      0       4        1283
25%-tile:   1      868  280      0       5        12829
Median:     1      868  280      0       5        25657
75%-tile:   1      868  304      0       5        38485
97.5%-tile: 1      868  328      0       5        50030
Maximum:    1      869  346      0       6        51312
Mean:       1      868  289.366  0       4.97418
# of unique seqs: 5989
total # of seqs: 51312
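One way the two totals could diverge: if summary.seqs weights each fasta sequence by the size of its names-file entry, then names-file lines whose representative has no fasta record never enter the total. That is an assumption about mothur's internals, not something the output above confirms. The Python sketch below (hypothetical function name) just makes the arithmetic explicit so the gap can be checked against the actual files.

```python
# Sketch under an ASSUMPTION: summary.seqs-style totals weight each fasta
# sequence by the number of names in its names-file entry. This is not
# mothur's code; it only shows how uncounted entries would arise.


def weighted_totals(fasta_reps, names_map):
    """Return (unique, total, uncounted).

    fasta_reps: iterable of sequence names found in the fasta file.
    names_map:  dict mapping representative name -> list of member names.
    """
    fasta_set = set(fasta_reps)
    # In this sketch, a fasta sequence with no names-file entry counts once.
    total = sum(len(names_map.get(rep, [rep])) for rep in fasta_set)
    # Names whose representative has no fasta record never enter the total.
    uncounted = sum(
        len(members) for rep, members in names_map.items() if rep not in fasta_set
    )
    return len(fasta_set), total, uncounted


if __name__ == "__main__":
    names_map = {
        "seqA": ["seqA", "seqB"],
        "seqC": ["seqC"],
        "seqD": ["seqD", "seqE"],  # representative missing from the fasta
    }
    print(weighted_totals(["seqA", "seqC"], names_map))  # (2, 3, 2)
```

If the `uncounted` figure for the real files accounts for the 658-sequence gap (51970 minus 51312), that would point at names-file representatives that are missing from the picked fasta.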
Just to follow up on my previous email: I'm using mothur version 1.22.0.
As Fabio mentioned, I'm also interested in getting groups from my big group file:
mothur > get.groups(group=all_sample.groups, fasta=all_sample.fasta, groups=SEC1-SEC2-SEC3-SEC4-SEC5-SEL1, name=all-sample.names)
Selected 377439 sequences from your name file.
Selected 251691 sequences from your fasta file.
Selected 377439 sequences from your group file.
So, clearly, the numbers don't add up for the names file.
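To see exactly which sequences disagree, rather than only how many, the group file and names file can be compared name by name. A minimal Python sketch, assuming whitespace-separated files with a comma-separated second column in the names file; the function names are hypothetical:

```python
# Minimal sketch: cross-check the names in a group file against the names
# listed in a names file. Assumes whitespace-separated files, with the names
# file's second column comma-separated. Function names are hypothetical.
from collections import Counter


def parse_group(lines):
    # sequence name -> group
    return {p[0]: p[1] for p in (line.split() for line in lines) if len(p) >= 2}


def parse_names(lines):
    # representative name -> list of member names
    return {p[0]: p[1].split(",") for p in (line.split() for line in lines) if len(p) >= 2}


def compare(group_map, names_map):
    # Count every member so names listed on more than one line show up.
    counts = Counter(m for members in names_map.values() for m in members)
    in_names = set(counts)
    in_group = set(group_map)
    return {
        "only_in_group": sorted(in_group - in_names),
        "only_in_names": sorted(in_names - in_group),
        "listed_twice_in_names": sorted(m for m, c in counts.items() if c > 1),
    }


if __name__ == "__main__":
    group = parse_group(["a\tSEC1\n", "b\tSEC1\n", "c\tSEC2\n"])
    names = parse_names(["a\ta,b\n", "d\td,a\n"])
    print(compare(group, names))
    # {'only_in_group': ['c'], 'only_in_names': ['d'], 'listed_twice_in_names': ['a']}
```

Using `Counter` keeps the duplicate check linear, which matters for files with several hundred thousand names.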
I also tried using list.seqs to get an accnos file (a list of sequence names, soil.accnos) from my subsample group file (final.soils.groups), and when I check that list separately I get the correct number of sequences. However, when I run get.seqs with the accnos file I just created and the big name file (all_sample.names) to retrieve the subsample names, I again get the wrong numbers.
mothur > list.seqs(group=final.soils.groups)
Output File Name:
mothur > get.seqs(accnos=soil.accnos, name=all_samples.names)
Selected 295870 sequences from your name file.
Output File Names:
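How a subset total comes out depends on how a filter treats names-file lines that mix selected and unselected names. The Python sketch below contrasts two plausible rules. Which rule get.seqs actually applies is not established in this thread, so treat both functions (and their names) as hypothetical illustrations, not mothur's behavior.

```python
# Hypothetical illustration: two ways to filter a names file by an accnos
# list. Neither is claimed to be what mothur's get.seqs does.


def read_accnos(lines):
    # An accnos file holds one sequence name per line.
    return {line.strip() for line in lines if line.strip()}


def filter_by_representative(names_map, accnos):
    # Keep the whole line whenever its representative is listed,
    # even if some of its members are not.
    return {rep: members for rep, members in names_map.items() if rep in accnos}


def filter_by_member(names_map, accnos):
    # Keep only listed members; the first survivor becomes the new
    # representative, and lines with no survivors are dropped.
    out = {}
    for members in names_map.values():
        kept = [m for m in members if m in accnos]
        if kept:
            out[kept[0]] = kept
    return out


def total(names_map):
    return sum(len(members) for members in names_map.values())


if __name__ == "__main__":
    names_map = {"a": ["a", "b", "c"], "d": ["d", "e"]}
    accnos = read_accnos(["a\n", "e\n"])
    print(total(filter_by_representative(names_map, accnos)))  # 3
    print(total(filter_by_member(names_map, accnos)))  # 2
```

Under the second rule the subset total equals the number of accnos names that appear in the names file; under the first it can be larger or smaller, depending on how the selected names are spread across representatives.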