get.groups

Fabian · December 1, 2011, 6:37am

Hi,

I applied the get.groups command to get a subset of my samples. It reports that 51970 sequences were selected from both the names file and the group file. This is true for the group file: I opened it in a text editor and checked the line count. Unfortunately, the summary.seqs command reveals that only 51312 sequences were selected from the names file. This mismatch is causing me trouble in the downstream analysis.
I am running the 64-bit Linux version 1.22.2. I also tried 1.21.1 under Windows, with the same result. Any ideas?

Cheers,

mothur > get.groups(fasta=joint.unique.precluster.pick.filter.good.good.pick.fasta,group=joint.pick.good.good.pick.groups,name=joint.unique.precluster.pick.good.good.pick.names,groups=m007-s007-mby-sby-m010-s010-m014-s014-mple-sple-mpea-spea-mco-sco)

Selected 51970 sequences from your name file.
Selected 5989 sequences from your fasta file.

Selected 51970 sequences from your group file.

Output File names: 
joint.unique.precluster.pick.good.good.pick.pick.names
joint.unique.precluster.pick.filter.good.good.pick.pick.fasta
joint.pick.good.good.pick.pick.groups

mothur > summary.seqs(fasta=current,name=current)

Using joint.unique.precluster.pick.filter.good.good.pick.pick.fasta as input file for the fasta parameter.
Using joint.unique.precluster.pick.good.good.pick.pick.names as input file for the name parameter.

  Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 868 241 0 3 1
2.5%-tile: 1 868 280 0 4 1283
25%-tile: 1 868 280 0 5 12829
Median:  1 868 280 0 5 25657
75%-tile: 1 868 304 0 5 38485
97.5%-tile: 1 868 328 0 5 50030
Maximum: 1 869 346 0 6 51312
Mean: 1 868 289.366 0 4.97418
# of unique seqs: 5989
total # of seqs: 51312
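
The gap here is 658 sequences (51970 reported by get.groups versus 51312 from summary.seqs). One way to cross-check both numbers outside mothur is to count them directly from the files. The following is only a rough sketch, assuming the usual two-column layouts: a names file with a representative name followed by a comma-separated list of every name it stands for, and a group file with one sequence name and its group per line. The function names and the toy data are made up for illustration and are not part of mothur.

```python
import io


def count_names(handle):
    """Return (unique, total) for a names file.

    Column 1 is the representative name; column 2 is a comma-separated
    list of all names it represents (including itself).
    """
    unique = 0
    total = 0
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        unique += 1
        total += len(fields[1].split(","))
    return unique, total


def count_groups(handle):
    """Return the number of sequences in a group file (one per non-empty line)."""
    return sum(1 for line in handle if line.strip())


# Toy in-memory data standing in for the real .names and .groups files.
names_text = "seqA\tseqA,seqB,seqC\nseqD\tseqD\n"
groups_text = "seqA\tm007\nseqB\tm007\nseqC\ts007\nseqD\ts007\n"

unique, total = count_names(io.StringIO(names_text))
in_groups = count_groups(io.StringIO(groups_text))
print(unique, total, in_groups)  # 2 4 4
```

For real files, replace the `io.StringIO(...)` objects with `open("your.names")` and `open("your.groups")`. If the total from the names file is smaller than the group file count, the names file really is short and the problem is not in summary.seqs.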

westcott · December 2, 2011, 11:59am

Could you send your fasta, name and group files to mothur.bugs@gmail.com?

Acacia21 · December 19, 2011, 5:06am

Hi,
I had exactly the same problem. Do you have any ideas about what can be done, or any alternatives?
Much appreciated,
Cheers

Acacia21 · December 20, 2011, 2:41am

Just to follow up on my previous post: I’m using mothur version 1.22.0.

As Fabian mentioned, I’m also interested in getting a subset of groups from my big group file:
mothur > get.groups(group=all_sample.groups, fasta=all_sample.fasta, groups=SEC1-SEC2-SEC3-SEC4-SEC5-SEL1, name=all-sample.names)

Selected 377439 sequences from your name file.
Selected 251691 sequences from your fasta file.
Selected 377439 sequences from your group file.

Output files:
final.soils.fasta
final.soils.groups
final.soils.names

mothur > summary.seqs(fasta=final.soils.fasta, name=final.soils.names)

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 150 150 0 4 1
2.5%-tile: 1 198 198 0 4 9059
25%-tile: 1 354 354 0 4 90582
Median: 1 359 359 0 4 181163
75%-tile: 1 361 361 0 4 271744
97.5%-tile: 1 380 380 0 5 353266
Maximum: 1 398 398 0 6 362324
Mean: 1 345.802 345.802 0 4.25125

# of unique seqs: 251691
total # of seqs: 362324

So clearly, the numbers don’t add up for the names file.

I also tried using list.seqs to get an accnos file (a list of sequence names, soil.accnos) from my subsample group file (final.soils.groups). When I check that list file separately, I get the correct number of sequences. However, when I then run get.seqs with the accnos file I just created and the big names file (all_sample.names) to retrieve the subsample names, I again get the wrong numbers.

mothur > list.seqs(group=final.soils.groups)
Output File Name:
soil.accnos
mothur > get.seqs(accnos=soil.accnos, name=all_samples.names)
Selected 295870 sequences from your name file.
Output File Names:
all_sample.pick.names

Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 150 150 0 4 1
2.5%-tile: 1 198 198 0 4 9086
25%-tile: 1 354 354 0 4 90853
Median: 1 359 359 0 4 181705
75%-tile: 1 361 361 0 4 272557
97.5%-tile: 1 380 380 0 5 354323
Maximum: 1 398 398 0 6 363408
Mean: 1 345.827 345.827 0 4.25089
# of unique seqs: 251691
total # of seqs: 363408
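
To see which sequence names are missing, and not only how many, the names in the two files can be compared as sets. Again this is just a sketch under the same assumptions about the file layouts (two whitespace-separated columns, with a comma-separated list in the second column of the names file); the helper names and toy data are hypothetical.

```python
import io


def names_in_names_file(handle):
    """Collect every sequence name listed in column 2 of a names file."""
    seen = set()
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            seen.update(fields[1].split(","))
    return seen


def names_in_group_file(handle):
    """Collect the sequence name (column 1) from each non-empty group file line."""
    return {line.split()[0] for line in handle if line.strip()}


def compare(names_handle, group_handle):
    """Return (missing_from_names, missing_from_groups) as sorted lists."""
    in_names = names_in_names_file(names_handle)
    in_groups = names_in_group_file(group_handle)
    return sorted(in_groups - in_names), sorted(in_names - in_groups)


# Toy data: seqC is in the group file but absent from the names file.
names_text = "seqA\tseqA,seqB\n"
groups_text = "seqA\tSEC1\nseqB\tSEC1\nseqC\tSEC2\n"

missing_from_names, missing_from_groups = compare(
    io.StringIO(names_text), io.StringIO(groups_text)
)
print(missing_from_names, missing_from_groups)  # ['seqC'] []
```

Sets are used so that the comparison does not depend on line order. With files of a few hundred thousand names this should still fit comfortably in memory. The list of names missing from the names file may help narrow down which kinds of entries get dropped.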

cheers

westcott · December 23, 2011, 4:15pm

Thanks for bringing this to our attention. It will be fixed in 1.23.0.

Alice · May 19, 2012, 1:50pm

Hi,
I have run into the same problem. Have you solved it? If so, could you tell me how? Thank you! Please contact me at 348543076@qq.com.
Alice

pschloss · May 21, 2012, 9:52pm

What version are you using?
