output from get.seqs

snaedis · December 5, 2013, 1:42pm

Hello

I am a mothur newbie and a bit confused about my results.
Before analysing my processed data, I wanted to separate it into 22 samples. I therefore split my groups file into 22 new groups files and ran list.seqs on each groups file to get the corresponding accnos files.
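For readers unfamiliar with the file formats, the per-sample accnos files described above can be sketched in Python. This is a hypothetical helper, not part of mothur (within mothur, list.seqs with a group file does this job). It assumes a groups file with two whitespace-separated columns (sequence name, group name); the sample names below are made up. Note that each resulting list contains every read name in its group, redundant ones included.

```python
from collections import defaultdict
from pathlib import Path


def accnos_by_group(group_lines):
    """Collect the sequence names of each group from groups-file lines.

    Assumes two whitespace-separated columns: sequence name, group name.
    """
    by_group = defaultdict(list)
    for line in group_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        seq_name, group = fields
        by_group[group].append(seq_name)
    return dict(by_group)


def write_accnos(by_group, outdir="."):
    """Write one <group>.accnos file (one name per line) per group."""
    for group, seq_names in by_group.items():
        path = Path(outdir) / f"{group}.accnos"
        path.write_text("\n".join(seq_names) + "\n")


# Hypothetical three-line groups file with two samples.
groups = ["seq1\tA1_V", "seq2\tA1_V", "seq3\tB2_V"]
print(accnos_by_group(groups))
# {'A1_V': ['seq1', 'seq2'], 'B2_V': ['seq3']}
```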

I then ran the following for each sample:
mothur > get.seqs(accnos=A1_V.accnos, fasta=AmpIfinal.fasta)
Selected 533 sequences from your fasta file.
Output File Names:
AmpIfinal.pick.fasta
Summed over all 22 samples, the numbers of selected sequences added up to the total number of unique sequences.

Then I ran:
mothur > get.seqs(accnos=A1_V.accnos, name=AmpIfinal.names, fasta=AmpIfinal.fasta, group=A1_V.groups, dups=F)
Selected 2289 sequences from your name file.
Selected 611 sequences from your fasta file.
Selected 2289 sequences from your group file.
Summed over all samples, the numbers of sequences selected from the names and groups files add up to the total number of sequences, but the numbers selected from the fasta file add up to more than the number of unique sequences.

Can anyone please explain this discrepancy?
Should I keep the 22 new names and groups files and run get.seqs on the fasta file separately, or will I run into problems later if get.seqs is not run on the names, fasta and groups files together?

Thank you in advance
Snaedis

westcott · December 6, 2013, 2:53pm

Welcome to the mothur community! What version of mothur are you using? The list.seqs and get.seqs commands should do what you are trying to do, but it may be easier with the split.groups command, http://www.mothur.org/wiki/Split.groups.

snaedis · December 6, 2013, 10:08pm

Yes, split.groups sounds easier. Thanks for the suggestion.

I am using v.1.32.1. However, running the same commands in v.1.24.0 extracted the correct number of unique sequences.

westcott · December 9, 2013, 3:27pm

For get.seqs command, dups=t by default. This means that if a unique sequence is selected, then all the redundant sequences for that sequence are selected. From looking at mothur’s outputs I suspect you have some sequences in the fasta file that are listed in column 2 of the names file. Mothur assumes a “unique” fasta file contains only sequences from column 1 of the names file. How did you create these fasta and names files?
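To test the assumption described above (a unique fasta file contains only column-1 names), one could compare the two files directly. The Python sketch below is a hypothetical check, not a mothur command. It assumes the names file has two tab-separated columns (representative name, then a comma-separated list of every name it represents) and takes each fasta name as the text after '>' up to the first whitespace. Under that assumption, both returned lists should come back empty.

```python
def parse_names(lines):
    """Map each column-1 name to the column-2 list of names it represents."""
    names = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rep, members = line.split()
        names[rep] = members.split(",")
    return names


def read_fasta_names(lines):
    """Return sequence names: the text after '>' up to the first whitespace."""
    return [line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()]


def check_unique_fasta(seq_names, names):
    """List fasta names that break the 'unique fasta' assumption."""
    # Names that appear only in column 2, i.e. redundant reads.
    redundant = {m for members in names.values() for m in members} - set(names)
    return {
        "redundant_in_fasta": [s for s in seq_names if s in redundant],
        "missing_from_names": [s for s in seq_names
                               if s not in names and s not in redundant],
    }


# Hypothetical toy files: r2 is a redundant read, r9 is unknown to the names file.
names = parse_names(["r1\tr1,r2,r3", "r4\tr4"])
seqs = read_fasta_names([">r1", "ACGT", ">r2", "ACGT", ">r9", "ACGT"])
print(check_unique_fasta(seqs, names))
# {'redundant_in_fasta': ['r2'], 'missing_from_names': ['r9']}
```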

http://www.mothur.org/wiki/Name_file
http://www.mothur.org/wiki/Fasta_file

snaedis · December 10, 2013, 12:25pm

The fasta and names files are outputs of sequence processing according to the 454 SOP.

What bothers me is that
mothur > get.seqs(accnos=A1_V.accnos, fasta=AmpIfinal.fasta)
results in 533 sequences, while
mothur > get.seqs(accnos=A1_V.accnos, name=AmpIfinal.names, fasta=AmpIfinal.fasta, group=A1_V.groups, dups=F)
results in 611.

It also bothers me that the latter command results in 533 sequences in v.1.24 but 611 in v.1.32.
But I am a newbie; maybe I should just switch to v.1.24 :?

westcott · December 10, 2013, 2:31pm

I would not recommend switching to version 1.24. Our latest version contains many new features, updates and bug fixes. If you send your files to mothur.bugs@gmail.com I can track down the exact cause of the discrepancy and help you resolve the issue.

snaedis · December 11, 2013, 12:29pm

I've sent the files - thank you

westcott · December 11, 2013, 2:47pm

The accnos file contains 2289 sequence names. Some are unique and some are redundant, because the accnos file was created from a groups file. This can get confusing; ideally you want the accnos file to contain only the unique names, because mothur is smart enough to handle the names file. Let me give you an example of what's happening with one of the 78 extra sequences (611 - 533) that are selected only when the names file is given.

From the names file:
H53OP4K01ALX9M H53OP4K01ALX9M,H53OP4K01ALBPZ,H53OP4K01ANM0S,H53OP4K01AI82C,H53OP4K01AHG1T,H53OP4K01ARODS,H53OP4K01ALANE,H53OP4K03B0Y69,H53OP4K03BUA4S

H53OP4K03B0Y69 is in the accnos file. When you run get.seqs with just the fasta file, mothur does not select sequence H53OP4K01ALX9M because it does not know that H53OP4K01ALX9M represents H53OP4K03B0Y69. But when you run get.seqs with the names file mothur does select H53OP4K01ALX9M because it makes the connection between H53OP4K01ALX9M and H53OP4K03B0Y69.
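The selection logic explained above can be reproduced in a few lines of Python using the names-file row quoted in this post. This is a simplified model under stated assumptions, not mothur's source code: it only decides which fasta representatives get selected and ignores how the dups parameter shapes the output names file. The helper function names are hypothetical.

```python
# Simplified model of get.seqs fasta selection (an assumption, not mothur's source).

# Names-file row quoted above: representative, then every name it stands for.
names_row = ("H53OP4K01ALX9M",
             "H53OP4K01ALX9M,H53OP4K01ALBPZ,H53OP4K01ANM0S,H53OP4K01AI82C,"
             "H53OP4K01AHG1T,H53OP4K01ARODS,H53OP4K01ALANE,H53OP4K03B0Y69,"
             "H53OP4K03BUA4S")
names = {names_row[0]: names_row[1].split(",")}

fasta_ids = ["H53OP4K01ALX9M"]  # a unique fasta holds only the representative
accnos = {"H53OP4K03B0Y69"}     # the groups-derived accnos holds a redundant name


def select_fasta_only(fasta_ids, accnos):
    """fasta= alone: keep a sequence only if its own name is in the accnos."""
    return [s for s in fasta_ids if s in accnos]


def select_with_names(fasta_ids, accnos, names):
    """fasta= plus name=: keep a representative if any name it stands for is in the accnos."""
    return [s for s in fasta_ids if any(m in accnos for m in names.get(s, [s]))]


print(select_fasta_only(fasta_ids, accnos))         # []
print(select_with_names(fasta_ids, accnos, names))  # ['H53OP4K01ALX9M']
```

Under this model, applying both functions to every representative in sample A1_V would account for the gap between the two counts: representatives such as H53OP4K01ALX9M are picked up only through a redundant name in the accnos file.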

snaedis · December 17, 2013, 2:23pm

Thank you very much. The numbers make sense now.
