Classifying OTUs

nagem7 · March 19, 2012, 6:00pm

Hello, I’m using the command classify.otu, and in the taxonomy output file I have this result:
OTU Size Taxonomy
1 1344 k__Bacteria(100);p__Bacteroidetes(100);unclassified(99);unclassified(99);unclassified(99);unclassified(99);unclassified(99);
2 228 k__Bacteria(100);p__Firmicutes(100);c__Clostridia(99);o__Clostridiales(99);unclassified(98);unclassified(98);unclassified(98);
3 483 k__Bacteria(100);p__Bacteroidetes(100);c__Bacteroidia(100);o__Bacteroidales(100);unclassified(100);unclassified(100);unclassified(100);
4 107 k__Bacteria(100);p__Bacteroidetes(99);unclassified(81);unclassified(81);unclassified(81);unclassified(81);unclassified(81);

Based on the size of each OTU, I want to select the OTUs with more than 100 sequences and construct a phylogenetic tree. However, I don’t know which sequences correspond to OTUs 1, 2, 3, 4, …
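As a rough illustration of the size filter described here, the following Python sketch reads rows laid out like the taxonomy output above (OTU, Size, Taxonomy separated by whitespace) and keeps the OTUs with more than 100 sequences. The function name `abundant_otus` and the `min_size` parameter are made up for this example, and the taxonomy strings are shortened from the post; it is not part of mothur.

```python
def abundant_otus(lines, min_size=100):
    """Return (otu, size, taxonomy) for rows whose size exceeds min_size.

    Assumes whitespace-separated columns: OTU, Size, Taxonomy.
    """
    selected = []
    for line in lines:
        fields = line.split(None, 2)
        # Skip blank lines and the header row (its second field is not a number).
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        otu, size, taxonomy = fields[0], int(fields[1]), fields[2].strip()
        if size > min_size:
            selected.append((otu, size, taxonomy))
    return selected


# Rows taken from the post, with the taxonomy strings shortened.
rows = [
    "OTU Size Taxonomy",
    "1 1344 k__Bacteria(100);p__Bacteroidetes(100);unclassified(99);",
    "2 228 k__Bacteria(100);p__Firmicutes(100);c__Clostridia(99);",
    "3 483 k__Bacteria(100);p__Bacteroidetes(100);c__Bacteroidia(100);",
    "4 107 k__Bacteria(100);p__Bacteroidetes(99);unclassified(81);",
]

for otu, size, taxonomy in abundant_otus(rows, min_size=100):
    print(otu, size, taxonomy)
```

This only tells you which OTU numbers pass the cutoff. Mapping those OTUs back to sequence names still needs the list file.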

I would like to know how to find the representative sequence of each OTU that was used by classify.otu.

Thank you!

Marcelo

pschloss · March 19, 2012, 6:09pm

Hi Marcelo,

You could use split.abund (http://www.mothur.org/wiki/Split.abund) to split your sequences into abundant and rare groups at a cutoff of 100 and generate the needed accnos file. You could then use that file to run get.seqs (http://www.mothur.org/wiki/get.seqs) to pull out the abundant sequences.

Hope this helps,
Pat

nagem7 · March 20, 2012, 4:03pm

Hello Pat,
I used the command split.abund to generate the accnos file:
split.abund(fasta=merged.final.fasta, list=merged.final.phylip.an.list, cutoff=50, accnos=true, label=0.05)
Then I used the get.seqs:
get.seqs(accnos=merged.final.phylip.an.0.05.abund.accnos, fasta=merged.final.0.05.abund.fasta, list=merged.final.phylip.an.0.05.abund.list, name=merged.final.names, group=merged.final.groups)

After that, I used the command make.shared:
make.shared(list=merged.final.phylip.an.0.05.abund.pick.list, group=merged.final.pick.groups, label=0.05)

and I found 75 OTUs with more than 50 sequences, and the shared file looks like this:

label 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
Group S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13
numOtus 75 75 75 75 75 75 75 75 75 75 75 75 75
Otu01 139 0 0 0 3 0 2 1 1 103 226 128 741
Otu02 38 0 0 0 1 0 0 1 1 111 25 46 5
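
As a quick sanity check on a table like this, the per-OTU totals across samples can be summed with a few lines of Python. This is only a sketch: it assumes the layout exactly as pasted here (one OTU per row, one sample per column), whereas mothur's own shared files normally have one group per row, and the function name `otu_totals` is made up for the example.

```python
def otu_totals(lines):
    """Sum the counts across samples for each row whose first field starts with 'Otu'."""
    totals = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and the label / Group / numOtus rows.
        if not fields or not fields[0].startswith("Otu"):
            continue
        totals[fields[0]] = sum(int(count) for count in fields[1:])
    return totals


# The OTU rows from the table in this post.
table = [
    "Group S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13",
    "Otu01 139 0 0 0 3 0 2 1 1 103 226 128 741",
    "Otu02 38 0 0 0 1 0 0 1 1 111 25 46 5",
]

print(otu_totals(table))
```

For the two rows shown, the totals come out to 1344 and 228, which match the sizes of OTUs 1 and 2 in the classify.otu output from the first post.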

I’m trying to get a fasta file with only the 75 sequences that correspond to Otu01, Otu02, Otu03, but I don’t know how to do it. When I used the get.seqs command it selected 2348 sequences from my fasta file and 14086 sequences from my group, list and name file. So, how can I generate a fasta file with only those 75 sequences? Thanks for your help.

Marcelo

nagem7 · March 21, 2012, 4:03am

Hello,
I figured out how to do it!
Thank you!
Marcelo

jsage8 · December 14, 2013, 1:04pm

This is exactly what I need. nagem7, could you post what you ended up doing?
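
The thread does not record what Marcelo ended up doing. One possible approach, sketched below in Python, is to read the list file, keep the OTUs above the cutoff, take one sequence name per OTU, and filter the fasta file down to those names. Everything here is an assumption for illustration: the helper names are made up, the toy data is invented, and the list line is assumed to be tab-separated as label, number of OTUs, then one comma-separated group of names per OTU (newer mothur versions may add a header line that would need to be skipped). Taking the first name in each OTU is arbitrary and is not a true representative; mothur's get.oturep command is meant for choosing representatives and is probably the more direct route.

```python
def otus_from_list_line(line):
    """Split one list-file line into (label, list of OTUs as name lists)."""
    fields = line.rstrip("\n").split("\t")
    # fields[0] is the label (e.g. 0.05), fields[1] is the number of OTUs.
    return fields[0], [otu.split(",") for otu in fields[2:] if otu]


def pick_first_per_otu(otus, cutoff):
    """Return the first name of every OTU with more than `cutoff` names.

    Note: this counts names in the list file only. If sequences were
    dereplicated, the real abundance also depends on the names file.
    """
    return [names[0] for names in otus if len(names) > cutoff]


def filter_fasta(lines, keep):
    """Keep only the fasta records whose name is in `keep`."""
    keep = set(keep)
    kept, writing = [], False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            writing = bool(parts) and parts[0] in keep
        if writing:
            kept.append(line)
    return kept


# Invented toy data: three OTUs at label 0.05.
list_line = "0.05\t3\tseqA,seqB,seqC\tseqD\tseqE,seqF"
fasta = [">seqA", "ACGT", ">seqB", "ACGA", ">seqD", "TTTT", ">seqE", "GGGG"]

label, otus = otus_from_list_line(list_line)
chosen = pick_first_per_otu(otus, cutoff=1)
print("\n".join(filter_fasta(fasta, chosen)))
```

With real data you would read the lines from the list and fasta files and write the filtered records to a new fasta file, giving one sequence per abundant OTU.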
