Classifying OTUs

Hello, I’m using the command classify.otu, and in the taxonomy output file I have this result:
OTU Size Taxonomy
1 1344 k__Bacteria(100);p__Bacteroidetes(100);unclassified(99);unclassified(99);unclassified(99);unclassified(99);unclassified(99);
2 228 k__Bacteria(100);p__Firmicutes(100);c__Clostridia(99);o__Clostridiales(99);unclassified(98);unclassified(98);unclassified(98);
3 483 k__Bacteria(100);p__Bacteroidetes(100);c__Bacteroidia(100);o__Bacteroidales(100);unclassified(100);unclassified(100);unclassified(100);
4 107 k__Bacteria(100);p__Bacteroidetes(99);unclassified(81);unclassified(81);unclassified(81);unclassified(81);unclassified(81);

Based on the size of the OTU, I want to select those OTUs with more than 100 sequences and construct a phylogenetic tree. However, I don’t know which sequences correspond to OTUs 1, 2, 3, 4,…
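The size filter described here can be sketched in Python, as a rough illustration outside of mothur. It assumes the taxonomy output is whitespace-separated exactly as pasted above; the function name `abundant_otus` is hypothetical and not part of mothur.

```python
# Rows copied from the classify.otu taxonomy output pasted above.
TAXONOMY_TEXT = """\
OTU Size Taxonomy
1 1344 k__Bacteria(100);p__Bacteroidetes(100);unclassified(99);unclassified(99);unclassified(99);unclassified(99);unclassified(99);
2 228 k__Bacteria(100);p__Firmicutes(100);c__Clostridia(99);o__Clostridiales(99);unclassified(98);unclassified(98);unclassified(98);
3 483 k__Bacteria(100);p__Bacteroidetes(100);c__Bacteroidia(100);o__Bacteroidales(100);unclassified(100);unclassified(100);unclassified(100);
4 107 k__Bacteria(100);p__Bacteroidetes(99);unclassified(81);unclassified(81);unclassified(81);unclassified(81);unclassified(81);
"""


def abundant_otus(text, min_size):
    """Return (otu, size) pairs whose size is strictly greater than min_size.

    Hypothetical helper: assumes one header row, then 'OTU Size Taxonomy' rows.
    """
    selected = []
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        otu, size, _taxonomy = line.split(None, 2)
        if int(size) > min_size:
            selected.append((otu, int(size)))
    return selected


print(abundant_otus(TAXONOMY_TEXT, 100))
```

This only identifies which OTU numbers pass the cutoff; it does not say which sequences belong to them, which is the open question in this post.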

I would like to know how I can figure out the representative sequence of each OTU that was used in the classify.otu command.

Thank you!

Marcelo

Hi Marcelo,

You could use split.abund (http://www.mothur.org/wiki/Split.abund) to separate your sequences into rare and abundant groups at a cutoff of 100 and generate the needed accnos file. You could then use that file to run get.seqs (http://www.mothur.org/wiki/get.seqs) to pull out the abundant sequences.

Hope this helps,
Pat

Hello Pat,
I used the command split.abund to generate the accnos file:
split.abund(fasta=merged.final.fasta, list=merged.final.phylip.an.list, cutoff=50, accnos=true, label=0.05)
Then I used get.seqs:
get.seqs(accnos=merged.final.phylip.an.0.05.abund.accnos, fasta=merged.final.0.05.abund.fasta, list=merged.final.phylip.an.0.05.abund.list, name=merged.final.names, group=merged.final.groups)

After that, I used the command make.shared:
make.shared(list=merged.final.phylip.an.0.05.abund.pick.list, group=merged.final.pick.groups, label=0.05)

and I found 75 OTUs with more than 50 sequences, and the shared file looks like this:

label 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
Group S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13
numOtus 75 75 75 75 75 75 75 75 75 75 75 75 75
Otu01 139 0 0 0 3 0 2 1 1 103 226 128 741
Otu02 38 0 0 0 1 0 0 1 1 111 25 46 5
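As a quick check on these counts, the per-OTU totals across the 13 groups can be summed with a short Python sketch. It assumes the layout exactly as pasted above, with one OTU per row; a file laid out with one group per row would need to be transposed first. The function name `otu_totals` is hypothetical.

```python
# Rows copied from the shared file excerpt pasted above.
SHARED_TEXT = """\
label 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
Group S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13
numOtus 75 75 75 75 75 75 75 75 75 75 75 75 75
Otu01 139 0 0 0 3 0 2 1 1 103 226 128 741
Otu02 38 0 0 0 1 0 0 1 1 111 25 46 5
"""


def otu_totals(text):
    """Sum the counts on every row whose first field starts with 'Otu'.

    Hypothetical helper: assumes OTUs are rows and groups are columns.
    """
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("Otu"):
            totals[fields[0]] = sum(int(count) for count in fields[1:])
    return totals


print(otu_totals(SHARED_TEXT))  # {'Otu01': 1344, 'Otu02': 228}
```

The totals for Otu01 and Otu02 equal the Size values shown for OTUs 1 and 2 in the taxonomy output at the top of the thread.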


I’m trying to get a fasta file with only the 75 sequences that correspond to Otu01, Otu02, Otu03, but I don’t know how to do it. When I used the get.seqs command it selected 2348 sequences from my fasta file and 14086 sequences from my group, list and name files. So, how can I generate a fasta file with only those 75 sequences? Thanks for your help.
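The numbers above are likely larger than 75 because the accnos file names every sequence in the abundant OTUs, not one sequence per OTU. Getting one sequence per OTU means picking a single member from each column of the list file. The Python sketch below shows that idea on made-up data. It takes the first name listed in each OTU, which is an arbitrary stand-in and not how mothur chooses representatives; mothur has a get.oturep command intended for that purpose. The list line, sequence names, sequences and function names are all hypothetical.

```python
# Hypothetical one-line list file: label, number of OTUs, then one
# tab-separated column per OTU holding comma-separated sequence names.
LIST_LINE = "0.05\t3\tseqA,seqB,seqC\tseqD\tseqE,seqF"

# Hypothetical FASTA records keyed by sequence name.
FASTA = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "ACGTACGC",
    "seqD": "TTGACCAA",
    "seqE": "GGCATTAC",
    "seqF": "GGCATTAG",
}


def first_member_per_otu(list_line):
    """Return the first sequence name of each OTU (an arbitrary choice)."""
    fields = list_line.rstrip("\n").split("\t")
    num_otus, otus = int(fields[1]), fields[2:]
    if len(otus) != num_otus:
        raise ValueError("numOtus does not match the number of OTU columns")
    return [otu.split(",")[0] for otu in otus]


def to_fasta(names, sequences):
    """Format the named sequences as FASTA text, one record per name."""
    return "".join(f">{name}\n{sequences[name]}\n" for name in names)


representatives = first_member_per_otu(LIST_LINE)
print(to_fasta(representatives, FASTA))
```

With a real list file restricted to the abundant OTUs, the same selection would yield one name per OTU, so 75 names here.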

Marcelo

Hello,
I figured out how to do it!
Thank you!
Marcelo

This is exactly what I need. nagem7, could you post what you ended up doing?