Hello, I’m using the command classify.otu, and in the taxonomy output file I have this result:
OTU Size Taxonomy
1 1344 k__Bacteria(100);p__Bacteroidetes(100);unclassified(99);unclassified(99);unclassified(99);unclassified(99);unclassified(99);
2 228 k__Bacteria(100);p__Firmicutes(100);c__Clostridia(99);o__Clostridiales(99);unclassified(98);unclassified(98);unclassified(98);
3 483 k__Bacteria(100);p__Bacteroidetes(100);c__Bacteroidia(100);o__Bacteroidales(100);unclassified(100);unclassified(100);unclassified(100);
4 107 k__Bacteria(100);p__Bacteroidetes(99);unclassified(81);unclassified(81);unclassified(81);unclassified(81);unclassified(81);
Based on the OTU sizes, I want to select the OTUs with more than 100 sequences and build a phylogenetic tree. However, I don’t know which sequences correspond to OTUs 1, 2, 3, 4, …
I would like to know how to find the representative sequence of each OTU that was used in the classify.otu command.
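To make the size filter concrete, here is a minimal Python sketch of the selection step. It assumes the taxonomy file is whitespace-separated with the three columns shown above (OTU, Size, Taxonomy). The function name `otus_above` is hypothetical and not part of mothur, and the taxonomy strings in the example are shortened.

```python
def otus_above(lines, min_size):
    """Return the OTU labels whose Size column is greater than min_size."""
    selected = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the "OTU Size Taxonomy" header row.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        if int(fields[1]) > min_size:
            selected.append(fields[0])
    return selected


# Rows copied from the output above, with the taxonomy strings shortened.
example = [
    "OTU Size Taxonomy",
    "1 1344 k__Bacteria(100);p__Bacteroidetes(100);",
    "2 228 k__Bacteria(100);p__Firmicutes(100);",
    "3 483 k__Bacteria(100);p__Bacteroidetes(100);",
    "4 107 k__Bacteria(100);p__Bacteroidetes(99);",
]

print(otus_above(example, 100))
```

This only tells me which OTU labels pass the cutoff. It does not say which sequences belong to each OTU.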
Hello Pat,
I used the command split.abund to generate the accnos file:
split.abund(fasta=merged.final.fasta, list=merged.final.phylip.an.list, cutoff=50, accnos=true, label=0.05)
Then I used get.seqs:
get.seqs(accnos=merged.final.phylip.an.0.05.abund.accnos, fasta=merged.final.0.05.abund.fasta, list=merged.final.phylip.an.0.05.abund.list, name=merged.final.names, group=merged.final.groups)
After that, I used the command make.shared:
make.shared(list=merged.final.phylip.an.0.05.abund.pick.list, group=merged.final.pick.groups, label=0.05)
and I found 75 OTUs with more than 50 sequences, and the shared file looks like this:
I’m trying to get a fasta file with only the 75 sequences that correspond to Otu01, Otu02, Otu03, and so on, but I don’t know how to do it. When I used the get.seqs command, it selected 2348 sequences from my fasta file and 14086 sequences from my group, list, and name files.
So, how can I generate a fasta file with only those 75 sequences?
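To show what I am after, here is a rough Python sketch of "one sequence per OTU". It rests on several assumptions: each line of the list file is tab-separated as label, number of OTUs, then one column per OTU holding comma-separated sequence names; OTU size is taken as the number of names in that column (redundant sequences recorded only in the names file are ignored); and the first name listed stands in for the OTU, which is not a distance-based representative. All function names are hypothetical and not part of mothur.

```python
def parse_list_line(line):
    """Split one list-file line into (label, list of OTU member lists)."""
    fields = line.rstrip("\n").split("\t")
    label, n_otus = fields[0], int(fields[1])
    otus = [column.split(",") for column in fields[2:2 + n_otus]]
    return label, otus


def pick_first_members(otus, min_size):
    """Map OTU index (1-based) to its first listed name, for OTUs with more than min_size names."""
    return {
        index: members[0]
        for index, members in enumerate(otus, start=1)
        if len(members) > min_size
    }


def filter_fasta(fasta_lines, wanted):
    """Yield the FASTA lines of every record whose name is in wanted."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            # The record name is the first word after '>'.
            keep = bool(header) and header[0] in wanted
        if keep:
            yield line


# Tiny made-up example: three OTUs at label 0.05.
list_line = "0.05\t3\tseqA,seqB,seqC\tseqD\tseqE,seqF\n"
fasta = [">seqA\n", "ACGT\n", ">seqB\n", "GGGG\n", ">seqE desc\n", "TTTT\n"]

label, otus = parse_list_line(list_line)
picked = pick_first_members(otus, 1)
kept = list(filter_fasta(fasta, set(picked.values())))
print(label, picked)
print("".join(kept), end="")
```

With my data, the cutoff would be 50 and the result should contain 75 records, one per OTU, instead of every sequence in the abundant OTUs.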
Thanks for your help.