I’ve followed the SOP and used the sub.sample command for my 6 samples (size=4293). When I look at the classify.otu output (final.tx.1.cons.taxonomy), where sequences are assigned to phylotypes, I get 274 taxa, but the total number of sequences comes to 29835, which is the number of all sequences before subsampling. I should get 25758 sequences in total (6 × 4293).
I don’t understand where I’m going wrong.
That’s because the sub-sampling occurs on the shared file and not the list/names file. The classification of your OTUs is correct; it’s just the number of sequences that is off. The OTU numbering (the first column) of the cons.taxonomy file is correct, as are the column headings in the subsample.shared file.
Thanks, but is there any way of ‘fixing’ this? (I’ve tried subsampling the list file, but that didn’t add up either.)
Because my samples were used for culture, clonal/Sanger sequencing analysis using 5 different primer pairs, and pyrosequencing, I would like to compare not just OTU measures such as sobs, coverage, chao1, etc., but also the taxonomic assignments at species level. For this I need to know how many of the subsampled sequences are assigned to each taxon.
I guess I don’t see why it should matter. The classification of the OTU shouldn’t change. You could run sub.sample(fasta=, name=, group=, taxonomy=) and then run dist.seqs, cluster, and classify.otu.
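If re-clustering is more than you need, another option might be to tally the counts outside mothur. Since the OTU labels in the cons.taxonomy file line up with the column headings of the subsample.shared file, the subsampled counts can be summed per taxon. Here is a rough Python sketch of that idea, not an official mothur workflow. It assumes both files are tab-delimited, that the shared file has the columns label, Group, numOtus followed by one column per OTU, and that the cons.taxonomy file has the columns OTU, Size, Taxonomy with labels identical to the shared headings (if your version writes them differently you would need to map them first). The function name taxon_counts and the tiny inline tables are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def taxon_counts(shared_text, cons_taxonomy_text):
    """Sum the subsampled read counts per taxon for each sample (group)."""
    # OTU label -> consensus taxonomy string (assumed to be the third column).
    tax_rows = csv.reader(io.StringIO(cons_taxonomy_text), delimiter="\t")
    next(tax_rows)  # skip the header line
    otu_to_taxon = {row[0]: row[2] for row in tax_rows if row}

    shared_rows = csv.reader(io.StringIO(shared_text), delimiter="\t")
    header = next(shared_rows)
    otu_labels = header[3:]  # assumed layout: label, Group, numOtus, then OTUs

    counts = defaultdict(lambda: defaultdict(int))
    for row in shared_rows:
        if not row:
            continue
        group = row[1]
        for otu, n in zip(otu_labels, row[3:]):
            if otu and n:  # ignore empty cells left by trailing tabs
                counts[group][otu_to_taxon[otu]] += int(n)
    return {group: dict(taxa) for group, taxa in counts.items()}


# Invented example data, only to show the expected shape of the two files.
shared_example = (
    "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3\n"
    "1\tA\t3\t5\t3\t2\n"
    "1\tB\t3\t4\t0\t6\n"
)
cons_example = (
    "OTU\tSize\tTaxonomy\n"
    "Otu1\t20\tBacteria(100);Firmicutes(100);\n"
    "Otu2\t9\tBacteria(100);Bacteroidetes(100);\n"
    "Otu3\t11\tBacteria(100);Firmicutes(100);\n"
)

result = taxon_counts(shared_example, cons_example)
for group, taxa in result.items():
    # Each sample's total should equal the subsample size.
    print(group, sum(taxa.values()), taxa)
```

With your real files you would read the text of the subsample.shared and cons.taxonomy files and pass it in. Each sample’s total should then come to 4293, and the Size column of cons.taxonomy (which still reflects the pre-subsampling totals) is simply ignored.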