I have a question about the high number of observed OTUs I get when processing my sequences.
I followed the SOP for the V4 region of the 16S rRNA gene (2x300 bp) with 12 samples. However, I get between 2000 and 3000 observed OTUs, which makes my alpha diversity estimates high as well (Chao = 7000-8000, Shannon = 4.5-5.5).
Is it possible that this is normal?
P.S. I have also noticed that in my taxonomy file several OTUs are assigned to the same genus.
Any help or comment in this regard would be highly appreciated.
What kind of samples did you sequence? How many reads per sample?
I strongly recommend not using the 2x300 chemistry because the quality crashes out after 500 total nucleotides (see "Why do I have such a large distance matrix"). Furthermore, since the V4 region is shorter than 300 nt, the sequencing error is exacerbated when one sequences beyond the length of the region being sequenced. High sequencing error rates will artificially increase the number of OTUs and the diversity of your samples. As that blog post points out, my recommendation is to resequence the samples with 2x250 reads. If that’s not possible, then make sure that you have trimmed the barcodes and primers from both ends, and then use the phylotype approach to characterize the community at the genus level or another taxonomic level.
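To illustrate what binning at the genus level does to the counts, here is a rough Python sketch. It is not mothur's own implementation (the `phylotype` command is the real route); it only shows how several OTUs with the same genus label collapse into one bin. The rows are made-up example data, and the taxonomy strings assume the `Rank(bootstrap);` layout that mothur writes in its consensus taxonomy files.

```python
import re
from collections import defaultdict


def deepest_rank(taxonomy):
    """Return the last rank of a taxonomy string, without its bootstrap value."""
    # Split on ';', skip the empty piece after the trailing ';',
    # and strip confidence values such as "(98)".
    ranks = [re.sub(r"\(\d+\)$", "", r) for r in taxonomy.strip().split(";") if r]
    return ranks[-1] if ranks else "unknown"


def collapse_by_genus(rows):
    """Sum OTU sizes per deepest taxonomic label (genus, if classified that far)."""
    totals = defaultdict(int)
    for _otu, size, taxonomy in rows:
        totals[deepest_rank(taxonomy)] += int(size)
    return dict(totals)


# Hypothetical rows (OTU label, size, taxonomy); not taken from any real run.
rows = [
    ("Otu001", 500, "Bacteria(100);Firmicutes(100);Clostridia(100);"
                    "Clostridiales(100);Lachnospiraceae(100);Blautia(98);"),
    ("Otu002", 40, "Bacteria(100);Firmicutes(100);Clostridia(100);"
                   "Clostridiales(100);Lachnospiraceae(100);Blautia(95);"),
    ("Otu003", 120, "Bacteria(100);Bacteroidetes(100);Bacteroidia(100);"
                    "Bacteroidales(100);Bacteroidaceae(100);Bacteroides(100);"),
]

genus_counts = collapse_by_genus(rows)
print(genus_counts)
```

Here three OTUs end up in two genus bins, so any extra OTUs created by sequencing error inside a genus no longer inflate the count.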
Hi. Oh gosh, I wish I had seen this before I sent my samples for sequencing. I also used the 2x300 chemistry, and now I see that I have run into the same problem.
I used pcr.seqs to remove the primers. Removing the reverse primer was no problem (80% of the sequences remain), but the real problem is the forward primer (only 10% of the sequences remain). So I am stuck here. Do you have any advice for me?
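One quick way to look into a drop like this is to check where, or whether, the forward primer actually occurs in the raw reads. The Python sketch below is only a diagnostic idea, not part of mothur. The primer shown is the commonly used 515F V4 primer, which is an assumption because the thread does not say which primers were used, and the reads are invented for the example. It only finds exact matches after expanding the degenerate bases, so it does not tolerate mismatches.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_positions(reads, primer):
    """Count reads by the start position of the first primer match.

    Reads with no match are counted under the key None.
    """
    pattern = primer_regex(primer)
    positions = {}
    for read in reads:
        match = pattern.search(read.upper())
        key = match.start() if match else None
        positions[key] = positions.get(key, 0) + 1
    return positions


forward = "GTGCCAGCMGCCGCGGTAA"  # assumed 515F primer; replace with your own

# Invented reads for illustration only.
reads = [
    "GTGCCAGCAGCCGCGGTAATACGTAGG",  # primer at the very start
    "GTGCCAGCCGCCGCGGTAATACGGAGG",  # primer at the start, other M base
    "ACGTGTGCCAGCAGCCGCGGTAATACG",  # primer shifted by 4 nt
    "TACGTAGGGTGCGAGCGTTAATCGG",    # no primer present
]

positions = primer_positions(reads, forward)
print(positions)
```

If most reads land under None, the primer may already have been removed or the reads may be in a different orientation than expected; if they land at varying offsets, there may be extra bases in front of the primer. Either outcome is only a hint about where to look next.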
You could create a custom reference alignment that does not include the primer sequences and then align your sequences to that reference. The output alignment then won’t contain the primers, because they get trimmed off when they have nothing to align to. Here’s how to create your own reference alignment…