Hello everyone,
I am following mothur’s MiSeq SOP for 2x250 bp reads (using mothur’s latest version, 1.48), but after running the following command my sequences drop from 4 million to 1 million. I am unsure which maximum length to choose: the amplicon covers the V3-V4 region of the 16S gene and is approximately 461 bp long, so I set maxlength to approximately 480 bp, and that is where I see this very large reduction.
The problem is that you are using a very long region with very little overlap between the paired reads. More than half of your reads have at least one ambiguous base call, which leads to them being removed. You should read back through the Kozich MiSeq paper to see the problem. Regardless, the resulting data will have a high error rate, inflating the number of spurious OTUs. I’d encourage you to check out: