OTUs & unique.seqs

Hi all,

I am following Schloss’s MiSeq SOP, with some modifications, to process my 16S V6 region Illumina data. However, I would like to understand how unique.seqs works. Since the V6 region generated on the Illumina HiSeq is pretty short (90 bp), the chance of getting identical sequences is very high. If I apply unique.seqs to remove all the “replicate” reads, will this affect the OTU calculation and the generation of rarefaction curve plots?


The unique.seqs command creates a names file that records the duplicates. When you run the cluster command, be sure to include the names file: cluster(column=yourDistanceMatrix, name=yourNameFile). Including the names file ensures that the duplicates are counted in the OTUs and in the rest of the downstream analysis.
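To make the bookkeeping concrete, here is a rough Python sketch of the idea. It is not mothur's code, only an illustration: the read names, sequences, and helper functions (dereplicate, names_file_lines, otu_abundances) are made up for the example. It assumes the usual two-column names file layout, with the representative read name, a tab, then a comma-separated list of every read sharing that sequence.

```python
from collections import defaultdict

def dereplicate(reads):
    """Group read names by identical sequence (hypothetical helper, not mothur's API)."""
    groups = defaultdict(list)
    for name, seq in reads:
        groups[seq].append(name)
    # The first read seen for each sequence acts as the representative.
    return {names[0]: names for names in groups.values()}

def names_file_lines(names_map):
    """Format the mapping as names-file rows: representative, tab, all members."""
    return [f"{rep}\t{','.join(members)}" for rep, members in names_map.items()]

def otu_abundances(otus, names_map):
    """Count total reads per OTU by expanding each representative to its duplicates."""
    return [sum(len(names_map[rep]) for rep in otu) for otu in otus]

# Made-up toy data: five reads, three distinct sequences.
reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "ACGA"),
         ("r4", "ACGT"), ("r5", "TTTT")]
names_map = dereplicate(reads)

# Suppose clustering put the representatives r1 and r3 in one OTU and r5 in another.
otus = [["r1", "r3"], ["r5"]]

print(names_file_lines(names_map))
print(otu_abundances(otus, names_map))  # [4, 1]
```

Only three sequences go through the distance and clustering steps, but the OTU totals still add up to all five reads because the duplicates are looked up through the names mapping. Leave that mapping out and each OTU would only count its representatives.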

Thanks, westcott, for your input. Now I have a better understanding of how the names file works. However, I do have another question regarding the OTU calculation. I subjected the unique sequences to clustering using the pre.cluster and cluster commands.
In my dataset, I am getting around 1,600+ unique sequences out of the original 25,000 sequences after the clustering process, but when I go on to generate the rarefaction curve, it is plotted with more than 9,000 unique sequences/OTUs at the unique cutoff. To my understanding, sequences that can be clustered together should be considered one OTU, so I am wondering why I am still getting a very steep rarefaction curve?