Clustering on tagmented fragments

I’m trying to re-analyse some open access data. The authors used V1-V4 primers (I know, this isn’t the best approach at all!), and the DNA was tagmented before sequencing. As a result, the sequence lengths and the start and end positions are highly variable after alignment. I wanted to run cluster with the vsearch algorithm on the fasta and count files, but was wondering whether that might artificially inflate the number of OTUs on tagmented DNA?

If so, are there any good OTU-based alternative approaches to this analysis I could use, or will only something phylotype-based be suitable?


That sounds like a mess. The methods in mothur really work best when the reads start and end at the same coordinates. Perhaps they were trying to do something like EMIRGE (“EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data”, Genome Biology)? I’d check out the tools that have been built around that approach and see if they help.
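
If it helps to see how far the reads are from sharing coordinates, here is a rough Python sketch that tabulates the start and end alignment columns of each read (mothur's summary.seqs reports start and end for aligned sequences too). The read names and sequences are made up for illustration, and treating "-" and "." as gap characters is an assumption about how the alignment is written.

```python
from collections import Counter

# Assumption: gaps in the aligned sequences are written as "-" or ".".
GAP_CHARS = {"-", "."}

def aligned_span(aligned_seq):
    """Return the 1-based (start, end) alignment columns of the first and
    last non-gap characters, or None if the sequence is all gaps."""
    positions = [i for i, c in enumerate(aligned_seq, start=1) if c not in GAP_CHARS]
    if not positions:
        return None
    return positions[0], positions[-1]

def span_counts(aligned_seqs):
    """Count how many sequences share each (start, end) pair."""
    spans = (aligned_span(seq) for seq in aligned_seqs)
    return Counter(span for span in spans if span is not None)

# Hypothetical toy alignment: two amplicon-like reads covering the same
# columns and two tagmented-like fragments covering different regions.
reads = {
    "amplicon_1": "..ACGTACGT..",
    "amplicon_2": "..ACGTTCGT..",
    "fragment_1": "ACGTAC......",
    "fragment_2": "......CGTACG",
}

counts = span_counts(reads.values())
for (start, end), n in counts.most_common():
    print(f"start={start} end={end} reads={n}")
```

With real amplicon data most reads would fall into one or two (start, end) pairs. If the tagmented reads spread over many different pairs, that is the situation where distance-based clustering is likely to struggle.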