MiSeq data has way too many reads


I have some MiSeq data, paired-end, and am following the MiSeq SOP.

I clearly have too many sequences: over 2 million unique sequences after contig joining and basic QC (screen.seqs), and over 600k after chimera removal.

So I have been looking at trim.seqs to try and remove low quality sequences. Which brings me to my question…

Should I run trim.seqs before or after make.contigs? Can I even run it on fastq directly?

What other steps, beyond those in the SOP, should I be taking to reduce the size of the dataset?


16S V4? 2x250 run? You can increase pre.cluster to diffs=3 and set cluster.split to taxlevel=4 or 5.

I haven’t seen anything to suggest that trimming before make.contigs would improve things. I’d check on what happened with the sequencing run (did you sequence a mock community?) and/or use diffs=3 in pre.cluster.
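
For reference, the two suggestions above might look roughly like this in a mothur batch file. This is only a sketch: the file names (mydata.fasta, mydata.count_table, mydata.taxonomy) are placeholders rather than the names your run will have produced, and cutoff=0.03 is assumed here as the usual OTU threshold, so adjust both to match your own pipeline.

```
# Placeholder file names (assumptions); substitute the outputs of your own previous steps.

# Merge sequences that are within 3 differences of a more abundant sequence.
# This runs after alignment and filtering, before chimera removal.
pre.cluster(fasta=mydata.fasta, count=mydata.count_table, diffs=3)

# Later, when clustering: split the dataset by taxonomy before building OTUs.
# A deeper taxlevel (5 instead of 4) gives smaller bins, so each one needs less memory and time.
cluster.split(fasta=mydata.fasta, count=mydata.count_table, taxonomy=mydata.taxonomy, taxlevel=5, cutoff=0.03)
```

Since pre.cluster comes before chimera checking, a larger diffs value should also shrink the set of unique sequences that the chimera step has to process.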