Thanks for your help!
Raw 2x150 bp data from 7 lanes of a GAIIx (~40 GB of *.fastq.gz files) was analysed, targeting the V3 region (341F-518R).
It may be caused by the number of unique sequences. There were still about 4 million unique sequences after remove.lineage finished.
To follow up on Sarah’s comment, I think the problem has everything to do with using GAII and short reads that do not fully overlap. If you look at our Kozich et al. paper in AEM, we show that the sequences have to fully overlap to reduce the error rate. Like she indicated, doing an OTU-based analysis may be computationally out of reach.
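The "fully overlap" point can be made concrete with some simple arithmetic. Bases covered by both reads of a pair can be error-corrected when the pair is merged; bases covered by only one read keep the raw single-read error rate. The sketch below is illustrative only, and the ~180 bp figure for the 341F-518R amplicon is an assumption, not a number from this thread:

```python
def doubly_covered_fraction(read_len: int, amplicon_len: int) -> float:
    """Fraction of amplicon bases covered by BOTH reads of a pair.

    Only bases seen by both reads can be error-corrected when the
    pair is merged, which is why full overlap lowers the error rate.
    """
    overlap = max(0, 2 * read_len - amplicon_len)
    return min(overlap, amplicon_len) / amplicon_len

# Illustrative numbers (assumed, not from the thread): a ~180 bp
# V3 amplicon with 2x150 bp reads leaves the outer bases covered
# by only one read, so the pair does not fully overlap.
print(doubly_covered_fraction(150, 180))  # ~0.67: partial overlap
print(doubly_covered_fraction(250, 180))  # 1.0: each read spans the amplicon
```

With partial overlap the uncorrected flanks accumulate errors, and every error pattern becomes a new "unique" sequence, which is one reason the unique-sequence count stays so high.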
Is there any other way to decrease the number of unique sequences?
make.contigs done, generated 160,000,000 sequences. stability.trim.contigs.fasta
screen.seqs done, generated 130,000,000 sequences. stability.trim.contigs.good.summary
unique.seqs done, generated 17,382,705 sequences. stability.trim.contigs.good.unique.fasta
align.seqs and screen.seqs done, generated 13,945,444 sequences. stability.trim.contigs.good.unique.good.align
pre.cluster (diffs=2) done, generated 7,348,122 sequences. stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta
chimera.uchime, remove.seqs, classify.seqs, and remove.lineage done, generated 3,902,627 sequences. stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta
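For reference, the steps above correspond roughly to a mothur batch like the following. This is only a sketch reconstructed from the output file names (the `.filter.` in them implies a filter.seqs step); the reference files, primers, and screening parameters are placeholders, not the ones actually used:

```
make.contigs(file=stability.files)
screen.seqs(fasta=current, group=current, maxambig=0)
unique.seqs(fasta=current)
align.seqs(fasta=current, reference=silva.v3.fasta)
screen.seqs(fasta=current, count=current, optimize=start-end)
filter.seqs(fasta=current, vertical=T)
unique.seqs(fasta=current, count=current)
pre.cluster(fasta=current, count=current, diffs=2)
chimera.uchime(fasta=current, count=current, dereplicate=t)
remove.seqs(fasta=current, accnos=current)
classify.seqs(fasta=current, count=current, reference=trainset.fasta, taxonomy=trainset.tax)
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota)
```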
That is still far too many!
And when I check stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.summary, I find that a large number of unique sequences have a numSeqs of only 1. Can I delete them?
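One way to set those singletons aside in mothur is split.abund, which partitions the data into rare and abundant sequences at a given cutoff. A minimal sketch, assuming a matching count_table exists alongside the fasta (the file names here mirror the thread but are not confirmed):

```
split.abund(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.count_table, cutoff=1)
```

Keep in mind that singletons include both sequencing errors and genuinely rare organisms, so removing them shrinks the dataset at the cost of discarding some real (rare) diversity.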