Thanks for your help!
Raw 2x150 bp data from 7 lanes of a GAIIx (~40 GB of *.fastq.gz files) was analysed, targeting the V3 region (341F-518R).
It may be caused by the number of unique sequences. There were still about 4 million unique sequences after remove.lineage finished.
To follow up on Sarah’s comment, I think the problem has everything to do with using GAII and short reads that do not fully overlap. If you look at our Kozich et al. paper in AEM, we show that the sequences have to fully overlap to reduce the error rate. Like she indicated, doing an OTU-based analysis may be computationally out of reach.
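The "fully overlap" point can be made concrete with some simple arithmetic. Bases covered by both reads of a pair can be error-corrected when the pair is merged; bases covered by only one read keep the raw single-read error rate. The sketch below is illustrative only, and the ~180 bp figure for the 341F-518R amplicon is an assumption, not a number from this thread:

```python
def doubly_covered_fraction(read_len: int, amplicon_len: int) -> float:
    """Fraction of amplicon bases covered by BOTH reads of a pair.

    Only bases seen by both reads can be error-corrected when the
    pair is merged, which is why full overlap lowers the error rate.
    """
    overlap = max(0, 2 * read_len - amplicon_len)
    return min(overlap, amplicon_len) / amplicon_len

# Illustrative numbers (assumed, not from the thread): a ~180 bp
# V3 amplicon with 2x150 bp reads leaves the outer bases covered
# by only one read, so the pair does not fully overlap.
print(doubly_covered_fraction(150, 180))  # ~0.67: partial overlap
print(doubly_covered_fraction(250, 180))  # 1.0: each read spans the amplicon
```

With partial overlap the uncorrected flanks accumulate errors, and every error pattern becomes a new "unique" sequence, which is one reason the unique-sequence count stays so high.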
Is there any other way to decrease the number of unique sequences?
make.contigs done, generated 160,000,000 sequences. stability.trim.contigs.fasta
screen.seqs done, generated 130,000,000 sequences. stability.trim.contigs.good.summary
unique.seqs done, generated 17,382,705 sequences. stability.trim.contigs.good.unique.fasta
align.seqs and screen.seqs done, generated 13,945,444 sequences. stability.trim.contigs.good.unique.good.align
pre.cluster (diffs=2) done, generated 7,348,122 sequences. stability.trim.contigs.good.unique.good.filter.unique.precluster.fasta
chimera.uchime, remove.seqs, classify.seqs, and remove.lineage done, generated 3,902,627 sequences. stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta
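For reference, the steps above correspond roughly to a mothur batch like the following. This is only a sketch reconstructed from the output file names (the `.filter.` in them implies a filter.seqs step); the reference files, primers, and screening parameters are placeholders, not the ones actually used:

```
make.contigs(file=stability.files)
screen.seqs(fasta=current, group=current, maxambig=0)
unique.seqs(fasta=current)
align.seqs(fasta=current, reference=silva.v3.fasta)
screen.seqs(fasta=current, count=current, optimize=start-end)
filter.seqs(fasta=current, vertical=T)
unique.seqs(fasta=current, count=current)
pre.cluster(fasta=current, count=current, diffs=2)
chimera.uchime(fasta=current, count=current, dereplicate=t)
remove.seqs(fasta=current, accnos=current)
classify.seqs(fasta=current, count=current, reference=trainset.fasta, taxonomy=trainset.tax)
remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota)
```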
That is still far too many!
And when I check stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.summary, I find that a large number of unique sequences have a numSeqs of only 1. Can I delete them?
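One way to set those singletons aside in mothur is split.abund, which partitions the data into rare and abundant sequences at a given cutoff. A minimal sketch, assuming a matching count_table exists alongside the fasta (the file names here mirror the thread but are not confirmed):

```
split.abund(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.count_table, cutoff=1)
```

Keep in mind that singletons include both sequencing errors and genuinely rare organisms, so removing them shrinks the dataset at the cost of discarding some real (rare) diversity.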