Hi there
I am a newbie, so I apologize in advance for any stupid questions I may ask…
So I am working with MiSeq 16S V4 region sequences from the guts of fungus-growing ants. I know that in these ant guts there are 3-5 abundant OTUs (which make up 95-98% of the whole community) and perhaps traces of a few others.
To process my sequences I followed the MiSeq SOP tutorial and ran the following commands:
make.contigs(file=Untitled2, processors=8)
summary.seqs(fasta=Actinotrim.contigs.fasta)
screen.seqs(fasta=Actinotrim.contigs.fasta, group=Actinocontigs.groups, maxambig=0, maxlength=330)
unique.seqs(fasta=Actinotrim.contigs.good.fasta)
count.seqs(name=Actinotrim.contigs.good.names, group=Actinocontigs.good.groups)
summary.seqs(count=Actinotrim.contigs.good.count_table, fasta=Actinotrim.contigs.good.unique.fasta)
align.seqs(fasta=Actinotrim.contigs.good.unique.fasta, reference=silva.bacteria.v4.fasta, processors=8)
summary.seqs(fasta=Actinotrim.contigs.good.unique.align, count=Actinotrim.contigs.good.count_table)
"Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 11895 25318 270 0 3 1
2.5%-tile: 11895 25319 292 0 4 374
25%-tile: 11895 25319 293 0 4 3740
Median: 11895 25319 293 0 4 7479
75%-tile: 11895 25319 293 0 5 11218
97.5%-tile: 11895 25319 294 1 6 14583
Maximum: 11897 25319 351 5 9 14956
Mean: 11895 25319 292.977 0.0573014 4.57014
# of Seqs: 14956"
screen.seqs(fasta=Actinotrim.contigs.good.unique.align, count=Actinotrim.contigs.good.count_table, summary=Actinotrim.contigs.good.unique.summary, start=11895, end=25319, maxhomop=8, processors=8)
summary.seqs(fasta=current, count=current)
"Using 8 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 11895 25319 268 0 3 1
2.5%-tile: 11895 25319 292 0 4 125220
25%-tile: 11895 25319 292 0 4 1252194
Median: 11895 25319 293 0 4 2504388
75%-tile: 11895 25319 293 0 5 3756581
97.5%-tile: 11895 25319 293 0 5 4883555
Maximum: 11895 25319 305 0 8 5008774
Mean: 11895 25319 292.644 0 4.43981
# of unique seqs: 703114
total # of seqs: 5008774
Output File Names:
shit/Actinotrim.contigs.good.unique.good.summary"
filter.seqs(fasta=Actinotrim.contigs.good.unique.good.align, vertical=T, trump=.)
unique.seqs(fasta=Actinotrim.contigs.good.unique.good.filter.fasta, count=Actinotrim.contigs.good.good.count_table)
pre.cluster(fasta=Actinotrim.contigs.good.unique.good.filter.unique.fasta, count=Actinotrim.contigs.good.unique.good.filter.count_table, diffs=2)
summary.seqs(fasta=current, count=current)
chimera.uchime(fasta=Actinotrim.contigs.good.unique.good.filter.unique.precluster.fasta, count=Actinotrim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
remove.seqs(fasta=Actinotrim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=Actinotrim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos)
classify.seqs(fasta=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=Actinotrim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=silva.bacteria.ng.fasta, taxonomy=silva.bacteria.silva.tax, cutoff=80, processors=8)
remove.lineage(fasta=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=Actinotrim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.taxonomy, taxon=Chloroplast-Mitochondria-Archaea-Eukaryota)
After all of the above steps I end up with 131,000 sequences, and my objective is to do an OTU-based analysis.
However, even though the dist.seqs command runs normally (it takes about 4 hr to make an 80 GB dist file), when I run the cluster command:
cluster(column=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.pick.ed.dist, count=Actinotrim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.ed.count_table)
the computer runs out of memory…
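For reference, the dist.seqs call that produced that file was essentially the following (I am reconstructing it from memory, so the cutoff value in particular may not be exactly what I used):
dist.seqs(fasta=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.pick.ed.fasta, cutoff=0.05, processors=8)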
I then tried to run cluster.split, first using the following command:
cluster.split(column=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.pick.ed.dist, count=Actinotrim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.ed.count_table, taxonomy=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.ed.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.05)
Again, it runs out of memory.
Then I tried to run it the way it is presented in the MiSeq SOP:
cluster.split(fasta=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.pick.ed.fasta, count=Actinotrim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.ed.count_table, taxonomy=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.ed.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.05)
Again it starts normally, but the terminal reports that the split method is NOT “classify” but “fasta”.
After a couple of hours, while it is clustering the dist files, I get a memory error again:
"Clustering shit/Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.pick.filter_ed.fasta.2.dist
|
Clustering shit/Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.pick.filter_ed.fasta.1.dist
Clustering shit/Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.pick.filter_ed.fasta.0.dist
Cutoff was 0.055 changed cutoff to 0.02
[ERROR]: Could not open 31150.temp
[ERROR]: std::bad_alloc has occurred in the ClusterSplitCommand class function createProcesses. This error indicates your computer is running out of memory. This is most commonly caused by trying to process a dataset too large, using multiple processors, or a file format issue. If you are running our 32bit version, your memory usage is limited to 4G. If you have more than 4G of RAM and are running a 64bit OS, using our 64bit version may resolve your issue. If you are using multiple processors, try running the command with processors=1, the more processors you use the more memory is required. Also, you may be able to reduce the size of your dataset by using the commands outlined in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP. If you are uable to resolve the issue, please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry."
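The error text suggests trying processors=1, so unless I am missing something my next attempt will probably be the same cluster.split call run with a single processor (I have not actually run this yet, so I don't know whether it will fit in memory):
cluster.split(fasta=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.pick.ed.fasta, count=Actinotrim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.ed.count_table, taxonomy=Actinotrim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.pick.ed.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.05, processors=1)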
Is 131,000 sequences too many? Is this normal? Am I doing something wrong?
Any help would be very much appreciated.
Cheers
Panos
PS: I edited the headers of the fasta, taxonomy, and count_table files (I shortened the sequence names), because if I hadn't done that the files produced by dist.seqs or cluster.split would have been more than 200-300 GB and my PC was running out of disk space…