mothur v.1.44.1 running on Windows 10 64-bit
I am getting errors right from the very start of my pipeline. The Illumina files I received from the service provider are the typical run-of-the-mill fasta and qual files, and mapping files.
I open mothur, and run the following commands:
set.current(processors=1, fasta=040920SWnodC_full.fasta, qfile=040920SWnodC_full.qual, oligos=040920SWnodC.oligos)
trim.seqs(fasta=current, qfile=current, oligos=current, qaverage=25, pdiffs=2, bdiffs=2)
And then just errors all the way… specifically stuff like this (many of them):
“[ERROR]: In sequence M02542_32_000000000-J47WM_1_1101_15286_1199’s quality scores, expected a number and got >M02542:32:000000000-J47WM:1:1101:18259:1200, setting score to 0.”
and also (again, many of them)
“sequence name mismatch btwn fasta: M02542_32_000000000-J47WM_1_1101_18259_1200 and qual file: M02542_32_000000000-J47WM_1_1101_17414_1199”
On a previous version, mothur just kept on generating errors until I manually killed it. In the latest version it stops and gives the message:
"[ERROR]: std::bad_allocRAM used: 4.7353Gigabytes . Total Ram: 5.87916Gigabytes.
has occurred in the QualityScores class function trimQScores. This error indicates your computer is running out of memory. This is most commonly caused by trying to process a dataset too large, using multiple processors, or a file format issue. If you are using multiple processors, try running the command with processors=1, the more processors you use the more memory is required. Also, you may be able to reduce the size of your dataset by using the commands outlined in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP. If you are unable to resolve the issue, please contact Pat Schloss at firstname.lastname@example.org, and be sure to include the mothur.logFile with your inquiry."
So: is this an issue with the raw files themselves, or something else? The file sizes are 1.5 GB (fasta) and 4 GB (qual). I am using a very new laptop with 8 GB RAM. I guess it could be a performance issue, but I don’t get it, since I have run similar (and larger) datasets on an 8-year-old laptop with only 4 GB RAM (which eventually crashed last year).
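One check I can run on the raw files themselves, independent of mothur, is whether the sequence names in the fasta and qual files appear in the same order, since the name-mismatch errors suggest they might not. Below is a minimal sketch in Python, assuming both files are plain text with one '>' header line per record. The function names are my own and are not part of mothur. It reads the files line by line, so it should not need much memory even for a 4 GB qual file.

```python
from itertools import zip_longest


def header_names(path):
    """Yield the sequence name from each '>' header line in a FASTA-style file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                # Keep only the name: drop '>' and anything after the first whitespace.
                fields = line[1:].split()
                yield fields[0] if fields else ""


def first_header_mismatch(fasta_path, qual_path):
    """Return (record_number, fasta_name, qual_name) for the first
    disagreement between the two files, or None if every name matches.

    A name of None means that file ran out of records first.
    """
    pairs = zip_longest(header_names(fasta_path), header_names(qual_path))
    for number, (fasta_name, qual_name) in enumerate(pairs, start=1):
        if fasta_name != qual_name:
            return number, fasta_name, qual_name
    return None
```

Calling first_header_mismatch("040920SWnodC_full.fasta", "040920SWnodC_full.qual") would return None if the names agree record for record; otherwise it reports the first record where they diverge, which would point to the files being out of sync rather than to a memory problem. Note that it only detects headers that start a line, so a header stuck onto the end of a score line (which the "expected a number and got >..." error might indicate) would show up here as a missing record in the qual file.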
Thanks in advance.