make.contigs with some gzipped fastq spams log file

Hi,

I’m trying to use the make.contigs command together with gzipped fastq files to save on disk space.
For some pairs of fastq.gz files, which come straight from a MiSeq, I’m getting millions of warnings like

[WARNING]: missing sequence for , ignoring.[WARNING]: expected a name with + as a leading character, ignoring.[WARNING]: missing quality for , ignoring.[WARNING]: Blank fasta name, ignoring read.

from mothur, which then slowly tries to use up all available memory. Other gzipped files from the same MiSeq run process normally, without warnings. When I first gunzip the problematic files, all is fine, and when I recompress them with gzip on Linux, all goes well too. I used mothur version 1.37.4 on Linux for this.

Let me know if you need more information.
–Robert

One more hint:

The badly behaving fastq.gz files are all very small, with 61 or fewer reads per sample, but all files (which I’ve tried, from the same MiSeq run) with 202 or more reads work fine.
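
To see which files fall on either side of that size threshold, the reads in each fastq.gz file can be counted without unpacking it to disk. The following is a minimal sketch, not part of mothur: the function name `count_fastq_reads` is made up for illustration, and it assumes plain four-line FASTQ records with no wrapped sequence lines.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file.

    Assumption: every record is exactly four lines
    (name, sequence, '+' line, quality).
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Calling `count_fastq_reads` on each R1 file of the run gives the per-sample read counts to compare against the files that trigger the warnings.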

–Robert

This seems to be related to this bug: “make.contigs error with .gz file”. It will be fixed in our next release.

Can you try this version, https://github.com/mothur/mothur/releases/tag/v1.37.6?

Hi again,

https://github.com/mothur/mothur/releases/tag/v1.38.1.1 still has the bug.
If you like I could send you a pair of fastq.gz files that trigger the bug.

I’ve copied the relevant part of the log file below. The second warning line, which contains four [WARNING] pieces, then gets repeated perpetually.

mothur > make.contigs(file=fileList.paired.file)

Using 1 processors.

Processing file pair /tmp/mothur_test/with_file_original/31M11_S5_L001_R1_001.fastq.gz - /tmp/mothur_test/with_file_original/31M11_S5_L001_R2_001.fastq.gz (files 1 of 1) <<<<<
Making contigs…
[WARNING]: Blank fasta name, ignoring read.
[WARNING]: missing sequence for , ignoring.[WARNING]: expected a name with + as a leading character, ignoring.[WARNING]: missing quality for , ignoring.[WARNING]: Blank fasta name, ignoring read.

Could you send your files to mothur.bugs@gmail.com so I can troubleshoot the issue for you?

Thanks for sending your files. I can reproduce the error, but when I decompress and recompress the files, the error is gone. Can you see if this works on your end?

mothur > make.contigs(inputdir=../../make.contigs, file=file.txt)
Setting input directory to: /Users/sarahwestcott/Desktop/make.contigs/

Using 1 processors.

Processing file pair /Users/sarahwestcott/Desktop/make.contigs/31M11_S5_L001_R1_001.fastq.gz - /Users/sarahwestcott/Desktop/make.contigs/31M11_S5_L001_R2_001.fastq.gz (files 1 of 1) <<<<<
Making contigs…
[WARNING]: Blank fasta name, ignoring read.
[WARNING]: missing sequence for , ignoring.[WARNING]: expected a name with + as a leading character, ignoring.[WARNING]: missing quality for , ignoring.[WARNING]: Blank fasta name, ignoring read.
[WARNING]: missing sequence for , ignoring.[WARNING]: expected a name with + as a leading character, ignoring.[WARNING]: missing quality for , ignoring.[WARNING]: Blank fasta name, ignoring read.

sarahwestcott$ gunzip 31M11_S5_L001_R1_001.fastq.gz
sarahwestcott$ gunzip 31M11_S5_L001_R2_001.fastq.gz
sarahwestcott$ gzip 31M11_S5_L001_R1_001.fastq
sarahwestcott$ gzip 31M11_S5_L001_R2_001.fastq


mothur > make.contigs(inputdir=../../make.contigs, file=file.txt)
Setting input directory to: /Users/sarahwestcott/Desktop/make.contigs/

Using 1 processors.

Processing file pair /Users/sarahwestcott/Desktop/make.contigs/31M11_S5_L001_R1_001.fastq.gz - /Users/sarahwestcott/Desktop/make.contigs/31M11_S5_L001_R2_001.fastq.gz (files 1 of 1) <<<<<
Making contigs…
100
Done.

It took 0 secs to assemble 100 reads.

It took 0 secs to process 100 sequences.


Output File Names:
/Users/sarahwestcott/Desktop/make.contigs/file.trim.contigs.fasta
/Users/sarahwestcott/Desktop/make.contigs/file.trim.contigs.qual
/Users/sarahwestcott/Desktop/make.contigs/file.contigs.report
/Users/sarahwestcott/Desktop/make.contigs/file.scrap.contigs.fasta
/Users/sarahwestcott/Desktop/make.contigs/file.scrap.contigs.qual

[WARNING]: your sequence names contained ‘:’. I changed them to ‘_’ to avoid problems in your downstream analysis.

mothur > quit()

Yes, re-zipping the files with gzip makes it work on my end as well.
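
For a run with many affected files, the decompress-and-recompress workaround can be scripted. The following is only a sketch: the helper name `recompress_gzip` is hypothetical, and it uses Python's `gzip` module rather than the `gunzip`/`gzip` command-line tools used above, so it is an assumption (not tested in this thread) that mothur treats the resulting files the same way.

```python
import gzip
import os
import shutil


def recompress_gzip(path):
    """Rewrite a .gz file in place by decompressing and recompressing it.

    The data are streamed into a temporary file next to the original,
    which then replaces the original once the copy has finished.
    """
    tmp_path = path + ".tmp"
    with gzip.open(path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp_path, path)
```

Running `recompress_gzip` over every file matching `*.fastq.gz` (for example with `glob.glob`) before calling make.contigs would apply the same fix to a whole directory. Keeping a backup of the original files is advisable, since the rewrite happens in place.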