Name mismatch in multi-processor make.contigs

Hi,

I’m running the latest version of mothur and trying to process some simulated FASTQ data with sequentially-named reads (header lines @A1, @A2, @A3, etc.). It appears make.contigs does not work correctly on certain FASTQ files with processors=2. This seems to happen when the sequence names differ by only 1 character.

The following pair of files gives an error:
not_working_R1.fastq

@A1
CCAGC
+
ABBCB
@A2
CCAGC
+
CCCCC
@A3
CCAGC
+
CCCCB
@A4
CCAGC
+
ABBBC
@A5
CCAGC
+
CCCCC

not_working_R2.fastq

@A1
ACTTT
+
>1>11
@A2
ACTTT
+
AABBB
@A3
ACTTT
+
AA1A>
@A4
ACTTT
+
BBBBB
@A5
ACTTT
+
BBBBA
$ mothur '#make.contigs(ffastq=not_working_R1.fastq, rfastq=not_working_R2.fastq, processors=2)'
Using 2 processors.
Making contigs...
3
[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A4.
1
Done.
It took 0 secs to process 4 sequences.

However, changing the sequence names in the two files to “A1”, “B2”, “C3”, “D4”, “E5” allows the file to be properly processed. I can provide working and non-working example files if needed.

Thanks for the help!

Thank you for reporting this bug. We added a feature to the make.contigs command in the last release to help skip missing reads in files to avoid name mismatches. Part of the name matching checks for “off by one character” for reads like: @M00178:4:000000000-A1AE6:1:1101:16364:1386 1:N:0:0 and @M00178:4:000000000-A1AE6:1:1101:16364:1386 2:N:0:0. This change is causing name mismatches with sequence names such as yours. We will correct this in our next release.
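To illustrate why sequentially named reads can trip a check like this, here is a minimal Python sketch. It is not mothur's actual code; `differs_by_one_char` is a hypothetical helper that only mimics the "off by one character" idea described above.

```python
def differs_by_one_char(a: str, b: str) -> bool:
    """Return True when a and b have equal length and differ at exactly one position.

    Hypothetical stand-in for an "off by one character" name check;
    this is an illustration, not mothur's implementation.
    """
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


# Illumina-style mates: only the read number (1 vs 2) differs.
fwd = "M00178:4:000000000-A1AE6:1:1101:16364:1386 1:N:0:0"
rev = "M00178:4:000000000-A1AE6:1:1101:16364:1386 2:N:0:0"
print(differs_by_one_char(fwd, rev))    # True: the intended match

# Sequentially named reads: different reads, yet also one character apart.
print(differs_by_one_char("A1", "A2"))  # True: an unintended match

# Names that differ in two positions are not confused.
print(differs_by_one_char("A1", "B2"))  # False
```

Under this assumption, names such as A1 and A2 look like mates of the same read, while A1 and B2 do not, which is consistent with the renaming workaround reported in the original post.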

I had exactly the same problem with version 1.44.3.

I checked the read files, and the “1:N:0:0” “2:N:0:0” seem to be the problem, as you suggested.

Could you please look into this?

Thanks!

Could you send your fastq files to mothur.bugs@gmail.com so I can track down the issue for you?

Similar issue. I get the message:
[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, M07073_33_000000000-JKRRG_1_2106_17065_10533__lepidium__RUA-B-04_116__ITS3_KYO2

But the entry in the forward file is …
@M07073:33:000000000-JKRRG:1:2106:17065:10533__lepidium__RUA-B-04_116__ITS3_KYO2

and in the reverse file is …
@M07073:33:000000000-JKRRG:1:2106:17065:10533__lepidium__RUA-B-04_116__ITS3_KYO2

They are identical. So why the warning, and lost match?


Hi Jerry - thanks for the post. Could you create a new thread and let us know things like what version of mothur you’re running, where you got the data, etc.?

Pat

I am also facing this issue with mothur v.1.46.1, running make.contigs on fq.gz files.

Hi Nadine,

Can you post an example of the error message?

Pat

Hi Pat, wow - thanks for your fast answer.

Sure, below there is the command and the error message.

Windows version

Using Boost

mothur v.1.46.1

Last updated: 9/1/21

By Patrick D. Schloss

mothur >

make.contigs(file=stability.files)

Using 8 processors.

Processing file pair Forest1_1.fq.gz - Forest1_2.fq.gz (files 1 of 1) <<<<<

Making contigs…

[WARNING]: reading @A00808:703:H77MMDRXY:1:2109:1407:14418 expected a name with + as a leading character, ignoring.

[WARNING]: names do not match. read A00808:703:H77MMDRXY:1:2109:20238:14403 for fasta and @A00808:703:H77MMDRXY:1:2109:1407:14418 for quality, ignoring.

[WARNING]: Lengths do not match for sequence A00808:703:H77MMDRXY:1:2109:20238:14403. Read 1 characters for fasta and 224 characters for quality scores, ignoring read.

[WARNING]: reading + expected a name with @ as a leading character, ignoring read.

[WARNING]: reading @A00808:703:H77MMDRXY:1:2109:7934:14418 expected a name with + as a leading character, ignoring.

[WARNING]: names do not match. read + for fasta and @A00808:703:H77MMDRXY:1:2109:7934:14418 for quality, ignoring.

[WARNING]: reading + expected a name with @ as a leading character, ignoring read.

[WARNING]: reading @A00808:703:H77MMDRXY:1:2109:17463:14450 expected a name with + as a leading character, ignoring.

[WARNING]: names do not match. read + for fasta and @A00808:703:H77MMDRXY:1:2109:17463:14450 for quality, ignoring.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, +.

**** Exceeded maximum allowed command warnings, silencing warnings ****

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2251_8250_32158.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2251_26313_32189.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2230_24777_24612.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2230_18982_24627.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2230_25852_24784.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2230_5990_24987.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2231_22932_6731.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2231_4173_6778.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2231_3233_6872.

[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, A00808_703_H77MMDRXY_1_2231_23701_7153.

The name mismatch warnings go on endlessly.

I tried unzipping and re-zipping the files to check whether compression was the issue (it was the problem with a previous data set, which worked after I gzipped the files again).

In this case, I suspect the problem is that the forward and reverse files contain unequal numbers of reads, so I am trying to fix it with the list.seqs and get.seqs commands:

list.seqs(fastq=myReverseFastqFile) - because it has fewer reads than the forward file

get.seqs(fastq=myForwardFastqFile, accnos=current)

list.seqs(fastq=myForwardFastq.pick.fastq)

get.seqs(fastq=myReverseFastqFile, accnos=current).

and then make.contigs with *pick.fastq files
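Before filtering, it can help to confirm whether the two files really differ in read count and whether any records are malformed, since the "expected a name with + as a leading character" warnings above point to broken records rather than only missing mates. The following is a rough Python sketch, independent of mothur. It assumes strict 4-line FASTQ records (no wrapped sequences), and `open_text` and `check_fastq` are hypothetical helper names.

```python
import gzip
from typing import List, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_fastq(path: str) -> Tuple[int, List[int]]:
    """Count well-formed records and collect start lines of malformed ones.

    Assumes every record occupies exactly 4 lines: @name, sequence, +, quality.
    Returns (number of good records, 1-based line numbers where bad records start).
    """
    good = 0
    bad: List[int] = []
    line_no = 0
    with open_text(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            start = line_no + 1
            line_no += 4
            header, seq, plus, qual = (s.rstrip("\r\n") for s in record)
            if not (header or seq or plus or qual):
                continue  # skip groups of blank lines, e.g. at the end of the file
            ok = (
                header.startswith("@")
                and plus.startswith("+")
                and seq != ""
                and len(seq) == len(qual)
            )
            if ok:
                good += 1
            else:
                bad.append(start)
    return good, bad


# Example usage (file names taken from the log above):
# print(check_fastq("Forest1_1.fq.gz"))
# print(check_fastq("Forest1_2.fq.gz"))
```

If either file reports malformed records, the file itself is probably truncated or corrupted, and selecting shared names alone may not be enough. Note that once one record is out of frame, every later record in that file will also be reported as bad, so the first reported line number is the one to inspect.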

The Windows version of mothur does include the Boost libraries, so you should be able to use *.gz files with the make.contigs command. Also, we modified the list.seqs command to help resolve file mismatch issues like this. You can now provide multiple files of the same type and mothur will output a list of sequences present in both files.

mothur > list.seqs(fastq=myForwardFastqFile-myReverseFastqFile) - list sequences present in both fastq files. Note the list.seqs command does not read *.gz files, so you will have to decompress the fastq files first.

mothur > get.seqs(fastq=myForwardFastqFile, accnos=current) - select reads from the forward file

mothur > get.seqs(fastq=myReverseFastqFile, accnos=current) - select reads from the reverse file
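For anyone who wants to see what the three commands above amount to, here is a small Python sketch of the same idea: find the read names present in both files, then keep only those reads in each file. It is not how mothur implements list.seqs or get.seqs. It assumes uncompressed, strict 4-line FASTQ records and treats the read name as the header text up to the first whitespace; `read_fastq`, `read_name`, and `keep_shared` are hypothetical names.

```python
from typing import Iterator, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(path: str) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ file.

    Stops at end of file or at the first empty header line.
    """
    with open(path, "rt") as handle:
        while True:
            lines = [handle.readline().rstrip("\r\n") for _ in range(4)]
            if not lines[0]:
                return
            yield (lines[0], lines[1], lines[2], lines[3])


def read_name(header: str) -> str:
    """Read name: header without the leading '@', up to the first whitespace.

    This naming rule is an assumption made for the sketch.
    """
    parts = header[1:].split()
    return parts[0] if parts else ""


def keep_shared(fwd_path: str, rev_path: str, fwd_out: str, rev_out: str) -> int:
    """Write only the reads whose names occur in both files; return how many are shared."""
    fwd_names = {read_name(rec[0]) for rec in read_fastq(fwd_path)}
    rev_names = {read_name(rec[0]) for rec in read_fastq(rev_path)}
    shared = fwd_names & rev_names
    for src, dst in ((fwd_path, fwd_out), (rev_path, rev_out)):
        with open(dst, "w") as out:
            for rec in read_fastq(src):
                if read_name(rec[0]) in shared:
                    out.write("\n".join(rec) + "\n")
    return len(shared)


# Example usage (hypothetical file names):
# keep_shared("fwd.fastq", "rev.fastq", "fwd.pick.fastq", "rev.pick.fastq")
```

Each output keeps the order of its own input file, so the sketch relies on the shared reads appearing in the same relative order in both files, which is typically the case for files straight off the sequencer.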