Make Contigs - name mismatch problem

Newbie here …

Mothur version 1.45.3
Windows 11, i9 + 64GB RAM
MiSeq Illumina fungal ITS data
No pre-processing of the supplied gz files.
Forward/reverse files contain equal numbers of entries.

Commands used …
make.file(inputdir=., type=gz, prefix=stability)
make.contigs(file=stability.files, processors=15)

I get lots of warnings and lose about half the reads.
e.g.
>>>>> Processing file pair D:\Docs\MothurData\eDNA\testmake\lepidium__SV_DOC_-LK-04_6__ITS3_KYO2_R1.fastq.gz - D:\Docs\MothurData\eDNA\testmake\lepidium__SV_DOC_-LK-04_6__ITS3_KYO2_R2.fastq.gz (files 1 of 1) <<<<<

gives …
[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, M07073_33_000000000-JKRRG_1_1107_9781_22902__lepidium__SV_DOC_-LK-04_6__ITS3_KYO2.

But here is the entry in the R1 file …
@M07073:33:000000000_JKRRG:1:1107:9781:22902__lepidium__SV_DOC__LK_04_6__ITS3_KYO2
ATGCGATACTTGGTGTGAATTGCAGAATCCCGTGAACCATCGAGTCTTTGAACGCAAGTTGCGCCCCAAGCCTTCTGGCCGAGGGCACGTCTGCCTGGGCGTCACAAATCGTCGTCCCACTCACGAAATTTTGCGAGTGCGGGACGGAAGCTGGTCTCCCGTGTGTTACCGCACGCGGTTGGCCAAAATCTGAGCTGAGGATGCTGGGAGCGTCCCGACATGCGGTGGTGATCTAAAAGCCTCTTCATATTGCCGGTCGCTCCTGTCCGTAAGCTCTCG
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCEGGGGGGGGFGGGGFGGGGGFGFFGFFF;>?FFFFF?F?FF:6?AFFF(

and here is the entry in the R2 file …

@M07073:33:000000000_JKRRG:1:1107:9781:22902__lepidium__SV_DOC__LK_04_6__ITS3_KYO2
TTAAACTCAGCGGGTGATCCCGCCTGACCTGGGGTCGCTTTGAGGACATTGGGTCAACGAGAGCTTACGGACAGGAGCGACCGGCAATATGAAGAGGCTTTTAGATCACCACCGCATGTCGGGACGCTCCCAGCATCCTCAGCTCAGATTTTGGCCAACCGCGTGCGGTAACACACGGGAGACCAGCTTCCGTCCCGCACTCGCAAAATTTCGTGAGTGGGACGACGATTTGTGACGCCCAGGCAGACGTGCCCTCGGCCAGAAGGCTTGGTGCGC
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGFGFGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGB<FGFFFFFFE?99;>F??>DDGGFFFFGGGF8FGGF?F>9>E3CFFAB75>>BF9<B:B<:610441<?B<(4:<3627446?:68.491,

The names are the same.

I tried processing the uncompressed files, running on 1 processor, changing all dashes in filenames/sequence labels to underscores. I get the same warnings and dropped contigs.

Could you send the D:\Docs\MothurData\eDNA\testmake\lepidium__SV_DOC_-LK-04_6__ITS3_KYO2_R1.fastq.gz and D:\Docs\MothurData\eDNA\testmake\lepidium__SV_DOC_-LK-04_6__ITS3_KYO2_R2.fastq.gz files to mothur.bugs@gmail.com so I can troubleshoot the issue for you?

Thanks for sending your files. The make.contigs command expects the forward reads to be in the same order as the reverse reads in the fastq files. The “missing” reads are present in both files, but in a few places the order is swapped. For example:

R1 file order
@M07073:33:000000000-JKRRG:1:2106:27076:10535__lepidium__SV_DOC_-LK-04_6__ITS3_KYO2
TTAAACTCAGCGGGTGATCCCGCCTGACCTGGGGTCGCTTTGAGGACAT…
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG…
@M07073:33:000000000-JKRRG:1:2106:27054:10535__lepidium__SV_DOC_-LK-04_6__ITS3_KYO2
TTAAACTCAGCGGGTGATCCCGCCTGACCTGGGGTCGCTTTGAGG…
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGG…

R2 file order
@M07073:33:000000000-JKRRG:1:2106:27054:10535__lepidium__SV_DOC_-LK-04_6__ITS3_KYO2
TTAAACTCAGCGGGTGATCCCGCCTGACCTGGGGTCGCTTTGAGG…
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGG…
@M07073:33:000000000-JKRRG:1:2106:27076:10535__lepidium__SV_DOC_-LK-04_6__ITS3_KYO2
TTAAACTCAGCGGGTGATCCCGCCTGACCTGGGGTCGCTTTGAGGACAT…
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG…

The mismatch caused the command to skip 6 reads out of 31320 sequences. I recommend ignoring the warnings and proceeding with the analysis.
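
If you want to check a file pair for this yourself before running make.contigs, something like the Python sketch below could help. It is not part of mothur, and the function names (read_names, find_order_mismatches) are made up for illustration. It assumes plain 4-line fastq records (no wrapped sequence lines) and that a read has the same name in R1 and R2, as in the files above. It walks both files in parallel and reports every position where the names differ.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each record in a 4-line-per-record fastq file.

    The name is the header line without the leading '@', cut at the first
    whitespace. Files ending in .gz are opened with gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # skip sequence, '+', and quality lines
            header = line.rstrip("\n")
            if not header:
                continue  # tolerate a trailing blank line
            fields = header[1:].split()
            yield fields[0] if fields else ""


def find_order_mismatches(forward_path, reverse_path):
    """Return (record_index, forward_name, reverse_name) for every position
    where the two files disagree. A missing record shows up as None."""
    mismatches = []
    pairs = zip_longest(read_names(forward_path), read_names(reverse_path))
    for index, (fwd, rev) in enumerate(pairs):
        if fwd != rev:
            mismatches.append((index, fwd, rev))
    return mismatches
```

Calling find_order_mismatches("sample_R1.fastq.gz", "sample_R2.fastq.gz") (placeholder file names) returns an empty list when the two files are in the same order. A swapped pair of neighbouring reads like the one shown above would appear as two consecutive entries with the names exchanged.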

Hi Sarah,

I just wondered if you’d had a chance to test the files – whether the issue is with my platform, the data files, or the code.

Thanks,

Jerry

Hi Jerry,

Yes, I was able to find the issue. As I explained in my post above, it is caused by the data files. The make.contigs command expects the forward reads to be in the same order as the reverse reads in the fastq files. The “missing” reads are present in both files, but the order is swapped. The mismatch caused the command to skip 6 reads out of 31320 sequences. I recommend ignoring the warnings and proceeding with the analysis.

Kindly,
Sarah