Screen.seqs error sequences in fasta file vs. summary files

rrivas · January 15, 2015, 6:19pm

When I run mothur > make.contigs(ffasta=GORGE_R1.paired.trim.fasta, rfasta=GORGE_R2.paired.trim.fasta)
I get the following warnings:
Using 1 processors.
Reading fastq data…
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
110000
120000
130000
140000
142048
[WARNING]: did not find paired read for M02897_67_000000000-ABF9G_1_1101_15293_2 0360, ignoring.
[WARNING]: did not find paired read for M02897_67_000000000-ABF9G_1_1101_2907_80 36, ignoring.
[WARNING]: did not find paired read for M02897_67_000000000-ABF9G_1_1102_14610_1 1709, ignoring.
[WARNING]: did not find paired read for M02897_67_000000000-ABF9G_1_1102_14619_1 1732, ignoring.
[WARNING]: did not find paired read for M02897_67_000000000-ABF9G_1_1102_16773_4 946, ignoring.
[…]

Processing GORGE_R1.paired.trim.0ffastatemp (file 1 of 1) <
Making contigs…
1000
2000
3000
4000
5000
6000
[…]

Then, at the end of running screen.seqs (mothur > screen.seqs(fasta=GORGE_R1.paired.trim.trim.contigs.fasta, summary=GORGE_R1.paired.trim.trim.contigs.summary, maxlength=600, maxambig=0, processors=16)), I get the following error:

[ERROR]: found 142048 sequences in your fasta file, and 108626 sequences in your summary file, quitting.

Where is the problem?

I cannot move on to the next step with the rest of my samples without this sample.

pschloss · January 16, 2015, 4:18pm

I suspect the problem is because of your prior trimming. I would tell you to give make.contigs the original fastq files and go from there. We have yet to find a trimming approach that works as well as make.contigs with the fastq files.

pat
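
To see how many reads lost their mate during trimming, a minimal Python sketch along these lines could compare the read names in the two fasta files. It assumes that the forward and reverse files use identical read names and that the name is the first whitespace-delimited token on each ">" line; both are assumptions, not something mothur guarantees. The function names are hypothetical.

```python
def read_ids(fasta_path):
    """Return the set of read names found on '>' header lines."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumption: the read name is the first token after '>'.
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def unpaired_reads(forward_path, reverse_path):
    """Return (only_in_forward, only_in_reverse) as two sets of read names."""
    forward = read_ids(forward_path)
    reverse = read_ids(reverse_path)
    return forward - reverse, reverse - forward
```

For the files in the first post, this would be called as unpaired_reads("GORGE_R1.paired.trim.fasta", "GORGE_R2.paired.trim.fasta"). Two non-empty sets would be consistent with the "did not find paired read" warnings above.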

Guilllg · May 1, 2016, 6:44pm

Hi,

I'm not sure whether I should be posting my question in a different thread, as my problem is slightly different, but it also relates to a discrepancy between fasta and summary files when running screen.seqs.

Here’s my problem:

I’m currently trying to follow the mothur MiSeq SOP with my dataset. Everything seems to run fine until I get to the second “screen.seqs”, after the alignment step.

Running:

screen.seqs(fasta=current, count=current, summary=current, start=8, end=9582, maxhomop=8)

Using 45_88_R1.fastq.trim.contigs.trim.good.count_table as input file for the count parameter.
Using 45_88_R1.fastq.trim.contigs.trim.good.unique.align as input file for the fasta parameter.
Using 45_88_R1.fastq.trim.contigs.trim.good.unique.summary as input file for the summary parameter.

Gives me the following error:
…

[ERROR]: 45_88_R1.fastq.trim.contigs.trim.good.unique.align59588.num.temp is blank. Please correct.
[ERROR]: 45_88_R1.fastq.trim.contigs.trim.good.unique.align59589.num.temp is blank. Please correct.
[ERROR]: 45_88_R1.fastq.trim.contigs.trim.good.unique.align59590.num.temp is blank. Please correct.
[ERROR]: 45_88_R1.fastq.trim.contigs.trim.good.unique.align59591.num.temp is blank. Please correct.
[ERROR]: 45_88_R1.fastq.trim.contigs.trim.good.unique.align59592.num.temp is blank. Please correct.

[ERROR]: found 78079 sequences in your fasta file, and 1561569 sequences in your summary file, quitting.

The odd thing is that the fasta file does seem to contain 1561569 sequences:

mothur > summary.seqs(fasta=45_88_R1.fastq.trim.contigs.trim.good.unique.align, count=45_88_R1.fastq.trim.contigs.trim.good.count_table, processors=20)

Using 20 processors.

             Start    End      NBases   Ambigs  Polymer  NumSeqs
Minimum:     -1       -1       0        0       1        1
2.5%-tile:   8        9582     251      0       3        100386
25%-tile:    8        9582     252      0       4        1003854
Median:      8        9582     252      0       4        2007708
75%-tile:    8        9582     252      0       5        3011561
97.5%-tile:  8        9582     253      0       6        3915029
Maximum:     9582     9582     274      0       21       4015414
Mean:        54.9649  9559.37  250.214  0       4.63684

# of unique seqs: 1561569

total # of seqs: 4015414

Output File Names:
45_88_R1.fastq.trim.contigs.trim.good.unique.summary
…

Also, re-running the screen.seqs command results in slightly different errors; i.e. the “blank” sequences will be different, and the mismatch between fasta and summary files will also be different:

e.g.:
…

[ERROR]: 45_88_R1.fastq.trim.contigs.trim.good.unique.align88554.num.temp is blank. Please correct.
[ERROR]: 45_88_R1.fastq.trim.contigs.trim.good.unique.align88558.num.temp is blank. Please correct.
[ERROR]: found 1405412 sequences in your fasta file, and 1561569 sequences in your summary file, quitting.
…

Could anything have gone wrong during the alignment step?

As extra bits of information, I’m running mothur v.1.34.4; my primer pair is 515F/806R.

Also, the only step I did differently from the mothur MiSeq SOP was using trim.seqs instead of a stability file with the make.contigs command. I did this because my data came back as pooled R1 and R2 fastq files (as opposed to one fastq set per sample). I followed the suggestions from this previous post to run the trim.seqs command:
http://w.mothur.org/forum/viewtopic.php?t=3311

Thanks, Guillaume
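
One way to check the two counts independently of screen.seqs is to count the records in each file directly. The sketch below is a rough check in Python, not part of mothur. It assumes the summary file has a single header line followed by one row per sequence, and that every fasta record starts with a ">" line; the function names and the header_lines parameter are hypothetical.

```python
def count_fasta_records(fasta_path):
    """Count records by counting header lines that start with '>'."""
    count = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


def count_summary_rows(summary_path, header_lines=1):
    """Count non-blank rows, excluding the assumed header line(s)."""
    with open(summary_path) as handle:
        rows = sum(1 for line in handle if line.strip())
    return max(rows - header_lines, 0)


def counts_match(fasta_path, summary_path):
    """Return True when both files describe the same number of sequences."""
    return count_fasta_records(fasta_path) == count_summary_rows(summary_path)
```

If the counts agree on disk but screen.seqs still reports a mismatch that changes from run to run, the input files themselves are probably intact and the problem more likely lies in the temporary files written during the run.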

campenr · May 2, 2016, 11:54am

I’m not sure how to fix your issue specifically, but upgrading to the latest version of mothur (1.37.2) may resolve it.

If not, let us know.

Cheers
Richard

Guilllg · May 3, 2016, 10:48am

Hey Richard, thanks for the reply.

I just realized my issue was caused by running out of memory on our server…

Tidying up a bunch of files solved the problem.

Sorry for the perhaps irrelevant post.
Guillaume
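
Since deleting files fixed it, "memory" here plausibly means disk space rather than RAM, although the post does not say so explicitly. Under that assumption, a quick check of free space in the working directory before a long run might look like the sketch below. It uses Python's standard shutil.disk_usage; the function names and the idea of a required-bytes threshold are illustrative, not anything mothur provides.

```python
import shutil


def free_bytes(path="."):
    """Return the free space, in bytes, on the filesystem holding path."""
    return shutil.disk_usage(path).free


def has_room_for(required_bytes, path="."):
    """Return True if at least required_bytes are free at path."""
    return free_bytes(path) >= required_bytes
```

How much space a given run needs depends on the dataset and the number of processors, so any threshold passed to has_room_for is a guess to be tuned locally.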
