count.seqs [ERROR]: processes reported processing 8265234 sequences, but group file indicates you have 7993602 sequences

Hi,
Could you please help me solve the following error? I get it when I run count.seqs after screen.seqs and unique.seqs:
[ERROR]: processes reported processing 8265234 sequences, but group file indicates you have 7993602 sequences.

Dataset: 16S MiSeq data from the V4 region.
I followed the mothur MiSeq SOP workflow.

make.contigs(ffastq=Undetermined_S0_L001_R1_001.fastq, rfastq=Undetermined_S0_L001_R2_001.fastq, findex=Undetermined_S0_L001_I1_001.fastq, oligos=reverse_oligos.csv, checkorient=t, trimoverlap=T, processors=16, pdiffs=2, bdiffs=2)
…
Group count:
-401428816
A1 175519
A10 107627
A11 127124
A12 127727
A2 181646
A3 164725
A4 219817
A5 420866
A6 324406
A7 130909
A8 248926
A9 98600
B1 307046
B10 97105
B11 82127
B12 143565
B2 82031
B3 109294
B4 88667
B5 148898
B6 62795
B7 73033
B8 124
B9 92425
C1 120558
C10 78560
C11 137214
C12 131368
C2 147680
C3 173761
C4 127717
C5 157751
C6 117890
C7 59846
C8 62504
C9 94715
D1 105276
D10 78472
D11 81174
D12 195551
D2 131472
D3 61541
D4 61102
D5 121606
D6 42034
D7 82836
D8 94387
D9 62986
E1 149414
E10 170020
E11 150931
E12 198009
E2 150389
E3 196348
E4 140098
E5 133753
E6 203408
E7 149187
E8 142332
E9 96236
F1 164377
F10 184152
F11 169850
F12 140745
F2 131543
F3 145958
F4 151822
F5 161046
F6 129488
F7 244114
F8 156485
F9 254129
G1 176499
G10 89130
G11 532624
G12 198984
G2 176680
G3 181339
G4 157970
G5 137239
G6 201167
G7 243259
G8 69070
G9 161998
H1 107118
H10 364585
H11 262535
H12 165928
H2 220717
H3 133717
H4 181562
H5 162953
H6 249360
H7 264736
H8 154073
H9 230840
Total of all groups is -386547896
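A quick arithmetic check on this output, assuming the unlabeled first line is the count for a group with an empty name: subtracting it from the reported total leaves exactly the 14880920 sequences that count.groups reports for the named groups further down, which is 479685 fewer than the 15360605 sequences in the trimmed fasta.

```python
# Figures copied from the mothur output in this thread.
blank_group = -401428816       # unlabeled first line under "Group count:" (assumed to be an unnamed group)
reported_total = -386547896    # "Total of all groups is ..."
count_groups_total = 14880920  # "Total seqs:" from count.groups on ...contigs.groups
trim_fasta_seqs = 15360605     # "# of Seqs:" from summary.seqs on ...trim.contigs.fasta

# What the named groups (A1 to H12) contribute once the unlabeled entry is removed.
named_groups_sum = reported_total - blank_group

# Reads in the trimmed fasta that no named group accounts for.
unassigned = trim_fasta_seqs - named_groups_sum

print(named_groups_sum)                        # 14880920
print(named_groups_sum == count_groups_total)  # True
print(unassigned)                              # 479685
```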

Output File Names:
Undetermined_S0_L001_R1_001.trim.contigs.fasta
Undetermined_S0_L001_R1_001.scrap.contigs.fasta
Undetermined_S0_L001_R1_001.contigs.report
Undetermined_S0_L001_R1_001.contigs.groups

My trim file is 5.5 GB, while the scrap file is about 500 MB.

mothur > summary.seqs(fasta=Undetermined_S0_L001_R1_001.trim.contigs.fasta, processors=16)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1 1 0 1 1
2.5%-tile: 1 171 171 0 3 384016
25%-tile: 1 248 248 0 4 3840152
Median: 1 248 248 0 4 7680303
75%-tile: 1 248 248 1 5 11520454
97.5%-tile: 1 249 249 6 6 14976590
Maximum: 1 257 257 112 31 15360605
Mean: 1 244.651 244.651 0.995176 4.46299

# of Seqs: 15360605

Output File Names:
Undetermined_S0_L001_R1_001.trim.contigs.summary

mothur > screen.seqs(fasta=Undetermined_S0_L001_R1_001.trim.contigs.fasta, group=Undetermined_S0_L001_R1_001.contigs.groups, summary=Undetermined_S0_L001_R1_001.trim.contigs.summary, maxambig=0, maxlength=257)

Your groupfile does not include the sequence M03051_6_000000000-AC2V1_1_1101_10006_15423 please correct.
Your groupfile does not include the sequence M03051_6_000000000-AC2V1_1_1101_10006_23031 please correct.
Your groupfile does not include the sequence M03051_6_000000000-AC2V1_1_1101_10007_20812 please correct.
Your groupfile does not include the sequence M03051_6_000000000-AC2V1_1_1101_10007_4112 please correct.
Your groupfile does not include the sequence M03051_6_000000000-AC2V1_1_1101_10008_17156 please correct.
(This happens for about 20k names out of the 15.36M sequences.)
Output File Names:
Undetermined_S0_L001_R1_001.trim.contigs.good.summary
Undetermined_S0_L001_R1_001.trim.contigs.good.fasta
Undetermined_S0_L001_R1_001.trim.contigs.bad.accnos
Undetermined_S0_L001_R1_001.contigs.good.groups

It took 171 secs to screen 15360605 sequences.
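To see exactly which reads screen.seqs is complaining about, a short Python sketch can list every fasta sequence name that has no entry in the group file. It assumes mothur's usual formats (the sequence name is the first token after ">", and the group file has two tab-delimited columns: name, group). Treating lines with an empty group column as missing is an assumption prompted by the unlabeled entry in the make.contigs group count; the function names are illustrative only.

```python
from typing import Iterable, Iterator, List, Set


def fasta_names(fasta_lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence name (first token after '>') of each FASTA header."""
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def grouped_names(group_lines: Iterable[str]) -> Set[str]:
    """Collect names that have a non-empty group in a tab-delimited group file."""
    names: Set[str] = set()
    for line in group_lines:
        fields = line.rstrip("\r\n").split("\t")
        # Entries with an empty group column are treated as ungrouped (assumption).
        if len(fields) >= 2 and fields[0] and fields[1]:
            names.add(fields[0])
    return names


def missing_from_groups(
    fasta_lines: Iterable[str], group_lines: Iterable[str]
) -> List[str]:
    """Return FASTA sequence names with no usable entry in the group file."""
    grouped = grouped_names(group_lines)
    return [name for name in fasta_names(fasta_lines) if name not in grouped]


# Hypothetical usage with the file names from this thread (this keeps every
# group-file name in memory, so it may need several GB of RAM):
# with open("Undetermined_S0_L001_R1_001.trim.contigs.fasta") as fa, \
#         open("Undetermined_S0_L001_R1_001.contigs.groups") as grp:
#     missing = missing_from_groups(fa, grp)
#     print(len(missing), "fasta sequences have no group")
```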

mothur > summary.seqs(fasta=Undetermined_S0_L001_R1_001.trim.contigs.good.fasta, processors=16)
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 1 1 0 1 1
2.5%-tile: 1 143 143 0 3 206631
25%-tile: 1 248 248 0 4 2066309
Median: 1 248 248 0 4 4132618
75%-tile: 1 248 248 0 5 6198926
97.5%-tile: 1 249 249 0 6 8058604
Maximum: 1 254 254 0 11 8265234
Mean: 1 243.548 243.548 0 4.4696

# of Seqs: 8265234

Output File Names:
Undetermined_S0_L001_R1_001.trim.contigs.good.summary

It took 18 secs to summarize 8265234 sequences.

mothur > unique.seqs(fasta=Undetermined_S0_L001_R1_001.trim.contigs.good.fasta)

Output File Names:
Undetermined_S0_L001_R1_001.trim.contigs.good.names
Undetermined_S0_L001_R1_001.trim.contigs.good.unique.fasta

mothur > count.seqs(name=Undetermined_S0_L001_R1_001.trim.contigs.good.names, group=Undetermined_S0_L001_R1_001.contigs.good.groups)

[ERROR]: processes reported processing 8265234 sequences, but group file indicates you have 7993602 sequences. Either you have a file mismatch or a process failed to complete the task assigned to it.
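The two numbers in this error differ by 8265234 - 7993602 = 271632. Assuming the error means the names file covers more sequences than the group file does (which would fit the screen.seqs warnings above), a Python sketch of that consistency check can count the sequences listed in the names file and list those without a group entry. This is an illustration, not mothur's own implementation; it assumes the standard formats: a tab-delimited names file whose second column is a comma-separated list of every sequence the representative stands for, and a two-column, tab-delimited group file.

```python
from typing import Dict, Iterable, List, Tuple


def read_groups(group_lines: Iterable[str]) -> Dict[str, str]:
    """Map sequence name -> group from a two-column, tab-delimited group file."""
    groups: Dict[str, str] = {}
    for line in group_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            groups[fields[0]] = fields[1]
    return groups


def check_names_vs_groups(
    name_lines: Iterable[str], group_lines: Iterable[str]
) -> Tuple[int, int, List[str]]:
    """Compare a names file with a group file.

    Returns (sequences listed in the names file, entries in the group file,
    names-file sequences that have no group-file entry).
    """
    groups = read_groups(group_lines)
    total = 0
    missing: List[str] = []
    for line in name_lines:
        parts = line.rstrip("\r\n").split("\t", 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        # Column 1 is the unique representative; column 2 lists every
        # sequence it stands for (the representative included).
        for name in parts[1].split(","):
            total += 1
            if name not in groups:
                missing.append(name)
    return total, len(groups), missing
```

If that reading of the error is right, running this on Undetermined_S0_L001_R1_001.trim.contigs.good.names and Undetermined_S0_L001_R1_001.contigs.good.groups should give first two values matching the 8265234 and 7993602 in the message.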


If I run count.seqs with just the name file, it works:
mothur > count.seqs(name=Undetermined_S0_L001_R1_001.trim.contigs.good.names, processors=16)

Using 16 processors.
It took 2 secs to create a table for 8265234 sequences.

Total number of sequences: 8265234

I then tried a single processor, as suggested in the forum threads "Mismatch between fasta and group files" and "count.seqs mismatch":
count.seqs(name=Undetermined_S0_L001_R1_001.trim.contigs.good.names, group=Undetermined_S0_L001_R1_001.contigs.good.groups, processors=1)
It still gives the same error.

I also tried the workaround suggested in the thread "Problem with mothur v 1.29.0 for windows?":

mothur > list.seqs(name=Undetermined_S0_L001_R1_001.trim.contigs.good.names)
Output File Names:
Undetermined_S0_L001_R1_001.trim.contigs.good.accnos

mothur > get.seqs(accnos=current, group=Undetermined_S0_L001_R1_001.contigs.groups)

Using Undetermined_S0_L001_R1_001.trim.contigs.good.accnos as input file for the accnos parameter.
Selected 7993602 sequences from your group file.

Output File Names:
Undetermined_S0_L001_R1_001.contigs.pick.groups

mothur > screen.seqs(fasta=Undetermined_S0_L001_R1_001.trim.contigs.fasta, group=Undetermined_S0_L001_R1_001.contigs.pick.groups, summary=Undetermined_S0_L001_R1_001.trim.contigs.summary, maxambig=0, maxlength=257)

I still get the error:
Your groupfile does not include the sequence M03051_6_000000000-AC2V1_1_1101_10006_15423 please correct.
Your groupfile does not include the sequence M03051_6_000000000-AC2V1_1_1101_10006_23031 please correct.

When I queried the generated group files:
mothur > count.groups(group=Undetermined_S0_L001_R1_001.contigs.groups)

A1 contains 175519.
A10 contains 107627.
A11 contains 127124.
A12 contains 127727.
A2 contains 181646.
A3 contains 164725.
A4 contains 219817.
A5 contains 420866.
A6 contains 324406.
A7 contains 130909.
A8 contains 248926.
A9 contains 98600.
B1 contains 307046.
B10 contains 97105.
B11 contains 82127.
B12 contains 143565.
B2 contains 82031.
B3 contains 109294.
B4 contains 88667.
B5 contains 148898.
B6 contains 62795.
B7 contains 73033.
B8 contains 124.
B9 contains 92425.
C1 contains 120558.
C10 contains 78560.
C11 contains 137214.
C12 contains 131368.
C2 contains 147680.
C3 contains 173761.
C4 contains 127717.
C5 contains 157751.
C6 contains 117890.
C7 contains 59846.
C8 contains 62504.
C9 contains 94715.
D1 contains 105276.
D10 contains 78472.
D11 contains 81174.
D12 contains 195551.
D2 contains 131472.
D3 contains 61541.
D4 contains 61102.
D5 contains 121606.
D6 contains 42034.
D7 contains 82836.
D8 contains 94387.
D9 contains 62986.
E1 contains 149414.
E10 contains 170020.
E11 contains 150931.
E12 contains 198009.
E2 contains 150389.
E3 contains 196348.
E4 contains 140098.
E5 contains 133753.
E6 contains 203408.
E7 contains 149187.
E8 contains 142332.
E9 contains 96236.
F1 contains 164377.
F10 contains 184152.
F11 contains 169850.
F12 contains 140745.
F2 contains 131543.
F3 contains 145958.
F4 contains 151822.
F5 contains 161046.
F6 contains 129488.
F7 contains 244114.
F8 contains 156485.
F9 contains 254129.
G1 contains 176499.
G10 contains 89130.
G11 contains 532624.
G12 contains 198984.
G2 contains 176680.
G3 contains 181339.
G4 contains 157970.
G5 contains 137239.
G6 contains 201167.
G7 contains 243259.
G8 contains 69070.
G9 contains 161998.
H1 contains 107118.
H10 contains 364585.
H11 contains 262535.
H12 contains 165928.
H2 contains 220717.
H3 contains 133717.
H4 contains 181562.
H5 contains 162953.
H6 contains 249360.
H7 contains 264736.
H8 contains 154073.
H9 contains 230840.

Total seqs: 14880920.

Output File Names:
Undetermined_S0_L001_R1_001.contigs.count.summary

mothur > count.groups(group=Undetermined_S0_L001_R1_001.contigs.good.groups)

A1 contains 94068.
A10 contains 58989.
A11 contains 68181.
A12 contains 69512.
A2 contains 99415.
A3 contains 88985.
A4 contains 119585.
A5 contains 233709.
A6 contains 177970.
A7 contains 71037.
A8 contains 138254.
A9 contains 54571.
B1 contains 158910.
B10 contains 52786.
B11 contains 43907.
B12 contains 78172.
B2 contains 43382.
B3 contains 57264.
B4 contains 47689.
B5 contains 80975.
B6 contains 34066.
B7 contains 38237.
B8 contains 60.
B9 contains 48126.
C1 contains 64791.
C10 contains 39444.
C11 contains 74441.
C12 contains 71314.
C2 contains 81159.
C3 contains 92100.
C4 contains 70437.
C5 contains 82376.
C6 contains 64303.
C7 contains 31070.
C8 contains 32889.
C9 contains 50498.
D1 contains 54888.
D10 contains 43356.
D11 contains 44244.
D12 contains 106951.
D2 contains 70763.
D3 contains 33558.
D4 contains 33095.
D5 contains 66317.
D6 contains 22783.
D7 contains 45006.
D8 contains 50012.
D9 contains 33564.
E1 contains 80424.
E10 contains 92169.
E11 contains 79528.
E12 contains 110340.
E2 contains 80993.
E3 contains 103886.
E4 contains 72884.
E5 contains 69921.
E6 contains 108379.
E7 contains 81632.
E8 contains 76221.
E9 contains 52125.
F1 contains 85992.
F10 contains 93263.
F11 contains 93641.
F12 contains 74061.
F2 contains 69683.
F3 contains 78941.
F4 contains 83512.
F5 contains 88542.
F6 contains 69692.
F7 contains 135591.
F8 contains 81097.
F9 contains 134722.
G1 contains 93960.
G10 contains 48075.
G11 contains 286798.
G12 contains 106006.
G2 contains 90056.
G3 contains 95886.
G4 contains 85815.
G5 contains 70575.
G6 contains 107271.
G7 contains 131471.
G8 contains 35910.
G9 contains 83582.
H1 contains 58381.
H10 contains 199992.
H11 contains 145553.
H12 contains 88983.
H2 contains 118801.
H3 contains 70722.
H4 contains 95812.
H5 contains 89955.
H6 contains 134744.
H7 contains 140860.
H8 contains 76686.
H9 contains 123260.

Total seqs: 7993602.

Output File Names:
Undetermined_S0_L001_R1_001.contigs.good.count.summary
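Comparing the two count.groups outputs group by group shows how much each sample lost at screen.seqs (for A1, 94068 of 175519 sequences remain). A small Python sketch that parses the "GROUP contains N." lines printed above and computes the retained fraction per group; the regular expression assumes exactly that line format, and the function names are illustrative.

```python
import re
from typing import Dict

# Matches lines such as "A1 contains 175519." printed by count.groups.
COUNT_LINE = re.compile(r"^(\S+) contains (\d+)\.$")


def parse_count_groups(text: str) -> Dict[str, int]:
    """Parse count.groups screen output into {group: number of sequences}."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        match = COUNT_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def retention(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Fraction of each group's sequences still present after filtering."""
    return {group: after.get(group, 0) / n for group, n in before.items() if n > 0}


# Two groups copied from the outputs above.
before = parse_count_groups("A1 contains 175519.\nB8 contains 124.\n")
after = parse_count_groups("A1 contains 94068.\nB8 contains 60.\n")
print({g: round(f, 2) for g, f in retention(before, after).items()})
# {'A1': 0.54, 'B8': 0.48}
```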


Have you tried running this on our current version, 1.35.1, https://github.com/mothur/mothur/releases?

I am getting the same error and am working with the latest version 1.36.1.

mothur > count.seqs(name=Flowers_stability.trim.contigs.good.unique.good.filter.names, group=Flowers_stability.contigs.good.groups)

Using 1 processors.
[ERROR]: processes reported processing 199341 sequences, but group file indicates you have 2592229 sequences. Could you have a file mismatch?

When I later run the classify.seqs command, I receive the following error:

[ERROR]: M02127_127_000000000-AHV2V_1_1101_10002_25577 is not in your count table. Please correct.

I have gone through the same troubleshooting process described above.

I'm new to mothur. Have I missed something?

Pat

This is a different question, please ask it on a new thread so we can keep things somewhat organized for future users.

Pat