Hello,
I am having trouble with my group file right at the beginning of the MiSeq SOP.
I am working with Illumina MiSeq V4 reads and using mothur v.1.36.1 on Windows.
I made contigs from the forward and reverse fastq files. When I then tried to screen the sequences, I got the following error message over and over again, and about a third of my sequences were thrown out:
Your groupfile does not include the sequence please correct.
I then took a closer look at the group file (.contigs.groups), and it seems that some sequence names get cut off. As a result there appears to be a shift, so the sequences that follow are not recognized (at least that is what I am assuming).
Here are my commands and the output:
mothur > make.contigs(file=test.files, processors=3)
Group count: V4_29 41611 V4_30 205103 V4_31 203345
Total of all groups is 450059
mothur > summary.seqs(fasta=test.trim.contigs.fasta, processors=3)
Using 3 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 296 296 0 3 1
2.5%-tile: 1 307 307 0 4 11252
25%-tile: 1 309 309 0 4 112515
Median: 1 310 310 0 4 225030
75%-tile: 1 310 310 1 5 337545
97.5%-tile: 1 317 317 17 7 438808
Maximum: 1 602 602 68 296 450059
Mean: 1 313.655 313.655 2.04722 4.53988
# of Seqs: 450059
mothur > screen.seqs(fasta=test.trim.contigs.fasta, group=test.contigs.groups, summary=test.trim.contigs.summary, maxambig=0, minlength=307, maxlength=317)
Using 3 processors.
Your groupfile does not include the sequence M02973_22_000000000-AK66Y_1_2112_24715_6481 please correct.
Your groupfile does not include the sequence M02973_22_000000000-AK66Y_1_2112_25076_17339 please correct.
Your groupfile does not include the sequence M02973_22_000000000-AK66Y_1_2112_25077_13464 please correct.
Your groupfile does not include the sequence M02973_22_000000000-AK66Y_1_2112_25083_10665 please correct.
and so on…
mothur > summary.seqs(fasta=test.trim.contigs.good.fasta, processors=3)
Using 3 processors.
Start End NBases Ambigs Polymer NumSeqs
Minimum: 1 307 307 0 3 1
2.5%-tile: 1 308 308 0 4 7301
25%-tile: 1 309 309 0 4 73008
Median: 1 310 310 0 4 146015
75%-tile: 1 310 310 0 5 219022
97.5%-tile: 1 312 312 0 6 284729
Maximum: 1 317 317 0 12 292029
Mean: 1 309.921 309.921 0 4.47276
# of Seqs: 292029
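In case it helps with the diagnosis, the cut-off names could probably be confirmed outside mothur by comparing the names in the fasta file with the names in the group file. Below is only a rough sketch in Python, not something taken from the SOP. It assumes that a sequence name is everything between `>` and the first whitespace in a fasta header, and that every group file line has exactly two whitespace-separated fields (sequence name, then group). The function names are made up for illustration and are not part of mothur.

```python
import sys


def read_fasta_names(path):
    """Return the sequence names from a fasta file, in file order.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def read_group_file(path):
    """Return (name -> group mapping, list of malformed lines).

    Assumption: a well-formed line has exactly two fields, name and group.
    Anything else (for example a cut-off name with no group) is reported
    together with its line number.
    """
    groups = {}
    malformed = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                malformed.append((line_number, line.rstrip("\n")))
                continue
            groups[fields[0]] = fields[1]
    return groups, malformed


def compare_names(fasta_path, group_path):
    """Compare fasta names against group file names.

    Returns three lists: names in the fasta but not in the group file,
    names in the group file but not in the fasta, and malformed group lines.
    """
    fasta_names = read_fasta_names(fasta_path)
    groups, malformed = read_group_file(group_path)
    fasta_set = set(fasta_names)
    missing = [name for name in fasta_names if name not in groups]
    orphaned = [name for name in groups if name not in fasta_set]
    return missing, orphaned, malformed


if __name__ == "__main__" and len(sys.argv) == 3:
    missing, orphaned, malformed = compare_names(sys.argv[1], sys.argv[2])
    print("fasta names missing from group file:", len(missing))
    print("group file names not in fasta:", len(orphaned))
    print("malformed group file lines:", len(malformed))
    for line_number, text in malformed[:10]:
        print("  line", line_number, repr(text))
```

It would be run with the two files from above, for example `python check_groups.py test.trim.contigs.fasta test.contigs.groups` (the script name is arbitrary). If names really are being cut off, I would expect the truncated entries to show up either as malformed lines or as group file names that are not in the fasta file.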
I appreciate any hints you can give me!
Thanks,
Julia