
Problem with screen.seqs (count and fasta file mismatch)

Hi,
I'm trying to follow the MiSeq SOP and got an error message when running summary.seqs after screen.seqs.
I am not sure I did things right. Could you please help me get past this so I can move forward?
I have pasted the log file below, with the error message at the end.
Thank you!

mothur > 
 make.file (inputdir=., type=fastq, prefix=stability)
Setting input directory to: C:\mothur\
[ERROR]: Invalid command.
[ERROR]: did not complete make.file .

mothur > 
 make.file (inputdir=., type=fastq, prefix=stability)
Setting input directory to: C:\mothur\
[ERROR]: Invalid command.
[ERROR]: did not complete make.file .

mothur > 
make.file(inputdir=., type=fastq, prefix=stability)
Setting input directory to: C:\mothur\

Output File Names: 
C:\mothur\stability.files


mothur > 
make.contigs(file=stability.files, processors=4)

Using 4 processors.

>>>>>	Processing file pair C:\mothur\13_S5_L001_R1_001.FASTQ - C:\mothur\13_S5_L001_R2_001.FASTQ (files 1 of 12)	<<<<<
Making contigs...
Done.

It took 410 secs to assemble 464110 reads.


>>>>>	Processing file pair C:\mothur\14_S35_L001_R1_001.FASTQ - C:\mothur\14_S35_L001_R2_001.FASTQ (files 2 of 12)	<<<<<
Making contigs...
Done.

It took 209 secs to assemble 235161 reads.


>>>>>	Processing file pair C:\mothur\15_S36_L001_R1_001.FASTQ - C:\mothur\15_S36_L001_R2_001.FASTQ (files 3 of 12)	<<<<<
Making contigs...
Done.

It took 245 secs to assemble 273067 reads.


>>>>>	Processing file pair C:\mothur\16_S6_L001_R1_001.FASTQ - C:\mothur\16_S6_L001_R2_001.FASTQ (files 4 of 12)	<<<<<
Making contigs...
Done.

It took 295 secs to assemble 335804 reads.


>>>>>	Processing file pair C:\mothur\3_S1_L001_R1_001.FASTQ - C:\mothur\3_S1_L001_R2_001.FASTQ (files 5 of 12)	<<<<<
Making contigs...
Done.

It took 426 secs to assemble 472262 reads.


>>>>>	Processing file pair C:\mothur\4_S2_L001_R1_001.FASTQ - C:\mothur\4_S2_L001_R2_001.FASTQ (files 6 of 12)	<<<<<
Making contigs...
Done.

It took 328 secs to assemble 377911 reads.


>>>>>	Processing file pair C:\mothur\5_S3_L001_R1_001.FASTQ - C:\mothur\5_S3_L001_R2_001.FASTQ (files 7 of 12)	<<<<<
Making contigs...
Done.

It took 337 secs to assemble 382097 reads.


>>>>>	Processing file pair C:\mothur\6_S33_L001_R1_001.FASTQ - C:\mothur\6_S33_L001_R2_001.FASTQ (files 8 of 12)	<<<<<
Making contigs...
Done.

It took 283 secs to assemble 319146 reads.


>>>>>	Processing file pair C:\mothur\7_S4_L001_R1_001.FASTQ - C:\mothur\7_S4_L001_R2_001.FASTQ (files 9 of 12)	<<<<<
Making contigs...
Done.

It took 206 secs to assemble 232924 reads.


>>>>>	Processing file pair C:\mothur\SP01_S102_L001_R1_001.FASTQ - C:\mothur\SP01_S102_L001_R2_001.FASTQ (files 10 of 12)	<<<<<
Making contigs...
Done.

It took 88 secs to assemble 116068 reads.


>>>>>	Processing file pair C:\mothur\SP02_S103_L001_R1_001.FASTQ - C:\mothur\SP02_S103_L001_R2_001.FASTQ (files 11 of 12)	<<<<<
Making contigs...
Done.

It took 92 secs to assemble 120246 reads.


>>>>>	Processing file pair C:\mothur\SP08_S104_L001_R1_001.FASTQ - C:\mothur\SP08_S104_L001_R2_001.FASTQ (files 12 of 12)	<<<<<
Making contigs...
Done.

It took 85 secs to assemble 109512 reads.


Group count: 
7A_11	116068
7A_12	377911
7B_11	120246
7B_12	382097
7C_11	472262
7C_12	319146
7E_12	232924
7G_12	109512
P2A_14	464110
P2B_14	235161
P2C_14	273067
P2D_14	335804

Total of all groups is 3438308

It took 3034 secs to process 3438308 sequences.

Output File Names: 
C:\mothur\stability.trim.contigs.fasta
C:\mothur\stability.trim.contigs.qual
C:\mothur\stability.scrap.contigs.fasta
C:\mothur\stability.scrap.contigs.qual
C:\mothur\stability.contigs.report
C:\mothur\stability.contigs.groups


mothur > 
summary.seqs(fasta=stability.trim.contigs.fasta)

Using 4 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	35	35	0	3	1
2.5%-tile:	1	439	439	0	4	85958
25%-tile:	1	440	440	0	4	859578
Median: 	1	445	445	1	5	1719155
75%-tile:	1	464	464	6	6	2578732
97.5%-tile:	1	469	469	23	7	3352351
Maximum:	1	602	602	260	300	3438308
Mean:	1	451	451	4	5
# of Seqs:	3438308

It took 92 secs to summarize 3438308 sequences.

Output File Names:
C:\mothur\stability.trim.contigs.summary

mothur > 
screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, summary=stability.trim.contigs.summary, maxambig=0, minlength=439, maxlength=469, qfile=stability.trim.contigs.qual)

Using 4 processors.

It took 58 secs to screen 3438308 sequences, removed 2250012.

/******************************************/
Running command: remove.seqs(accnos=stability.trim.contigs.bad.accnos.temp, group=stability.contigs.groups, qfile=stability.trim.contigs.qual)
Removed 2250012 sequences from your group file.
Removed 2250012 sequences from your quality file.

Output File Names: 
stability.contigs.pick.groups
stability.trim.contigs.pick.qual

/******************************************/

Output File Names:
stability.trim.contigs.good.summary
stability.trim.contigs.good.fasta
stability.trim.contigs.bad.accnos
stability.contigs.good.groups
stability.trim.contigs.good.qual


It took 247 secs to screen 3438308 sequences.

mothur > 
summary.seqs(count=stability.trim.contigs.good.count_table)
Using stability.trim.contigs.good.unique.fasta as input file for the fasta parameter.

Using 4 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	439	439	0	3	1
2.5%-tile:	1	440	440	0	4	29708
25%-tile:	1	441	441	0	4	297075
Median: 	1	445	445	0	5	594149
75%-tile:	1	465	465	0	6	891223
97.5%-tile:	1	469	469	0	6	1158589
Maximum:	1	469	469	0	177	1188296
Mean:	1	451	451	0	5
# of unique seqs:	786570
total # of seqs:	1188296

It took 22 secs to summarize 1188296 sequences.

Output File Names:
stability.trim.contigs.good.unique.summary

mothur > 
align.seqs(fasta=ecoliv3v4.fasta, reference=silva.seed_v132.align)

Using 4 processors.

Reading in the silva.seed_v132.align template sequences...	DONE.
It took 19 to read  11180 sequences.

Aligning sequences from ecoliv3v4.fasta ...
Reducing processors to 1.
It took 0 secs to align 1 sequences.


It took 0 seconds to align 1 sequences.

Output File Names: 
ecoliv3v4.align
ecoliv3v4.align.report


mothur > 
summary.seqs(fasta=ecoliv3v4.align)

Using 4 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	6388	25316	465	0	6	1
2.5%-tile:	6388	25316	465	0	6	1
25%-tile:	6388	25316	465	0	6	1
Median: 	6388	25316	465	0	6	1
75%-tile:	6388	25316	465	0	6	1
97.5%-tile:	6388	25316	465	0	6	1
Maximum:	6388	25316	465	0	6	1
Mean:	6388	25316	465	0	6
# of Seqs:	1

It took 0 secs to summarize 1 sequences.

Output File Names:
ecoliv3v4.summary

mothur > 
pcr.seqs(fasta=silva.seed_v132.align, start=6388, end=25316, keepdots=F, processors=8)

Using 8 processors.
[NOTE]: no sequences were bad, removing silva.seed_v132.bad.accnos

It took 17 secs to screen 11180 sequences.

Output File Names: 
silva.seed_v132.pcr.align


mothur > 
rename.file(input=silva.seed_v132.pcr.align, new=silva.v3v4.fasta)

Current files saved by mothur:
fasta=silva.seed_v132.pcr.align
processors=8
summary=ecoliv3v4.summary

mothur > 
summary.seqs(fasta=silva.v3v4.fasta)

Using 8 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	17056	415	0	3	1
2.5%-tile:	1	18928	438	0	4	280
25%-tile:	1	18928	444	0	4	2796
Median: 	1	18928	464	0	5	5591
75%-tile:	1	18928	465	0	6	8386
97.5%-tile:	1	18928	607	1	6	10901
Maximum:	3	18928	1274	5	12	11180
Mean:	1	18927	488	0	5
# of Seqs:	11180

It took 5 secs to summarize 11180 sequences.

Output File Names:
silva.v3v4.summary


mothur > 
align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.v3v4.fasta)

Using 8 processors.

Reading in the silva.v3v4.fasta template sequences...	DONE.
It took 8 to read  11180 sequences.

Aligning sequences from stability.trim.contigs.good.unique.fasta ...
It took 1493 secs to align 786570 sequences.

[WARNING]: 128 of your sequences generated alignments that eliminated too many bases, a list is provided in stability.trim.contigs.good.unique.flip.accnos.
[NOTE]: 119 of your sequences were reversed to produce a better alignment.

It took 1498 seconds to align 786570 sequences.

Output File Names: 
stability.trim.contigs.good.unique.align
stability.trim.contigs.good.unique.align.report
stability.trim.contigs.good.unique.flip.accnos


mothur > 
summary.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table)

Using 8 processors.

		Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	17	3	0	1	1
2.5%-tile:	1	18928	439	0	4	29708
25%-tile:	1	18928	439	0	4	297075
Median: 	1	18928	444	0	5	594149
75%-tile:	1	18928	464	0	6	891223
97.5%-tile:	1	18928	465	0	6	1158589
Maximum:	18910	18928	468	0	175	1188296
Mean:	1	18926	449	0	5
# of unique seqs:	786570
total # of seqs:	1188296

It took 516 secs to summarize 1188296 sequences.

Output File Names:
stability.trim.contigs.good.unique.summary


mothur > 
screen.seqs(fasta=stability.trim.contigs.good.unique.align, count=stability.trim.contigs.good.count_table, summary=stability.trim.contigs.good.unique.summary, start=1, end=18928, maxhomop=8)

Using 8 processors.

It took 419 secs to screen 786570 sequences, removed 8994.

/******************************************/
Running command: remove.seqs(accnos=stability.trim.contigs.good.unique.bad.accnos.temp, count=stability.trim.contigs.good.count_table)
Removed 9052 sequences from your count file.

Output File Names: 
stability.trim.contigs.good.pick.count_table

/******************************************/

Output File Names:
stability.trim.contigs.good.unique.good.summary
stability.trim.contigs.good.unique.good.align
stability.trim.contigs.good.unique.bad.accnos
stability.trim.contigs.good.good.count_table


It took 652 secs to screen 786570 sequences.

mothur > 
summary.seqs(fasta=current, count=current)
Using stability.trim.contigs.good.good.count_table as input file for the count parameter.
Using stability.trim.contigs.good.unique.good.align as input file for the fasta parameter.

Using 8 processors.
[ERROR]: Your count file contains 777576 unique sequences, but your fasta file contains 604417. File mismatch detected, quitting command.

mothur > 
quit
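
The numbers in this log can be cross-checked with simple arithmetic before looking at the files themselves. The sketch below uses only values printed in the log; the function name `missing_from_fasta` is made up for illustration. It is a consistency check, not a diagnosis, since the log does not say why the fasta file is short.

```python
def missing_from_fasta(unique_before, removed, fasta_uniques):
    """Return how many unique sequences the fasta file lacks,
    relative to what screen.seqs reported it kept."""
    expected_uniques = unique_before - removed
    return expected_uniques - fasta_uniques


# Values copied from the log above.
total_contigs = 3_438_308             # make.contigs total
removed_first_screen = 2_250_012      # first screen.seqs
total_after_first_screen = 1_188_296  # "total # of seqs" in later summaries

unique_before_second_screen = 786_570  # "# of unique seqs"
removed_second_screen = 8_994          # second screen.seqs
count_table_uniques = 777_576          # from the [ERROR] line
fasta_uniques = 604_417                # from the [ERROR] line

# The first screen.seqs is internally consistent.
first_screen_ok = (
    total_contigs - removed_first_screen == total_after_first_screen
)

# The count table holds exactly the number of uniques screen.seqs kept.
count_table_ok = (
    unique_before_second_screen - removed_second_screen == count_table_uniques
)

shortfall = missing_from_fasta(
    unique_before_second_screen, removed_second_screen, fasta_uniques
)
print(first_screen_ok, count_table_ok, shortfall)
```

Read this way, the count table agrees with what the second screen.seqs reported, and the `.good.align` fasta file is the one with fewer sequences than expected.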

Could you send the log file, as well as the stability.trim.contigs.good.unique.align and stability.trim.contigs.good.count_table files to mothur.bugs@gmail.com so I can take a closer look?
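
While the files are being transferred, the mismatch can also be inspected locally. The following is a minimal sketch, not part of mothur: it assumes the count table is tab-separated with the sequence name in the first column, a header line beginning with `Representative_Sequence`, and possibly `#` comment lines. The function names are hypothetical.

```python
def fasta_names(path):
    """Collect sequence names (first token after '>') from a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def count_table_names(path):
    """Collect names from the first column of a count table.

    Assumption: tab-separated rows, a header line starting with
    'Representative_Sequence', and optional '#' comment lines.
    """
    names = set()
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            if line.startswith("Representative_Sequence"):
                continue
            names.add(line.split("\t")[0].strip())
    return names


def compare(fasta_path, count_path):
    """Report how many names each file has and which names differ."""
    in_fasta = fasta_names(fasta_path)
    in_count = count_table_names(count_path)
    return {
        "fasta": len(in_fasta),
        "count": len(in_count),
        "only_in_count": sorted(in_count - in_fasta),
        "only_in_fasta": sorted(in_fasta - in_count),
    }


# Example call with the file names from the log:
# report = compare("stability.trim.contigs.good.unique.good.align",
#                  "stability.trim.contigs.good.good.count_table")
# print(report["fasta"], report["count"], report["only_in_count"][:10])
```

If the names missing from the fasta file all come after some point in the count table's order, that would suggest the fasta file was cut short while being written; this is only one possibility, and the script cannot tell why a name is missing.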

Hi, I am trying to upload the files to my Drive, but they are too big: the .align file is 11 GB! Surely that is due to a mistake? I used the SILVA seed file trimmed to my primer positions and still ended up with such a big align file. The log file is what I pasted above. As soon as the big file uploads to Drive, I will send the email from my Gmail account with the subject of this thread so you can recognize it.


Thank you!! I need to move forward with this and cannot find what I did wrong :(
Susi
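
On the file size: an aligned fasta file stores every sequence padded with gap characters to the full alignment width, which the log shows as 18928 columns for silva.v3v4.fasta. A rough estimate, sketched below, suggests that a file of this size is expected for this many sequences rather than being a sign of a mistake. The function name and the 40-byte average header length are assumptions for illustration, as is the one-line-per-sequence layout.

```python
def estimate_align_bytes(n_seqs, alignment_columns, mean_header_bytes=40):
    """Rough size of an aligned FASTA file in bytes.

    Assumes each record is one header line plus one sequence line of
    alignment_columns characters, each ending in a single newline.
    mean_header_bytes is a guess, not a measured value.
    """
    bytes_per_record = mean_header_bytes + 1 + alignment_columns + 1
    return n_seqs * bytes_per_record


# Values from the log: 786570 unique sequences aligned to 18928 columns.
size = estimate_align_bytes(786_570, 18_928)
print(f"{size / 1e9:.1f} GB")
```

With these inputs the estimate lands in the range of 10 to 15 GB. Files like this consist mostly of gap characters, so they usually compress well.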

Can you zip it and then try uploading it?

Hi Pat! Sarah answered my email and found the problem, so thanks to her I solved it.
I have another issue with data from a different sequencing provider, but I opened a separate question asking how to run make.contigs with index and oligos files. I hope I can move forward with both analyses soon! Thank you!
