screen.seqs generating weird results

After aligning my sequences to silva.seed_v138, I ran summary.seqs:

mothur > summary.seqs(fasta=csp_bact.trim.contigs.good.unique.align, count=csp_bact.trim.contigs.good.count_table)

The results looked fine:

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	13862	23444	252	0	3	170812
25%-tile:	13862	23444	253	0	4	1708117
Median: 	13862	23444	253	0	4	3416233
75%-tile:	13862	23444	253	0	5	5124349
97.5%-tile:	13862	23444	254	0	6	6661653
Maximum:	23484	23491	295	0	8	6832464
Mean:	13865	23407	252	0	4

# of unique seqs:	3423400

total # of seqs:	6832464

I then ran screen seqs:

mothur > screen.seqs(fasta=csp_bact.trim.contigs.good.unique.align, count=csp_bact.trim.contigs.good.count_table, summary=csp_bact.trim.contigs.good.unique.summary, start=13862, end=23444)

The results did not make sense:

It took 169 secs to screen 3423400 sequences, removed 18745.

******************************************************************************
Running command: remove.seqs(accnos=csp_bact.trim.contigs.good.unique.bad.accnos.temp, count=csp_bact.trim.contigs.good.count_table)
Removed 0 sequences from csp_bact.trim.contigs.good.count_table.

Output File Names:
csp_bact.trim.contigs.good.pick.count_table

******************************************************************************

Output File Names:
csp_bact.trim.contigs.good.unique.good.summary
csp_bact.trim.contigs.good.unique.good.align
csp_bact.trim.contigs.good.unique.bad.accnos
csp_bact.trim.contigs.good.good.count_table

Specifically, the output file csp_bact.trim.contigs.good.good.count_table is empty, and running summary.seqs on the output files shows a much lower total number of sequences. It also looks like the sequences have not been trimmed to my start and end values.
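For context on the trimming point: as I understand mothur's documentation, screen.seqs with start and end is a filter rather than a trimmer. It removes sequences that start after the start position or end before the end position, and every surviving sequence keeps its original coordinates, which is consistent with the 13835 minimum start and 23486 maximum end in the summary below. In the MiSeq SOP, trimming to a common region is done afterwards with filter.seqs (vertical=T, trump=.). The Python sketch below only illustrates that keep/remove rule; it is not mothur's code, and the SummaryRow type, function names and toy rows are made up for illustration.

```python
# Hypothetical illustration of the start/end rule in screen.seqs (not mothur's source).
from typing import Iterable, List, NamedTuple, Tuple


class SummaryRow(NamedTuple):
    """Subset of one row of a mothur .summary file (illustrative type)."""
    seqname: str
    start: int
    end: int


def passes_start_end(row: SummaryRow, start: int, end: int) -> bool:
    """Keep a sequence only if it starts at or before `start` and ends at or after `end`.

    Nothing is trimmed: a sequence that passes keeps its original coordinates.
    """
    return row.start <= start and row.end >= end


def screen_by_position(
    rows: Iterable[SummaryRow], start: int, end: int
) -> Tuple[List[SummaryRow], List[SummaryRow]]:
    """Split rows into (good, bad) lists using only the start/end rule."""
    good: List[SummaryRow] = []
    bad: List[SummaryRow] = []
    for row in rows:
        (good if passes_start_end(row, start, end) else bad).append(row)
    return good, bad


if __name__ == "__main__":
    # Made-up rows whose coordinates echo the summaries in this thread.
    rows = [
        SummaryRow("seqA", 13835, 23444),  # starts early: kept, still starts at 13835
        SummaryRow("seqB", 13862, 23486),  # ends late: kept, still ends at 23486
        SummaryRow("seqC", 0, 0),          # empty alignment: removed
        SummaryRow("seqD", 23484, 23491),  # starts far too late: removed
    ]
    good, bad = screen_by_position(rows, start=13862, end=23444)
    print([r.seqname for r in good])  # ['seqA', 'seqB']
    print([r.seqname for r in bad])   # ['seqC', 'seqD']
```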

summary.seqs(fasta=csp_bact.trim.contigs.good.unique.good.align, count=csp_bact.trim.contigs.good.good.count_table)
[ERROR]: csp_bact.trim.contigs.good.good.count_table is blank, aborting.

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	13835	23444	227	0	3	1
2.5%-tile:	13862	23444	252	0	3	11723
25%-tile:	13862	23444	253	0	4	117224
Median: 	13862	23444	253	0	4	234448
75%-tile:	13862	23444	253	0	5	351671
97.5%-tile:	13862	23444	254	0	6	457172
Maximum:	13862	23486	292	0	8	468894
Mean:	13861	23444	253	0	4

# of Seqs:	468894

It took 9 secs to summarize 468894 sequences.

I’ve run this multiple times and keep getting the same result. I’ve even gone back to the beginning and started the mothur analysis from scratch, and the result is the same. Any suggestions would be appreciated.

I am happy to help.

Can you tell me what version of mothur you are using?

Also, the results of the first summary.seqs command look off. It appears to indicate you have sequences with a length of 0. Could you post the commands you ran before this issue?
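If it helps to see which sequences those are, here is a small Python sketch that scans an aligned FASTA file and lists entries containing no bases at all, which is what a row with Start, End and NBases of 0 would correspond to. It assumes gaps are written as '-' or '.' (as in mothur alignments) and that the sequence name is the first whitespace-delimited token of the header line; the function names are hypothetical and not part of mothur.

```python
# Hypothetical helper: list aligned sequences that contain no bases at all.
import io
from typing import Iterator, List, TextIO, Tuple

GAP_CHARS = set("-.")  # gap characters used in mothur-style alignments


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a FASTA stream; name is the first header token."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def empty_alignments(handle: TextIO) -> List[str]:
    """Return names of sequences whose aligned string has zero non-gap characters."""
    return [name for name, seq in read_fasta(handle)
            if not any(ch not in GAP_CHARS for ch in seq)]


if __name__ == "__main__":
    demo = io.StringIO(">good\n..AC-G..\n>empty\n........\n")
    print(empty_alignments(demo))  # ['empty']
```

On the real data this would be called as `with open("csp_bact.trim.contigs.good.unique.align") as fh: names = empty_alignments(fh)`.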

We are using mothur v.1.48.0.

mothur > make.file(inputdir=., type=fastq, prefix=csp_bact)

Output File Names:

csp_bact.files

mothur > make.contigs(file=csp_bact.files)

Group count:
3429 138741
3429_054 72943
3429_055 133730
3429_056 137291
3429_057 126135
3429_058 94121
3429_059 115847
3429_060 54472
3429_061 129139
3429_062 117871
3429_063 124400
3429_064 117099
3429_065 136891
3429_066 134079
3429_067 135028
3429_068 128333
3429_069 131371
3429_070 182815
3429_071 193017
3429_072 178557
3429_073 138175
3429_074 157246
3429_075 143929
3429_076 121179
3429_081 142772
3429_082 161533
3429_083 128257
3429_084 150879
3429_089 114588
3429_090 121028
3429_091 129672
3429_092 153985
3429_093 145101
3429_094 173435
3429_095 136221
3429_096 168714
3429_097 198212
3429_098 203622
3429_099 237756
3429_100 220755
3429_101 180031
3429_102 214197
3429_103 205486
3429_104 218382
3429b 80371
3429b_02 74021
3429b_03 77665
3429b_04 74341
3429b_05 80934
3429b_06 83707
3429b_07 68833
3429b_08 66747

Total of all groups is 7153654

It took 349 secs to process 7153654 sequences.

Output File Names:
csp_bact.trim.contigs.fasta
csp_bact.scrap.contigs.fasta
csp_bact.contigs_report
csp_bact.contigs.count_table

mothur > summary.seqs(fasta=csp_bact.trim.contigs.fasta)

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	309	309	0	3	1
2.5%-tile:	1	325	325	0	3	178842
25%-tile:	1	326	326	0	4	1788414
Median: 	1	326	326	0	4	3576828
75%-tile:	1	326	326	0	5	5365241
97.5%-tile:	1	327	327	1	7	6974813
Maximum:	1	618	618	123	309	7153654
Mean:	1	327	327	0	5

# of Seqs:	7153654

It took 39 secs to summarize 7153654 sequences.

Output File Names:
csp_bact.trim.contigs.summary

mothur > screen.seqs(fasta=csp_bact.trim.contigs.fasta, count=csp_bact.contigs.count_table, maxambig=0, maxlength=330, maxhomop=8)

Using 80 processors.

It took 48 secs to screen 7153654 sequences, removed 321190.

/******************************************/
Running command: remove.seqs(accnos=csp_bact.trim.contigs.bad.accnos.temp, count=csp_bact.contigs.count_table)
Removed 321190 sequences from csp_bact.contigs.count_table.

Output File Names:
csp_bact.contigs.pick.count_table

/******************************************/

Output File Names:
csp_bact.trim.contigs.good.fasta
csp_bact.trim.contigs.bad.accnos
csp_bact.contigs.good.count_table

It took 117 secs to screen 7153654 sequences.

mothur > summary.seqs(fasta=csp_bact.trim.contigs.good.fasta)

Using 80 processors.

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	309	309	0	3	1
2.5%-tile:	1	325	325	0	3	170812
25%-tile:	1	326	326	0	4	1708117
Median: 	1	326	326	0	4	3416233
75%-tile:	1	326	326	0	5	5124349
97.5%-tile:	1	327	327	0	6	6661653
Maximum:	1	330	330	0	8	6832464
Mean:	1	325	325	0	4

# of Seqs:	6832464

It took 63 secs to summarize 6832464 sequences.

Output File Names:
csp_bact.trim.contigs.good.summary

mothur > unique.seqs(fasta=csp_bact.trim.contigs.good.fasta, count=csp_bact.contigs.good.count_table)

6832464	3423400

Output File Names:
csp_bact.trim.contigs.good.unique.fasta
csp_bact.trim.contigs.good.count_table

mothur > summary.seqs(fasta=csp_bact.trim.contigs.good.unique.fasta, count=csp_bact.trim.contigs.good.count_table)

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	1	309	309	0	3	1
2.5%-tile:	1	325	325	0	3	170812
25%-tile:	1	326	326	0	4	1708117
Median: 	1	326	326	0	4	3416233
75%-tile:	1	326	326	0	5	5124349
97.5%-tile:	1	327	327	0	6	6661653
Maximum:	1	330	330	0	8	6832464
Mean:	1	325	325	0	4

# of unique seqs:	3423400

total # of seqs:	6832464

It took 179 secs to summarize 6832464 sequences.

Output File Names:
csp_bact.trim.contigs.good.unique.summary

mothur > pcr.seqs(fasta=silva.seed_v138_2.align, oligos=oligos.txt)

Using 80 processors.

It took 0 secs to screen 8696 sequences.

Output File Names:
silva.seed_v138_2.pcr.align
silva.seed_v138_2.bad.accnos
silva.seed_v138_2.scrap.pcr.align

mothur > rename.file(input=silva.seed_v138_2.pcr.align, new=silva.v138.fasta)

mothur > summary.seqs(fasta=silva.v138.fasta)

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	13836	23444	230	0	3	1
2.5%-tile:	13862	23444	252	0	3	151
25%-tile:	13862	23444	253	0	4	1504
Median: 	13862	23444	253	0	4	3007
75%-tile:	13862	23444	253	0	5	4510
97.5%-tile:	13862	23444	254	1	6	5862
Maximum:	13862	23491	311	4	10	6012
Mean:	13861	23444	252	0	4

# of Seqs:	6012

It took 0 secs to summarize 6012 sequences.

Output File Names:
silva.v138.summary

mothur > align.seqs(fasta=csp_bact.trim.contigs.good.unique.fasta, reference=silva.v138.fasta)

Using 80 processors.

Reading in the silva.v138.fasta template sequences… DONE.
It took 4 to read 6012 sequences.

Aligning sequences from csp_bact.trim.contigs.good.unique.fasta …
It took 509 secs to align 3423400 sequences.

[WARNING]: 15996 of your sequences generated alignments that eliminated too many bases, a list is provided in csp_bact.trim.contigs.good.unique.flip.accnos.
[NOTE]: 8927 of your sequences were reversed to produce a better alignment.

It took 509 seconds to align 3423400 sequences.

Output File Names:
csp_bact.trim.contigs.good.unique.align
csp_bact.trim.contigs.good.unique.align_report
csp_bact.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=csp_bact.trim.contigs.good.unique.align, count=csp_bact.trim.contigs.good.count_table)

Using 80 processors.

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	13862	23444	252	0	3	170812
25%-tile:	13862	23444	253	0	4	1708117
Median: 	13862	23444	253	0	4	3416233
75%-tile:	13862	23444	253	0	5	5124349
97.5%-tile:	13862	23444	254	0	6	6661653
Maximum:	23484	23491	295	0	8	6832464
Mean:	13865	23407	252	0	4

# of unique seqs:	3423400

total # of seqs:	6832464

It took 220 secs to summarize 6832464 sequences.

Output File Names:
csp_bact.trim.contigs.good.unique.summary

mothur > screen.seqs(fasta=csp_bact.trim.contigs.good.unique.align, count=csp_bact.trim.contigs.good.count_table, summary=csp_bact.trim.contigs.good.unique.summary, start=13862, end=23444)

It took 169 secs to screen 3423400 sequences, removed 18745.

Running command: remove.seqs(accnos=csp_bact.trim.contigs.good.unique.bad.accnos.temp, count=csp_bact.trim.contigs.good.count_table)
Removed 0 sequences from csp_bact.trim.contigs.good.count_table.

Output File Names:
csp_bact.trim.contigs.good.pick.count_table

Output File Names:
csp_bact.trim.contigs.good.unique.good.summary
csp_bact.trim.contigs.good.unique.good.align
csp_bact.trim.contigs.good.unique.bad.accnos
csp_bact.trim.contigs.good.good.count_table

mothur > summary.seqs(fasta=csp_bact.trim.contigs.good.unique.good.align, count=csp_bact.trim.contigs.good.good.count_table)

[ERROR]: csp_bact.trim.contigs.good.good.count_table is blank, aborting.

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	13835	23444	227	0	3	1
2.5%-tile:	13862	23444	252	0	3	11723
25%-tile:	13862	23444	253	0	4	117224
Median: 	13862	23444	253	0	4	234448
75%-tile:	13862	23444	253	0	5	351671
97.5%-tile:	13862	23444	254	0	6	457172
Maximum:	13862	23486	292	0	8	468894
Mean:	13861	23444	253	0	4

# of Seqs:	468894

It took 9 secs to summarize 468894 sequences.

Thanks for posting the log file. I am unable to reproduce the screen.seqs error with our test files. I also tried including blank sequences, and that still was not a problem. It appears the remove.seqs command run internally by screen.seqs is failing to generate a correct count file. Can you try running the commands separately like this?
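One quick way to confirm whether a count file came out correctly is to count its rows and sum its totals, then compare them with what summary.seqs reported (3423400 unique and 6832464 total before screening). The Python sketch below assumes a count_table layout with optional metadata lines starting with '#', a header line starting with 'Representative_Sequence', and the total abundance in the second column. Those format details are my assumptions, so check them against your own file. A blank file, like the one here, comes back as (0, 0).

```python
# Hypothetical sanity check for a mothur count_table (format details are assumptions).
import io
from typing import TextIO, Tuple


def count_table_stats(handle: TextIO) -> Tuple[int, int]:
    """Return (number of unique sequences, total abundance) from a count_table.

    Assumes: lines beginning with '#' are metadata, the column header line begins
    with 'Representative_Sequence', and column 2 of every data row holds the total.
    """
    uniques = total = 0
    for line in handle:
        if not line.strip() or line.startswith("#") or line.startswith("Representative_Sequence"):
            continue  # skip blank, metadata and header lines
        fields = line.rstrip("\n").split("\t")
        uniques += 1
        total += int(fields[1])
    return uniques, total


if __name__ == "__main__":
    demo = io.StringIO(
        "Representative_Sequence\ttotal\tA\tB\n"
        "seq1\t5\t3\t2\n"
        "seq2\t1\t0\t1\n"
    )
    print(count_table_stats(demo))  # (2, 6)
```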

mothur > screen.seqs(fasta=csp_bact.trim.contigs.good.unique.align, summary=csp_bact.trim.contigs.good.unique.summary, start=13862, end=23444)

mothur > remove.seqs(accnos=csp_bact.trim.contigs.good.unique.bad.accnos, count=csp_bact.trim.contigs.good.count_table)

mothur > summary.seqs(fasta=csp_bact.trim.contigs.good.unique.good.align, count=current)

I re-ran summary.seqs and then tried your suggestion:

mothur > summary.seqs(fasta=csp_bact.trim.contigs.good.unique.align, count=csp_bact.trim.contigs.good.count_table)

Using 80 processors.

	Start	End	NBases	Ambigs	Polymer	NumSeqs
Minimum:	0	0	0	0	1	1
2.5%-tile:	13862	23444	252	0	3	170812
25%-tile:	13862	23444	253	0	4	1708117
Median: 	13862	23444	253	0	4	3416233
75%-tile:	13862	23444	253	0	5	5124349
97.5%-tile:	13862	23444	254	0	6	6661653
Maximum:	23484	23491	295	0	8	6832464
Mean:	13865	23407	252	0	4

# of unique seqs:	3423400

total # of seqs:	6832464

It took 353 secs to summarize 6832464 sequences.

Output File Names:
csp_bact.trim.contigs.good.unique.summary

mothur > screen.seqs(fasta=csp_bact.trim.contigs.good.unique.align, summary=csp_bact.trim.contigs.good.unique.summary, start=13862, end=23444)

Using 80 processors.

It took 168 secs to screen 3423400 sequences, removed 227.

[ERROR]: found 3423400 sequences in your fasta file, and 42813 sequences in your summary file, quitting.

mothur > remove.seqs(accnos=csp_bact.trim.contigs.good.unique.bad.accnos, count=csp_bact.trim.contigs.good.count_table)
Removed 0 sequences from csp_bact.trim.contigs.good.count_table.

Output File Names:
csp_bact.trim.contigs.good.pick.count_table

Could you be running out of disk space for the output files?

[ERROR]: found 3423400 sequences in your fasta file, and 42813 sequences in your summary file, quitting.

The above error indicates the summary file was not completely written to file.
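To check for this directly, you can compare the number of records in the FASTA file with the number of data rows in the summary file; a truncated summary (here 42813 rows against 3423400 sequences) shows up as a mismatch. This Python sketch assumes the .summary file has exactly one header line followed by one row per sequence; the function names are hypothetical.

```python
# Hypothetical check that a mothur .summary file covers every sequence in its FASTA.
import io
from typing import TextIO


def count_fasta_records(handle: TextIO) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


def count_summary_rows(handle: TextIO) -> int:
    """Count data rows, assuming one header line (seqname, start, end, ...)."""
    rows = [line for line in handle if line.strip()]
    return max(len(rows) - 1, 0)


def summary_is_complete(fasta: TextIO, summary: TextIO) -> bool:
    """True when the summary has exactly one row per FASTA record."""
    return count_fasta_records(fasta) == count_summary_rows(summary)


if __name__ == "__main__":
    fasta = io.StringIO(">s1\nAC\n>s2\nGT\n>s3\nCA\n")
    truncated = io.StringIO("seqname\tstart\tend\ns1\t1\t2\ns2\t1\t2\n")
    print(summary_is_complete(fasta, truncated))  # False
```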

That’s possible. I’ll check.
Thanks so much for all of the quick responses.
I really appreciate it.
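For anyone hitting the same error: a quick way to see how much room the output directory has is Python's shutil.disk_usage. The function names and the 2x safety factor below are my own assumptions (an arbitrary heuristic, not a mothur requirement), and on a shared cluster a per-user quota can be lower than the free space the filesystem reports, so treat the result as a rough check only.

```python
# Hypothetical pre-flight check: is there likely enough free space for new output files?
import os
import shutil
from typing import Iterable


def free_gib(path: str = ".") -> float:
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def enough_space_for(path: str, input_files: Iterable[str], safety_factor: float = 2.0) -> bool:
    """Rough heuristic (an assumption, not a mothur rule): require free space of at
    least `safety_factor` times the combined size of the input files."""
    needed = sum(os.path.getsize(f) for f in input_files) * safety_factor
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    print(f"{free_gib('.'):.1f} GiB free in the working directory")
```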


I moved the files to a larger cluster and re-ran the screen.seqs command, and it worked! I guess disk space was the issue. Thank you so much for the help!

