Problem with MiSeq SOP pre.cluster

Hi everybody!

We recently switched from 454 pyrosequencing to MiSeq for our bacterial community analysis (our sequencing company of choice no longer offers 454), and we ran into some problems during the analysis on which I would like to get your opinion.

We split our dataset into the smaller parts we want to compare, to ease the load on our computers. Oddly, some of the individual parts run straight through our modified MiSeq SOP (see below), while others always stop during pre.cluster with the error quoted below. I think the problem lies in filter.seqs or unique.seqs, as the number of unique sequences is reduced to 1 (see the second post for the full log file).

"mothur > pre.cluster(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.count_table, diffs=4)

Using 4 processors.

Processing group BL_IT_North_TP1_SED:

Processing group BL_IT_North_TP2_WC30:

Processing group BL_IT_North_TP4_WC30:

Processing group KE_IT_North_TP2_WC30:
Error: diffs is greater than your sequence length.
Error: diffs is greater than your sequence length.
Error: diffs is greater than your sequence length.
Error: diffs is greater than your sequence length.



[ERROR]: process 0 only processed 1 of 6 groups assigned to it, quitting.
[ERROR]: process 1 only processed 1 of 6 groups assigned to it, quitting.
[ERROR]: process 2 only processed 1 of 6 groups assigned to it, quitting.
[ERROR]: D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.names is blank. Please correct.
[ERROR]: D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.fasta is blank. Please correct."

As I said, the same batch runs smoothly for the other parts of the analysis when we change only set.dir and the XXXXXX.files file.

Our SOP is as follows:

D:/Phd work/Mothur/mothergui/mothur/mothur IslandgradientN.batch

set.dir(input=D:/Phd work/Mothur/IslandgradientN/)

make.contigs(file=IslandgradientN.files, processors=4)

summary.seqs(fasta=IslandgradientN.trim.contigs.fasta)

screen.seqs(fasta=IslandgradientN.trim.contigs.fasta, group=IslandgradientN.contigs.groups, maxambig=2, maxlength=450, minlength=350)

unique.seqs(fasta=IslandgradientN.trim.contigs.good.fasta)

count.seqs(name=IslandgradientN.trim.contigs.good.names, group=IslandgradientN.contigs.good.groups)

pcr.seqs(fasta=silva.bacteria.fasta, start=3000, end=25000, keepdots=F)

align.seqs(fasta=IslandgradientN.trim.contigs.good.unique.fasta, reference=silva.bacteria.pcr.fasta)

summary.seqs(fasta=current, count=current)

screen.seqs(fasta=IslandgradientN.trim.contigs.good.unique.align, count=IslandgradientN.trim.contigs.good.count_table, summary=current, optimize=start-end, maxhomop=8)

filter.seqs(fasta=IslandgradientN.trim.contigs.good.unique.good.align, vertical=T, trump=.)

unique.seqs(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.fasta, count=IslandgradientN.trim.contigs.good.good.count_table)

pre.cluster(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.count_table, diffs=4)

chimera.uchime(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)

remove.seqs(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos, name=IslandgradientN.trim.contigs.good.names)

classify.seqs(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=silva.bacteria.fasta, taxonomy=silva.bacteria.rdp.tax, cutoff=80)

remove.lineage(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=silva.bacteria.rdp.tax, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

cluster.split(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15)

make.shared(list=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, label=0.03)

classify.otu(list=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy, label=0.03)


Any help on this is greatly appreciated! Cheers! Hauke

Sorry for the spam; I decided to post the log file, starting after make.contigs. Maybe it helps you identify the problem. The more I look at it, the more I am convinced the error lies in filter.seqs or the subsequent unique.seqs, as there is only 1(!?!) unique sequence left…


mothur > summary.seqs(fasta=IslandgradientN.trim.contigs.fasta)

Using 4 processors.

            Start    End      NBases   Ambigs   Polymer  NumSeqs
Minimum:    1        299      299      0        3        1
2.5%-tile:  1        405      405      0        4        19948
25%-tile:   1        442      442      0        5        199480
Median:     1        444      444      1        6        398960
75%-tile:   1        467      467      3        6        598439
97.5%-tile: 1        468      468      16       6        777971
Maximum:    1        616      616      194      285      797918
Mean:       1        450.654  450.654  2.51212  5.42134

# of Seqs:  797918

Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.summary

It took 16 secs to summarize 797918 sequences.

mothur > screen.seqs(fasta=IslandgradientN.trim.contigs.fasta, group=IslandgradientN.contigs.groups, maxambig=2, maxlength=450, minlength=350)

Using 4 processors.

Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.fasta
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.bad.accnos
D:/Phd work/Mothur/IslandN/\IslandgradientN.contigs.good.groups


It took 21 secs to screen 797918 sequences.

mothur > unique.seqs(fasta=IslandgradientN.trim.contigs.good.fasta)
361192 289150

Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.names
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.fasta


mothur > count.seqs(name=IslandgradientN.trim.contigs.good.names, group=IslandgradientN.contigs.good.groups)

Using 4 processors.
It took 7 secs to create a table for 361192 sequences.


Total number of sequences: 361192

Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.count_table


mothur > pcr.seqs(fasta=silva.bacteria.fasta, start=3000, end=25000, keepdots=F)

Using 4 processors.

Output File Names:
D:/Phd work/Mothur/IslandN/\silva.bacteria.pcr.fasta


It took 31 secs to screen 14956 sequences.

mothur > align.seqs(fasta=IslandgradientN.trim.contigs.good.unique.fasta, reference=silva.bacteria.pcr.fasta)

Using 4 processors.

Reading in the D:/Phd work/Mothur/IslandN/\silva.bacteria.pcr.fasta template sequences… DONE.
It took 13 to read 14956 sequences.
Aligning sequences from D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.fasta …

Reading in the D:/Phd work/Mothur/IslandN/\silva.bacteria.pcr.fasta template sequences…

Reading in the D:/Phd work/Mothur/IslandN/\silva.bacteria.pcr.fasta template sequences… Reading in the D:/Phd work/Mothur/IslandN/\silva.bacteria.pcr.fasta template sequences… DONE.
It took 29 to read 14956 sequences.
DONE.
It took 29 to read 14956 sequences.
DONE.
It took 29 to read 14956 sequences.
Some of you sequences generated alignments that eliminated too many bases, a list is provided in D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.flip.accnos. If you set the flip parameter to true mothur will try aligning the reverse compliment as well.
It took 1114 secs to align 289150 sequences.


Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.align
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.align.report
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.flip.accnos

mothur > summary.seqs(fasta=current, count=current)
Using D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.count_table as input file for the count parameter.
Using D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.align as input file for the fasta parameter.

Using 4 processors.

            Start    End      NBases   Ambigs    Polymer  NumSeqs
Minimum:    -1       -1       0        0         1        1
2.5%-tile:  103      106      2        0         1        9030
25%-tile:   103      106      2        0         1        90299
Median:     3387     20965    429      0         4        180597
75%-tile:   3387     20965    432      0         6        270895
97.5%-tile: 19580    20965    434      2         6        352163
Maximum:    20965    20966    449      2         13       361192
Mean:       3809.07  12963    224.451  0.230755  3.59786

# of unique seqs: 289150

total # of seqs: 361192

Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.summary

It took 419 secs to summarize 361192 sequences.

mothur > screen.seqs(fasta=IslandgradientN.trim.contigs.good.unique.align, count=IslandgradientN.trim.contigs.good.count_table, summary=current, optimize=start-end, maxhomop=8)
Using D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.summary as input file for the summary parameter.

Using 4 processors.
Optimizing start to 19542.
Optimizing end to 106.

Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.summary
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.align
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.bad.accnos
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.good.count_table


It took 633 secs to screen 289150 sequences.

mothur > filter.seqs(fasta=IslandgradientN.trim.contigs.good.unique.good.align, vertical=T, trump=.)

Using 4 processors.
Creating Filter…


Running Filter...

Length of filtered alignment: 0
Number of columns removed: 22000
Length of the original alignment: 22000
Number of sequences used to construct filter: 269083

Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.filter
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.fasta


mothur > unique.seqs(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.fasta, count=IslandgradientN.trim.contigs.good.good.count_table)
269083 1

Output File Names:
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.count_table
D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.fasta


mothur > pre.cluster(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.count_table, diffs=4)

Using 4 processors.

Processing group BL_IT_North_TP1_SED:

Processing group BL_IT_North_TP2_WC30:

Processing group BL_IT_North_TP4_WC30:

Processing group KE_IT_North_TP2_WC30:
Error: diffs is greater than your sequence length.
Error: diffs is greater than your sequence length.
Error: diffs is greater than your sequence length.
Error: diffs is greater than your sequence length.



[ERROR]: process 0 only processed 1 of 6 groups assigned to it, quitting.
[ERROR]: process 1 only processed 1 of 6 groups assigned to it, quitting.
[ERROR]: process 2 only processed 1 of 6 groups assigned to it, quitting.
[ERROR]: D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.names is blank. Please correct.
[ERROR]: D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.fasta is blank. Please correct.

mothur > chimera.uchime(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.fasta. It will be disregarded.
[ERROR]: no valid files.
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.count_table. It will be disregarded.
[ERROR]: You don’t have any saved reference sequences and the reference parameter is a required.

Using 4 processors.
[ERROR]: did not complete chimera.uchime.

mothur > remove.seqs(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.fasta, accnos=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos, name=IslandgradientN.trim.contigs.good.names)
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.accnos
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.fasta
[ERROR]: did not complete remove.seqs.

mothur > classify.seqs(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=silva.bacteria.fasta, taxonomy=silva.bacteria.rdp.tax, cutoff=80)
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta. It will be disregarded.
no valid files.
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table. It will be disregarded.

Using 4 processors.
[ERROR]: did not complete classify.seqs.

mothur > remove.lineage(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=silva.bacteria.rdp.tax, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table
[ERROR]: did not complete remove.lineage.

mothur > cluster.split(fasta=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15)
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy
Using D:/Phd work/Mothur/IslandN/\silva.bacteria.rdp.tax as input file for the taxonomy parameter.
Using D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.names as input file for the name parameter.

Using 4 processors.
Using splitmethod classify.
[ERROR]: did not complete cluster.split.

mothur > make.shared(list=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, label=0.03)
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table
No valid current files. You must provide a list or biom file before you can use the make.shared command.
[ERROR]: did not complete make.shared.

mothur > classify.otu(list=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list, count=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table, taxonomy=IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy, label=0.03)
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.list
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy
reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.
Unable to open D:/Phd work/Mothur/IslandN/\IslandgradientN.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.pick.count_table
[ERROR]: did not complete classify.otu.

mothur > quit()
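
The "Length of filtered alignment: 0" line in the log is consistent with unique.seqs reporting a single unique sequence: if every column is removed, every read becomes the same empty string, and pre.cluster then has no sequence length to compare diffs against. Here is a minimal Python sketch of that effect. It is not mothur's implementation; `trump_filter` and the toy reads are made up for illustration, and it assumes that trump=. drops any column in which at least one sequence has a '.'.

```python
def trump_filter(alignment, trump="."):
    """Drop every column in which at least one sequence has the trump character."""
    length = len(next(iter(alignment.values())))
    keep = [i for i in range(length)
            if all(seq[i] != trump for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

# Toy aligned reads (made up): '.' marks positions outside the aligned region.
good_only = {
    "full_1": "..ACGTAC..",
    "full_2": "..ACGAAC..",
}
# Same reads plus one that aligned only to a tiny region at the left end.
with_stub = dict(good_only, stub_1="AC........")

filtered_good = trump_filter(good_only)
filtered_stub = trump_filter(with_stub)

print(filtered_good)                     # overlapping columns survive
print(filtered_stub)                     # no column is free of '.', so nothing survives
print(len(set(filtered_stub.values())))  # number of distinct filtered sequences
```

In the toy data a single badly aligned read is enough to wipe out every column, which suggests looking at what screen.seqs let through rather than at filter.seqs itself.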

Your problem occurs when you screen your sequences: the start and end positions are switched. As a result, the sequences that aligned are being eliminated, and only the sequences that did not align (or aligned to a very small region at the ends) are being retained.
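
One way the optimized positions can come out "switched" is sketched below in Python. This is only an illustration under assumptions: it supposes that optimize takes the start from the 90th percentile of all start positions and the end from the 10th percentile of all end positions (my understanding of the default criteria of 90), and the `percentile_value` helper and the ten toy reads are made up to mimic the shape of the aligned summary in the log.

```python
def percentile_value(values, pct):
    """Value found pct percent of the way through the sorted values."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]

# Made-up (start, end) pairs: a few reads that barely aligned, a majority
# that aligned over the expected region, and a few that start very late.
reads = [(103, 106)] * 3 + [(3387, 20965)] * 5 + [(19542, 20965)] * 2

starts = [start for start, _ in reads]
ends = [end for _, end in reads]

start_cutoff = percentile_value(starts, 90)  # assumed rule for the start position
end_cutoff = percentile_value(ends, 10)      # assumed rule for the end position

print(start_cutoff, end_cutoff)
```

With a mixed population like this, the start cutoff is set by the late-starting reads and the end cutoff by the reads that barely aligned, so the start lands after the end.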

Try optimizing the start and end positions separately. Alternatively, supply the actual positions instead of using the optimize option: start=106, end=19542
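
Whichever explicit positions are used, it may be worth checking them against the summary.seqs table of the aligned reads before rerunning. The Python sketch below assumes that screen.seqs keeps a read when it starts at or before `start` and ends at or after `end` (my reading of the documented behaviour, not taken from this thread). The `passes_screen` and `kept` helpers are hypothetical, and the second pair of cutoffs is simply the Median row of the log, used for illustration only.

```python
def passes_screen(start, end, start_cutoff, end_cutoff):
    """Assumed screen.seqs rule: keep reads that start at or before
    start_cutoff and end at or after end_cutoff."""
    return start <= start_cutoff and end >= end_cutoff

# (start, end) values copied from percentile rows of the aligned summary in
# the log; each row is treated as if it were one read, which is a simplification.
rows = {
    "25%-tile": (103, 106),
    "Median": (3387, 20965),
    "97.5%-tile": (19580, 20965),
}

def kept(start_cutoff, end_cutoff):
    """Names of the rows that would survive the given cutoffs."""
    return [name for name, (start, end) in rows.items()
            if passes_screen(start, end, start_cutoff, end_cutoff)]

optimized = kept(19542, 106)      # cutoffs reported by optimize=start-end in the log
median_based = kept(3387, 20965)  # illustrative cutoffs taken from the Median row

print(optimized)
print(median_based)
```

Under that assumed rule, the optimized cutoffs let the reads that cover only positions 103 to 106 through together with the well-aligned ones, while cutoffs read off the Median row keep only reads spanning the expected region.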