pre.cluster is very slow and the fasta file it produces is blank

mothur > pre.cluster(fasta=stability.trim.contigs.subsample.good.unique.good.filter.unique.fasta,count=stability.trim.contigs.subsample.good.unique.good.filter.count_table, diffs=2)

Using 64 processors.
When using running without group information mothur can only use 1 processor, continuing.
1108681 505548 603133
Total number of sequences before precluster was 1108681.
pre.cluster removed 603133 sequences.

/******************************************/
[WARNING]: stability.trim.contigs.subsample.good.unique.good.filter.unique.fasta does not contain any sequence from the .accnos file.
Selected 0 sequences from stability.trim.contigs.subsample.good.unique.good.filter.unique.fasta.

Output File Names:
stability.trim.contigs.subsample.good.unique.good.filter.unique.precluster.fasta

I got the above error while running pre.cluster, and the generated fasta file had a size of 0, so I could not proceed to the next step.

It looks like mothur is only seeing one group in your data. Normally it operates on each group separately and then pools the results at the end. I suspect you might be running into memory issues or something else weird. Do you mean to only have one group?

Pat
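
One way to check what Pat is asking about is to have mothur list the groups it finds in the count table. This is only a sketch: it assumes the count file name from the pre.cluster call above, and the exact wording of the output may differ between mothur versions.

```
count.groups(count=stability.trim.contigs.subsample.good.unique.good.filter.count_table)
```

If the table carries sample information, each sample should be listed with its sequence count. If no samples are listed, or mothur complains about missing group information, the groups were lost somewhere upstream of pre.cluster.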

Hi Pat,

I am having the same problem. I am using the conda version of mothur (v1.48.0).

My data should have groups, but this information seems to be lost at some point in my pipeline. After make.contigs I get a count for every group, so the input to that step is fine. However, in my previous analyses make.contigs produced a samples.contigs.groups file, which it did not this time. I would normally use that file as input to commands such as count.seqs, but I haven't been able to this time. So I tried following the newest version of the MiSeq SOP, which doesn't seem to use any group files as input to commands.

The commands I have run are:
make.contigs(file=samples.txt, processors=16)
summary.seqs(fasta=samples.trim.contigs.fasta, processors=16)
screen.seqs(fasta=samples.trim.contigs.fasta, count=samples.contigs.count_table, maxambig=0, maxlength=254, processors=16)
unique.seqs(fasta=samples.trim.contigs.good.fasta)
summary.seqs(count=samples.trim.contigs.good.count_table)
align.seqs(fasta=samples.trim.contigs.good.unique.fasta, reference=references/silva.nr_v138_1.align, processors=16)
summary.seqs(fasta=samples.trim.contigs.good.unique.align, count=samples.trim.contigs.good.count_table, processors=16)
screen.seqs(fasta=samples.trim.contigs.good.unique.align, count=samples.trim.contigs.good.count_table, summary=samples.trim.contigs.good.unique.summary, start=13862, end=22107, maxhomop=9)
summary.seqs(fasta=samples.trim.contigs.good.unique.good.align, count=samples.trim.contigs.good.good.count_table)
filter.seqs(fasta=samples.trim.contigs.good.unique.good.align, vertical=T, trump=., processors=16)
unique.seqs(fasta=samples.trim.contigs.good.unique.good.filter.fasta, count=samples.trim.contigs.good.good.count_table)
summary.seqs(fasta=samples.trim.contigs.good.unique.good.filter.unique.fasta, count=samples.trim.contigs.good.unique.good.filter.count_table, processors=16)
pre.cluster(fasta=samples.trim.contigs.good.unique.good.filter.unique.fasta, count=samples.trim.contigs.good.unique.good.filter.count_table, diffs=2, processors=16)
chimera.vsearch(fasta=samples.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=samples.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t, processors=16)

The output of pre.cluster is:

Total number of sequences before precluster was 244215.
pre.cluster removed 191544 sequences.

/******************************************/
[WARNING]: samples.trim.contigs.good.unique.good.filter.unique.fasta does not contain any sequence from the .accnos file.
Selected 0 sequences from samples.trim.contigs.good.unique.good.filter.unique.fasta.

Output File Names:
samples.trim.contigs.good.unique.good.filter.unique.precluster.fasta

/******************************************/
Done.
It took 455 secs to cluster 244215 sequences.

Using 16 processors.

Output File Names:
samples.trim.contigs.good.unique.good.filter.unique.precluster.fasta
samples.trim.contigs.good.unique.good.filter.unique.precluster.count_table
samples.trim.contigs.good.unique.good.filter.unique.precluster.map

Thanks for your help,
Laura

Hi Pat,

Sorry, please ignore my last post. As usual, as soon as you post a query online you find the answer yourself… It looks like I'd forgotten to give unique.seqs a count table file.

Cheers,
Laura

I just made the same mistake. Thank you for the time-saving solution!

I would like to know specifically how you solved this problem.

You should be getting a count_table file from make.contigs or from unique.seqs.

Pat
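
For anyone landing here with the same question, the fix Laura describes can be sketched as follows. In her command list, the first unique.seqs call is given only a fasta file, so the per-sample counts produced by make.contigs and screen.seqs are not carried forward, and pre.cluster later sees no group information. The count file name below is an assumption based on her samples prefix and the naming used in the MiSeq SOP; use whatever name your own screen.seqs run printed under "Output File Names".

```
unique.seqs(fasta=samples.trim.contigs.good.fasta, count=samples.contigs.good.count_table)
```

With count= supplied, the count table written by unique.seqs should keep one column per sample, and the later commands in the pipeline can be rerun from that point without other changes.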