Pre.cluster is very slow and the fasta file it produces is blank

DYH · April 23, 2023, 1:21am

mothur > pre.cluster(fasta=stability.trim.contigs.subsample.good.unique.good.filter.unique.fasta,count=stability.trim.contigs.subsample.good.unique.good.filter.count_table, diffs=2)

Using 64 processors.
When using running without group information mothur can only use 1 processor, continuing.
1108681 505548 603133
Total number of sequences before precluster was 1108681.
pre.cluster removed 603133 sequences.

/******************************************/
[WARNING]: stability.trim.contigs.subsample.good.unique.good.filter.unique.fasta does not contain any sequence from the .accnos file.
Selected 0 sequences from stability.trim.contigs.subsample.good.unique.good.filter.unique.fasta.

Output File Names:
stability.trim.contigs.subsample.good.unique.good.filter.unique.precluster.fasta

I got the above error while running pre.cluster, and the generated fasta file was empty (size 0), so I could not proceed to the next step.

pschloss · April 24, 2023, 4:19pm

It looks like mothur is only seeing one group in your data. Normally it operates on each group separately and then pools the results at the end. I suspect you might be running into memory issues or something else weird. Do you mean to only have one group?

Pat
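
One way to check whether a count table still carries group information is to look at its header row. The snippet below is only a sketch, not part of mothur: it assumes the usual tab-separated layout in which the first two columns are `Representative_Sequence` and `total` and any further columns are group names, and it skips lines starting with `#` on the assumption that they are comment lines. The function name and the sample tables are hypothetical.

```python
import io


def count_table_groups(handle):
    """Return the group names listed in a count_table header row.

    Assumes a tab-separated header whose first two columns are the
    sequence name and the total; everything after that is taken to be
    a group name. Lines starting with '#' are skipped as comments.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # Drop empty fields in case the header ends with a trailing tab.
        return [name for name in columns[2:] if name]
    return []


# Hypothetical tables, used here instead of real files.
with_groups = io.StringIO(
    "Representative_Sequence\ttotal\tF3D0\tF3D1\nseq1\t5\t3\t2\n"
)
without_groups = io.StringIO("Representative_Sequence\ttotal\nseq1\t5\n")

print(count_table_groups(with_groups))     # ['F3D0', 'F3D1']
print(count_table_groups(without_groups))  # []
```

For a real file, open it with `open(path)` and pass the handle in. An empty list would suggest the table was built without group information, which matches the "running without group information" message in the log above.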

Somebodyatthedoor · May 2, 2023, 12:59pm

Hi Pat,

I am having the same problem. I am using the conda version of mothur (v1.48.0).

My data should have groups, but this info seems to be lost at some point during my pipeline. After make.contigs I get a count of all of the groups, so the input to that is fine. However, in my previous analyses make.contigs used to output a samples.contigs.groups file, which I noticed it did not this time. Usually I would use this file as input for commands such as count.seqs, but I haven’t been able to this time. So I tried following the newest version of the MiSeq SOP, which doesn’t seem to use any group files as input to commands.

The commands I have run are:
make.contigs(file=samples.txt, processors=16)
summary.seqs(fasta=samples.trim.contigs.fasta, processors=16)
screen.seqs(fasta=samples.trim.contigs.fasta, count=samples.contigs.count_table, maxambig=0, maxlength=254, processors=16)
unique.seqs(fasta=samples.trim.contigs.good.fasta)
summary.seqs(count=samples.trim.contigs.good.count_table)
align.seqs(fasta=samples.trim.contigs.good.unique.fasta, reference=references/silva.nr_v138_1.align, processors=16)
summary.seqs(fasta=samples.trim.contigs.good.unique.align, count=samples.trim.contigs.good.count_table, processors=16)
screen.seqs(fasta=samples.trim.contigs.good.unique.align, count=samples.trim.contigs.good.count_table, summary=samples.trim.contigs.good.unique.summary, start=13862, end=22107, maxhomop=9)
summary.seqs(fasta=samples.trim.contigs.good.unique.good.align, count=samples.trim.contigs.good.good.count_table)
filter.seqs(fasta=samples.trim.contigs.good.unique.good.align, vertical=T, trump=., processors=16)
unique.seqs(fasta=samples.trim.contigs.good.unique.good.filter.fasta, count=samples.trim.contigs.good.good.count_table)
summary.seqs(fasta=samples.trim.contigs.good.unique.good.filter.unique.fasta, count=samples.trim.contigs.good.unique.good.filter.count_table, processors=16)
pre.cluster(fasta=samples.trim.contigs.good.unique.good.filter.unique.fasta, count=samples.trim.contigs.good.unique.good.filter.count_table, diffs=2, processors=16)
chimera.vsearch(fasta=samples.trim.contigs.good.unique.good.filter.unique.precluster.fasta, count=samples.trim.contigs.good.unique.good.filter.unique.precluster.count_table, dereplicate=t, processors=16)

The output of pre.cluster is:

Total number of sequences before precluster was 244215.
pre.cluster removed 191544 sequences.

/******************************************/
[WARNING]: samples.trim.contigs.good.unique.good.filter.unique.fasta does not contain any sequence from the .accnos file.
Selected 0 sequences from samples.trim.contigs.good.unique.good.filter.unique.fasta.

Output File Names:
samples.trim.contigs.good.unique.good.filter.unique.precluster.fasta

/******************************************/
Done.
It took 455 secs to cluster 244215 sequences.

Using 16 processors.

Output File Names:
samples.trim.contigs.good.unique.good.filter.unique.precluster.fasta
samples.trim.contigs.good.unique.good.filter.unique.precluster.count_table
samples.trim.contigs.good.unique.good.filter.unique.precluster.map

Thanks for your help,
Laura

Somebodyatthedoor · May 2, 2023, 1:20pm

Hi Pat,

Sorry, please ignore my last post. As usual, as soon as you post a query online you find the answer yourself… Looks like I’d forgotten to pass a count table file to unique.seqs.

Cheers,
Laura
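
For anyone checking their own batch file for the same slip, here is a rough sketch of a scan for `unique.seqs` calls that have no `count=` parameter. It is a heuristic under stated assumptions, not a mothur feature: the parsing is a simple regular expression over `command(key=value, ...)` lines, the function names are hypothetical, and a `unique.seqs` call without `count=` can be perfectly valid in workflows that do not use count tables. The two sample lines are the `unique.seqs` commands from the post above.

```python
import re

# Matches lines such as "unique.seqs(fasta=a.fasta, count=a.count_table)",
# optionally preceded by the interactive "mothur >" prompt.
COMMAND = re.compile(r"^\s*(?:mothur\s*>\s*)?([\w.]+)\((.*)\)\s*$")


def parse_command(line):
    """Split one command line into (name, {parameter: value}), or None."""
    match = COMMAND.match(line)
    if match is None:
        return None
    name, body = match.group(1), match.group(2)
    params = {}
    for part in body.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return name, params


def unique_seqs_without_count(lines):
    """Return the unique.seqs command lines that pass no count= parameter."""
    flagged = []
    for line in lines:
        parsed = parse_command(line)
        if parsed is None:
            continue
        name, params = parsed
        if name == "unique.seqs" and "count" not in params:
            flagged.append(line.strip())
    return flagged


commands = [
    "unique.seqs(fasta=samples.trim.contigs.good.fasta)",
    "unique.seqs(fasta=samples.trim.contigs.good.unique.good.filter.fasta, "
    "count=samples.trim.contigs.good.good.count_table)",
]
print(unique_seqs_without_count(commands))
# ['unique.seqs(fasta=samples.trim.contigs.good.fasta)']
```

Only the first call is flagged, which is the one where the count table was left out.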

tripitakit · November 30, 2023, 11:17am

I just made the same mistake. Thank you for the time-saving solution.

DYH · February 15, 2024, 3:31pm

I would like to know specifically how you solved this problem.

pschloss · February 20, 2024, 4:41pm

You should be getting a count_table file from make.contigs or from unique.seqs.

Pat
