Issue with pre.cluster

Tubeman · September 29, 2023, 12:13pm

Hi,

I have an issue when running pre.cluster: it produces an empty output file. Other topics about this say the most likely cause is a missing count file, but I checked my commands and, unless I’m overlooking something, I think they are correct. The biggest difference is that I’m starting from a single sample, so make.contigs does not generate a count file; one only appears after I run unique.seqs for the first time. When I run the same analysis with mothur v1.43.0, pre.cluster works without issues, although it does seem to run get.seqs automatically, which v1.48.0 doesn’t.

Do you have any idea what is going on or what I’m missing?

The commands that I ran before pre.cluster are:

make.contigs(ffastq=SAMPLE.trimmed_reads_1P.fastq, rfastq=SAMPLE.trimmed_reads_2P.fastq, processors=8)
summary.seqs(fasta=current)
screen.seqs(fasta=current, processors=1, maxambig=5, maxhomop=15, maxlength=600)
unique.seqs(fasta=current)
pcr.seqs(fasta=silva.db.fasta, processors=8, keepdots=F, keepprimer=T, start=11895, end=25318)
align.seqs(fasta=SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.fasta, reference=silva.db.pcr.fasta, processors=8)
summary.seqs(fasta=current, count=current)
screen.seqs(fasta=current, count=current, processors=1, start=1968, end=11550)
summary.seqs(fasta=current, count=current)
filter.seqs(fasta=current, processors=8, trump=., vertical=T)
unique.seqs(fasta=current, count=current)

And the output from pre.cluster:

mothur > pre.cluster(fasta=current, count=current, processors=1, diffs=2)
Using SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.count_table as input file for the count parameter.
Using SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.fasta as input file for the fasta parameter.

Using 1 processors.
1774    710     1064
Total number of sequences before precluster was 1774.
pre.cluster removed 1064 sequences.

/******************************************/
[WARNING]: SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.fasta does not contain any sequence from the .accnos file.
Selected 0 sequences from SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.fasta.

Output File Names:
SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.precluster.fasta

/******************************************/
Done.
It took 0 secs to cluster 1774 sequences.

Using 1 processors.

Output File Names:
SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.precluster.fasta
SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.precluster.count_table
SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.precluster.map
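
The warning in the log above says that the fasta file contains none of the names pre.cluster tried to select. One quick way to rule out a plain name mismatch between the two input files is to compare their sequence names directly. The following is only a minimal sketch: the helper names are made up for illustration, the example names `seq1` to `seq3` are hypothetical, and it assumes an uncompressed count table whose header row starts with `Representative_Sequence` and whose comment lines, if any, start with `#`.

```python
def fasta_ids(lines):
    """Collect sequence names: the first token after '>' on each header line."""
    ids = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def count_table_ids(lines):
    """Collect names from the first column, skipping comments and the header row."""
    ids = set()
    for line in lines:
        fields = line.split()
        # Assumption: comment lines start with '#', header starts with Representative_Sequence.
        if not fields or line.startswith("#") or fields[0] == "Representative_Sequence":
            continue
        ids.add(fields[0])
    return ids


# Hypothetical miniature inputs standing in for the real fasta and count_table files.
fasta = [">seq1", "ACGT", ">seq2", "ACGA"]
table = ["Representative_Sequence\ttotal", "seq1\t5", "seq3\t1"]

only_in_fasta = fasta_ids(fasta) - count_table_ids(table)
only_in_table = count_table_ids(table) - fasta_ids(fasta)
print("only in fasta:", sorted(only_in_fasta))
print("only in count table:", sorted(only_in_table))
```

If both sets come back empty for the real files, the names agree and the cause has to be something other than a name mismatch.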

leocadio · October 2, 2023, 12:54pm

I am interested in this topic, since I am having the same issue in my run, also with 1.48.0. I thought I was doing something wrong. I am using unaligned sequences.

pschloss · October 2, 2023, 9:02pm

pre.cluster anticipates aligned sequences unless you provide other arguments. How are you running the command?
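
A rough way to tell whether a fasta file is aligned is to check that every sequence has the same length once gap characters are counted. The sketch below does only that; it is not how mothur itself decides, the function names are hypothetical, and the example records are invented.

```python
def sequence_lengths(lines):
    """Map each sequence name to its total length, summing wrapped sequence lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def looks_aligned(lines):
    """True when the file has sequences and they all share one length."""
    return len(set(sequence_lengths(lines).values())) == 1


# Invented examples: gaps ('-') pad the aligned records to a common length.
aligned = [">a", "AC-GT", ">b", "ACGGT"]
unaligned = [">a", "ACGT", ">b", "ACGGT"]

print(looks_aligned(aligned))
print(looks_aligned(unaligned))
```

Equal lengths are a necessary sign of an alignment, not proof of one, so treat a positive result as a hint only.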

pschloss · October 2, 2023, 9:03pm

Could you possibly send me the count and fasta files you are inputting to pre.cluster? pschloss / umich edu

Pat

Tubeman · October 3, 2023, 5:57am

I sent the data to mothur.bugs@gmail.com.

leocadio · October 3, 2023, 1:23pm

Hi Pat
I used the align option

pre.cluster(fasta=current, count=current, align=needleman, processors=20)

from the mothur page for pre.cluster

### align

When using unaligned sequences, the pre.cluster command allows you to select between two alignment methods - gotoh and needleman - needleman is the default setting:

```
mothur > pre.cluster(fasta=sogin.unique.filter.unique.fasta, name=sogin.unique.filter.names, diffs=2, align=needleman)
```

pschloss · October 3, 2023, 1:48pm

leocadio - can you open a new thread to troubleshoot your specific problem?

Tubeman · October 11, 2023, 6:40am

Just posting again to keep this topic alive as I still have the issue.

pschloss · October 18, 2023, 2:48pm

Sorry for the delay Tubeman -

The problem appears to be due to you having no groups in your count file. Is that by design? It looks like you might be processing multiple values of SAMPLE in parallel and then intending to merge them at the end. Ideally, you’d do that in the make.contigs step because the alignments for each SAMPLE fasta file will be different between samples.

Regardless, if you want this to work you need to create a fake group column in your count file that has a name other than “total”. On macOS or Linux you can do this with the following one-liner:

awk '{print $0,$NF}' SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.count_table | sed "s/total total/total A/" > SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.mod.count_table

That should do the trick for you.
Pat
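
For readers not on macOS or Linux, the effect of the awk/sed one-liner above can be sketched in Python: detect that the header has no group column, then append one that repeats each row's total. This is an illustration under assumptions, not part of mothur. The function names and the `seq1`/`seq2` rows are hypothetical, the table is assumed to be uncompressed with the header as its first non-empty line, and the new column is joined with a tab where the one-liner uses a space.

```python
def has_groups(header_line):
    """True if the header has any column beyond the name and total columns."""
    return len(header_line.split()) > 2


def add_fake_group(lines, group="A"):
    """Append a group column whose counts duplicate the last (total) column."""
    out = []
    header_done = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # The header row gets the group name; data rows repeat their total.
        extra = fields[-1] if header_done else group
        header_done = True
        out.append("\t".join(fields + [extra]))
    return out


# Hypothetical count table with no group columns, as in this thread.
example = [
    "Representative_Sequence\ttotal",
    "seq1\t5",
    "seq2\t1",
]

if not has_groups(example[0]):
    fixed = add_fake_group(example)
    print("\n".join(fixed))
```

Writing the returned lines to a new file (for instance one with `.mod.count_table` in its name, as in the command above) and passing that file to pre.cluster mirrors the suggested workaround.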

Tubeman · October 20, 2023, 12:09pm

Hi Pat

Thanks for the help, it works now. For this analysis, I’m working with a single sample so that is why I do not have groups.

Cheers
Raf

system · October 30, 2023, 12:09pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
