Issue with pre.cluster


I have an issue when running pre.cluster: it produces an empty output file. I looked at the other topics about this, which state that it is most likely a missing count file, but I checked my commands and, unless I'm overlooking something, I think they are correct. The biggest difference is that I'm starting from a single sample, so no count file is generated by make.contigs; one only appears after I run unique.seqs for the first time. When I run the same analysis with Mothur v1.43.0, pre.cluster works without issue, although it does seem to run a get.seqs command automatically, which the v1.48.0 version doesn't.

Do you have any idea what is going on or what I’m missing?

The commands that I ran before pre.cluster are:

make.contigs(ffastq=SAMPLE.trimmed_reads_1P.fastq, rfastq=SAMPLE.trimmed_reads_2P.fastq, processors=8)
screen.seqs(fasta=current, processors=1, maxambig=5, maxhomop=15, maxlength=600)
pcr.seqs(fasta=silva.db.fasta, processors=8, keepdots=F, keepprimer=T, start=11895, end=25318)
align.seqs(fasta=SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.fasta, reference=silva.db.pcr.fasta, processors=8)
summary.seqs(fasta=current, count=current)
screen.seqs(fasta=current, count=current, processors=1, start=1968, end=11550)
summary.seqs(fasta=current, count=current)
filter.seqs(fasta=current, processors=8, trump=., vertical=T)
unique.seqs(fasta=current, count=current)

And the output from pre.cluster:

mothur > pre.cluster(fasta=current, count=current, processors=1, diffs=2)
Using SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.count_table as input file for the count parameter.
Using SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.fasta as input file for the fasta parameter.

Using 1 processors.
1774    710     1064
Total number of sequences before precluster was 1774.
pre.cluster removed 1064 sequences.

[WARNING]: SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.fasta does not contain any sequence from the .accnos file.
Selected 0 sequences from SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.unique.fasta.

Output File Names:

It took 0 secs to cluster 1774 sequences.

Using 1 processors.

Output File Names:

I am interested in this topic, since I am having the same issue in my run, also with v1.48.0. I thought I was doing something wrong. I am using unaligned sequences.

pre.cluster expects aligned sequences unless you provide other arguments. How are you running the command?

Could you possibly send me the count and fasta files you are inputting to pre.cluster? pschloss / umich edu


I sent the data to

Hi Pat
I used the align option

pre.cluster(fasta=current, count=current, align=needleman, processors=20)

from the mothur page for pre.cluster

### align

When using unaligned sequences, the pre.cluster command allows you to select between two alignment methods - gotoh and needleman - needleman is the default setting:

mothur > pre.cluster(fasta=sogin.unique.filter.unique.fasta, name=sogin.unique.filter.names, diffs=2, align=needleman)
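For context, the needleman option refers to Needleman-Wunsch global alignment. Here is a minimal sketch of the scoring recurrence; the match/mismatch/gap values are illustrative assumptions, not mothur's actual settings or implementation:

```python
# Minimal Needleman-Wunsch global alignment score (illustrative only;
# scoring parameters are placeholders, not mothur's actual values).
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    # prev[j] holds the best score aligning a[:i-1] against b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # best of: diagonal (match/mismatch), gap in b, gap in a
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

The gotoh method extends this idea with affine gap penalties (separate gap-open and gap-extend costs), which is why mothur offers both.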

leocadio - can you open a new thread to troubleshoot your specific problem?

Just posting again to keep this topic alive as I still have the issue.

Sorry for the delay, Tubeman -

The problem appears to be that you have no groups in your count file. Is that by design? It looks like you might be processing multiple values of SAMPLE in parallel and intending to merge them at the end. Ideally, you'd do that merging at the make.contigs step, because the alignments for each SAMPLE fasta file will differ between samples.

Regardless, if you want this to work, you need to create a fake group column in your count file with a name other than "total". On a Mac/Linux machine you can do this with the following one-liner:

awk '{print $0,$NF}' SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.count_table | sed "s/total total/total A/" > SAMPLE.trimmed_reads_1P.trim.contigs.good.unique.good.filter.mod.count_table

That should do the trick for you.
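If awk and sed aren't available (for example, on Windows), the same fake group column can be added with a short Python sketch. The filenames and group name "A" are placeholders; this assumes the standard count_table layout, a whitespace-delimited file whose header's last column is "total":

```python
# Append a fake group column to a mothur count_table:
# the header gets a new column name (anything other than "total"),
# and each data row repeats its total count as that group's count.
# Filenames and group_name are hypothetical placeholders.
def add_fake_group(in_path, out_path, group_name="A"):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for i, line in enumerate(fin):
            fields = line.split()
            if i == 0:
                fields.append(group_name)   # header: "... total" -> "... total A"
            else:
                fields.append(fields[-1])   # data: duplicate the total count
            fout.write(" ".join(fields) + "\n")
```

This mirrors what the awk/sed pipeline above does: duplicate the last column and rename the duplicated header entry.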

Hi Pat

Thanks for the help, it works now. For this analysis I'm working with a single sample, which is why I do not have groups.



This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.