Clearcut Syntax Error: Distance Matrix Issue in

Jayalal · September 10, 2024, 12:45pm

Hi, how can I fix this? The command runs on a cluster computer with 400 GB of RAM.

mothur 1.48.0

mothur > clearcut(phylip=current)
Using stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.phylip.dist as input file for the phylip parameter.
Clearcut: Syntax error in distance matrix at offset 25.

pschloss · September 10, 2024, 1:36pm

Can you post the first 5 or so lines of the distance matrix?

Pat

Jayalal · September 10, 2024, 7:20pm

head stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.phylip.dist
82017
852240700170067925
852240700166174793 0.2421
85224070017627201 0.1535 0.2126
852240700168816867 0.01632 0.2324 0.1419
852240700176135264 0.3083 0.2482 0.2749 0.2937
852240700176748695 0.2721 0.2475 0.2469 0.2598 0.285
852240700167327312 0.2028 0.2494 0.1837 0.1958 0.318 0.2843
852240700173014210 0.3382 0.2745 0.3007 0.3284 0.2143 0.2948 0.2941
85224070017397912 0.3286 0.2867 0.3138 0.331 0.3212 0.3056 0.3192 0.3325

make.shared(list=current, count=current, label=0.03);
classify.otu(list=current, count=current, taxonomy=current, label=0.03);
dist.seqs(fasta=current, output=lt);
clearcut(phylip=current)

pschloss · September 10, 2024, 8:01pm

Can you tell me what running the following returns?

wc -l stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.phylip.dist

Jayalal · September 11, 2024, 5:05am

wc -l stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.phylip.dist

82018 stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.phylip.dist
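The wc -l check presumably tests the overall shape of the file: a lower-triangle PHYLIP matrix has one header line giving the number of sequences, followed by one line per sequence, so 82,017 sequences should give 82,018 lines, which is what the file reports. A stricter version of the same check could also confirm that each row carries the right number of distances. Below is a minimal Python sketch, assuming the layout visible in the head output above (the row at index i, counting from 0, carries i distances); validate_lower_triangle is a hypothetical helper, not part of mothur.

```python
from typing import Iterable, List


def validate_lower_triangle(lines: Iterable[str]) -> List[str]:
    """Check the layout of a lower-triangle PHYLIP distance matrix.

    Assumed layout: line 1 holds the number of sequences n; each following
    line holds a sequence name plus one distance per earlier row, so the row
    at index i (counting from 0) carries i distances.
    Returns a list of problems; an empty list means the layout looks consistent.
    """
    it = iter(lines)
    header = next(it, "").strip()
    try:
        declared = int(header)
    except ValueError:
        return [f"line 1: expected a sequence count, got {header!r}"]

    problems: List[str] = []
    rows = 0  # number of non-blank data rows seen so far
    for line_number, line in enumerate(it, start=2):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines, e.g. a trailing empty line
        n_distances = len(fields) - 1  # everything after the name
        if n_distances != rows:
            problems.append(
                f"line {line_number}: expected {rows} distances, found {n_distances}"
            )
        rows += 1
    if rows != declared:
        problems.append(f"header declares {declared} sequences but {rows} rows follow")
    return problems
```

On the real file you would pass an open text handle, for example validate_lower_triangle(open(path)); the function reads line by line, so it never holds the whole matrix in memory. It does not check whether the distances themselves are valid numbers.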

westcott · September 11, 2024, 4:52pm

I am not able to reproduce the issue with the MiSeq_SOP dataset. The error message indicates there are junk or hidden characters in the first line or two of the file. Could the file have gotten corrupted? If you want to send the log file and distance matrix to mothur.westcott@gmail.com, I can take a look for you.
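One way to test the hidden-character idea without opening a multi-gigabyte file in an editor is to scan the first couple of lines for bytes that are not plain printable ASCII. The sketch below uses a hypothetical helper, find_hidden_bytes (not a mothur command); it flags anything other than printable ASCII, spaces, tabs and newlines, which covers a UTF-8 byte-order mark (EF BB BF), Windows carriage returns (0x0D) and NUL bytes. Offsets count from 0, which may not match the convention clearcut uses for the "offset" in its error message.

```python
from typing import List, Tuple


def find_hidden_bytes(data: bytes, max_lines: int = 2) -> List[Tuple[int, int]]:
    """Return (offset, byte) pairs for suspicious bytes in the first lines of data.

    Anything other than printable ASCII (0x21-0x7E), space, tab or newline is
    reported. Offsets count from 0. Scanning stops after max_lines newlines.
    """
    findings: List[Tuple[int, int]] = []
    lines_seen = 0
    for offset, byte in enumerate(data):
        if byte == 0x0A:  # newline: one more line finished
            lines_seen += 1
            if lines_seen >= max_lines:
                break
        elif byte in (0x09, 0x20) or 0x21 <= byte <= 0x7E:
            continue  # tab, space or printable ASCII: nothing to report
        else:
            findings.append((offset, byte))
    return findings
```

For example, read the first few kilobytes with open(path, "rb").read(4096) and pass the result in; an empty list means no suspicious bytes were found in those lines.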

Jayalal · September 11, 2024, 8:25pm

Thank you, I have sent an email.

system · September 21, 2024, 8:25pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

westcott · September 23, 2024, 3:46pm

Thanks for sending your files. I was able to find the source of the issue. The clearcut command uses code originally developed by Luke Sheneman for the clearcut program, and that code expects sequence names to include non-numeric characters. Your sequence names are being misinterpreted as distances in the matrix, so the command assumes the file is corrupted at offset 25 (the first sequence name). To resolve this issue, rename your sequences so that they include non-numeric characters. You can do this with the rename.seqs command as follows:

mothur > rename.seqs(fasta=current, count=current, delim="_")

The above command will create sequence names of the form number_sampleName, so a sequence such as 852240700170067925 belonging to sample1 would become 1_sample1. The rename.seqs command also creates a map file that you can use to restore the original names. To restore them, run the command below:

mothur > rename.seqs(fasta=current, map=mapFileCreatedByFirstRenameseqs)
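To check in advance whether a distance matrix would run into the naming problem described above, you could list the sequence names that look like plain numbers. This is a sketch under an assumption: it treats a name as risky when the whole name is a decimal number (digits with an optional sign, decimal point and exponent), as a rough stand-in for "contains no non-numeric characters". Clearcut's actual parsing rules may differ, and numeric_names is a hypothetical helper.

```python
import re
from typing import Iterable, List

# A plain decimal number, e.g. 852240700170067925, 0.5 or 1e-3 (no underscores).
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def numeric_names(lines: Iterable[str], limit: int = 10) -> List[str]:
    """Return up to `limit` sequence names that consist only of a number.

    Assumes a PHYLIP distance file: line 1 is the sequence count and every
    following non-blank line starts with a sequence name.
    """
    found: List[str] = []
    it = iter(lines)
    next(it, None)  # skip the sequence-count header
    for line in it:
        fields = line.split()
        if fields and _NUMBER.fullmatch(fields[0]):
            found.append(fields[0])
            if len(found) >= limit:
                break  # a few examples are enough to diagnose the problem
    return found
```

Because the names are written into the .phylip.dist file when it is created, renaming the sequences does not change an existing matrix; presumably you would rerun dist.seqs(fasta=current, output=lt) on the renamed sequences and then check the new file.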

pschloss · October 1, 2024, 5:57pm

Hi again, it looks like your distance matrix has about 82,000 sequences in it and is using a huge amount of RAM. It’s taking more than 24 hours just to read in the distance matrix, so it’s likely too big to be processed by clearcut. I’d strongly suggest you use an OTU- or phylotype-based approach instead. I have yet to find a case where an OTU-based approach using something like Bray-Curtis disagreed with one of the UniFrac commands.

Pat
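For a sense of scale: a lower-triangle matrix for n sequences holds n(n - 1)/2 distances, so the count grows with the square of n. The short sketch below does that arithmetic. The default bytes_per_value=4 assumes each distance is stored as a single-precision float; that is an assumption for illustration, not a description of how clearcut stores the matrix internally.

```python
from typing import Tuple


def lower_triangle_footprint(n_seqs: int, bytes_per_value: int = 4) -> Tuple[int, int]:
    """Return (number of pairwise distances, estimated bytes to hold them).

    A lower-triangle matrix stores each unordered pair once: n * (n - 1) / 2.
    bytes_per_value=4 assumes single-precision floats (an assumption, not
    clearcut's documented storage format). Names and bookkeeping are not
    counted, so under that assumption the byte count is a rough lower bound.
    """
    pairs = n_seqs * (n_seqs - 1) // 2
    return pairs, pairs * bytes_per_value


pairs, nbytes = lower_triangle_footprint(82017)
print(f"{pairs:,} distances, about {nbytes / 1e9:.1f} GB at 4 bytes each")
# prints: 3,363,353,136 distances, about 13.5 GB at 4 bytes each
```

Every one of those roughly 3.4 billion values also has to be parsed from text before tree building can start, which plausibly accounts for much of the long read time.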
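For reference, the Bray-Curtis dissimilarity mentioned in Pat's reply compares two samples by their OTU counts: the sum of absolute differences divided by the sum of all counts in both samples, giving 0 for identical composition and 1 when no OTUs are shared. Below is a minimal sketch of that formula on raw counts, for illustration only; in practice you would use mothur's own calculators (for example dist.shared with calc=braycurtis) on the shared file rather than hand-written code.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two samples' OTU abundance vectors.

    BC = sum_i |a_i - b_i| / sum_i (a_i + b_i), from 0 (identical composition)
    to 1 (no shared OTUs). Both vectors must list the same OTUs in the same
    order, as the rows of a mothur shared file do.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same number of OTUs")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total
```

The matrix this produces has one row per sample rather than one per sequence, so it stays small even when there are tens of thousands of unique sequences.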

Related topics

Topic	Replies	Views	Activity
clearcut error and crashes (mothur bugs)	3	7199	March 9, 2014
Problem of Clearcut Commands in mothur	3	3838	March 24, 2016
clearcut Commands in mothur	1	1592	June 23, 2016
Problems with distance matrix (mothur bugs)	1	2122	June 16, 2015
Use of clearcut Commands in mothur	0	2843	July 2, 2010
