error message when running cluster.split with file option

aduarte · June 26, 2015, 2:30pm

Hi,

I’m not sure whether this is a bug or I’m doing something wrong. I’m analysing MiSeq data for the V3 region, sequenced with a 2x150 cycle kit. I followed the MiSeq SOP, with the only difference being that I ran split.abund with cutoff=3 before cluster.split. I did this because, given the poor quality at the beginning of each cycle, I suspected that sequences with few representatives were due to sequencing errors. Previous attempts at running cluster.split without excluding rare sequences had resulted in error messages, and I thought the problem was too large a number of unique sequences.
Doing the split.abund step brought me down to about 40000 unique sequences, which I put through cluster.split. I ran this in two ways because I was worried the distance matrix would still be too large:
#1

cluster.split(fasta=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.fasta, count=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.count_table, taxonomy=/disk2/al638/beetleTS/new2_trim/beetle_abund.pds.wang.pick.taxonomy, splitmethod=fasta, taxlevel=4, cluster=f, cutoff=0.1, processors=10)
cluster.split(file=/disk2/al638/beetleTS/testbug/beetle_abund.pick.file, processors=1)

#2

cluster.split(fasta=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.fasta, count=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.count_table, taxonomy=/disk2/al638/beetleTS/new2_trim/beetle_abund.pds.wang.pick.taxonomy, splitmethod=fasta, taxlevel=4, cutoff=0.1, processors=10)

The outcome should be the same, but while #2 works well and produces a unique.list file (but no .rabund or .sabund file), with option #1 I only get the temporary .dist and an.list files and it produces an error after clustering:

[ERROR]: Your count table contains more than 1 sequence named , sequence names must be unique. Please correct.

This error is repeated thousands of times.

Finally, why am I not getting the .rabund or .sabund file? Could this be an indication of further problems?

Thanks!
Ana

westcott · June 30, 2015, 4:28pm

Thanks for reporting this bug (“[ERROR]: Your count table contains more than 1 sequence named , sequence names must be unique. Please correct.”). It will be fixed in our next release.

The cluster.split command does not create rabund or sabund files when you use a count file. We did this to avoid confusion. When you cluster with a count file, the list file only contains the unique names. The rabund file would contain the abundance of each OTU, which would likely be greater than the number of sequence names listed for that OTU in the list file. For example:

Count File
seq1 5
seq2 1
seq3 10
seq4 3
seq5 2

List File
label numOtus Otu01 Otu02
0.03 2 seq2,seq3 seq1,seq4,seq5

Rabund File
0.03 2 11 10
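
To make the mismatch concrete, here is a minimal Python sketch that recomputes the rabund values from the example above. It is only an illustration: the data are typed in by hand rather than parsed from real mothur files, and the names `counts`, `otus` and `otu_abundances` are hypothetical, not part of mothur.

```python
# Simplified data copied from the example above (not real mothur file parsing).
counts = {"seq1": 5, "seq2": 1, "seq3": 10, "seq4": 3, "seq5": 2}
otus = {"Otu01": ["seq2", "seq3"], "Otu02": ["seq1", "seq4", "seq5"]}


def otu_abundances(otus, counts):
    """Sum the count-table abundance of every unique sequence in each OTU."""
    return {otu: sum(counts[name] for name in names) for otu, names in otus.items()}


# Total abundance per OTU: what a rabund file would report.
abund = otu_abundances(otus, counts)

# Number of unique names per OTU: what the list file shows.
names_per_otu = {otu: len(names) for otu, names in otus.items()}

print(abund)
print(names_per_otu)
```

Otu01 lists 2 names but has an abundance of 11, and Otu02 lists 3 names but has an abundance of 10, which is the discrepancy described above.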
