error message when running cluster.split with file option

Hi,

I’m not sure whether this is a bug or whether I’m doing something wrong. I’m analysing MiSeq data for the V3 region, sequenced with a 2x150 cycle kit. I followed the MiSeq SOP, with the only difference being that I ran split.abund with cutoff=3 before cluster.split. I did this because quality was poor at the beginning of each cycle, so I suspected that sequences with few representatives were sequencing errors. Previous attempts at running cluster.split without excluding rare sequences had resulted in error messages, and I thought the problem was that there were too many unique sequences.
Doing the split.abund step brought me down to about 40000 unique sequences, which I put through cluster.split. I ran this in two ways because I was worried the distance matrix would still be too large:
#1

cluster.split(fasta=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.fasta, count=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.count_table, taxonomy=/disk2/al638/beetleTS/new2_trim/beetle_abund.pds.wang.pick.taxonomy, splitmethod=fasta, taxlevel=4, cluster=f, cutoff=0.1, processors=10)
cluster.split(file=/disk2/al638/beetleTS/testbug/beetle_abund.pick.file, processors=1)

#2

cluster.split(fasta=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.fasta, count=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.count_table, taxonomy=/disk2/al638/beetleTS/new2_trim/beetle_abund.pds.wang.pick.taxonomy, splitmethod=fasta, taxlevel=4, cutoff=0.1, processors=10)

The outcome should be the same. However, #2 works well and produces a unique.list file (but no .rabund or .sabund file), whereas #1 only leaves the temporary .dist and an.list files and produces this error after clustering:

[ERROR]: Your count table contains more than 1 sequence named , sequence names must be unique. Please correct.

This error is repeated thousands of times.

Finally, why am I not getting the .rabund or .sabund file? Could this be an indication of further problems?

Thanks!
Ana

Thanks for reporting this bug. The bug behind the error “[ERROR]: Your count table contains more than 1 sequence named , sequence names must be unique. Please correct.” will be fixed in our next release.

The cluster.split command does not create rabund or sabund files when you use a count file. We did this to avoid confusion. When you cluster with a count file, the list file contains only the unique sequence names, whereas a rabund file would contain the total abundance of each OTU, which would likely be greater than the number of sequence names listed for that OTU in the list file. For example:

Count File
seq1 5
seq2 1
seq3 10
seq4 3
seq5 2

List File
label numOtus Otu01 Otu02
0.03 2 seq2,seq3 seq1,seq4,seq5

Rabund File
0.03 2 11 10
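
The arithmetic behind that rabund line can be checked with a short Python sketch. This is only an illustration using the toy values from the example above, not mothur's own code; the function name otu_abundances and the in-memory dictionaries are assumptions made for the example, and real count and list files would need to be parsed first.

```python
# Toy data copied from the example above (not read from real mothur files).
counts = {"seq1": 5, "seq2": 1, "seq3": 10, "seq4": 3, "seq5": 2}
otus = {"Otu01": ["seq2", "seq3"], "Otu02": ["seq1", "seq4", "seq5"]}


def otu_abundances(otus, counts):
    """Sum the count-table totals of the unique sequences in each OTU.

    Hypothetical helper for illustration; it is not part of mothur.
    """
    return {otu: sum(counts[name] for name in names) for otu, names in otus.items()}


# Total abundance per OTU, as a rabund file would report it.
abundances = otu_abundances(otus, counts)

# Number of unique names per OTU, as they appear in the list file.
names_listed = {otu: len(names) for otu, names in otus.items()}

print(abundances)
print(names_listed)
```

Otu01 lists 2 names but has an abundance of 11, and Otu02 lists 3 names but has an abundance of 10. That mismatch between the list file and the rabund file is the confusion described above.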