Hi,
Not sure if this is a bug or I’m doing something wrong. I’m analysing MiSeq data for the V3 region, sequenced with a 2x150 cycle kit. I’ve followed the MiSeq SOP, the only difference being that I ran split.abund with cutoff=3 before cluster.split. I did this because, given the poor quality at the beginning of each cycle, I suspected that sequences with few representatives were due to sequencing errors. Previous attempts at running cluster.split without excluding rare sequences had resulted in error messages, and I thought the problem was that there were too many unique sequences.
Running split.abund brought me down to about 40,000 unique sequences, which I put through cluster.split. I ran this in two ways because I was worried the distance matrix would still be too large:
#1
cluster.split(fasta=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.fasta, count=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.count_table, taxonomy=/disk2/al638/beetleTS/new2_trim/beetle_abund.pds.wang.pick.taxonomy, splitmethod=fasta, taxlevel=4, cluster=f, cutoff=0.1, processors=10)
cluster.split(file=/disk2/al638/beetleTS/testbug/beetle_abund.pick.file, processors=1)
#2
cluster.split(fasta=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.fasta, count=/disk2/al638/beetleTS/new2_trim/beetle_abund.pick.count_table, taxonomy=/disk2/al638/beetleTS/new2_trim/beetle_abund.pds.wang.pick.taxonomy, splitmethod=fasta, taxlevel=4, cutoff=0.1, processors=10)
The outcome should be the same. However, #2 works well and produces a unique.list file (but no .rabund or .sabund file), whereas with #1 I only get the temporary .dist and an.list files, and an error appears after clustering:
[ERROR]: Your count table contains more than 1 sequence named , sequence names must be unique. Please correct.
This error is repeated thousands of times.
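Since the error complains about a sequence with an empty name, one thing that could be checked is whether the count table itself contains blank or duplicated names. Below is a rough sketch of such a check in Python. It assumes the count table is plain tab-separated text with a single header line and the sequence name in the first column; the function name find_bad_names is my own and is not part of mothur.

```python
from collections import Counter


def find_bad_names(lines):
    """Scan count-table lines for blank or duplicated sequence names.

    Assumes tab-separated text whose first non-comment line is a header
    and whose first column holds the sequence name (an assumption about
    the file layout, not something taken from mothur itself).
    Returns (line numbers with a blank name, sorted duplicated names).
    """
    blank_lines = []
    name_counts = Counter()
    header_seen = False
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip completely empty lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # The first remaining line is treated as the column header.
        if not header_seen:
            header_seen = True
            continue
        name = line.split("\t")[0].strip()
        if name == "":
            blank_lines.append(lineno)
        else:
            name_counts[name] += 1
    duplicated = sorted(n for n, c in name_counts.items() if c > 1)
    return blank_lines, duplicated
```

It could be called on an open file handle, for example find_bad_names(open(path)) with path pointing at the .count_table file. If both returned lists are empty, the count table is probably fine and the empty name is being introduced somewhere else.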
Finally, why am I not getting the .rabund or .sabund file? Could this be an indication of further problems?
Thanks!
Ana