Hi all,
I ran into a problem while processing my data that I think is caused by a bug in the dist.seqs() step.
When I ran the command cluster(column=mysesqs.trim.unique.good.filter.unique.pick.dist, count=mysesqs.trim.unique.good.filter.uchime.pick.count_table, cutoff=0.2), I got the following error message: [ERROR]: SH33SED.10SS.3.1.1019571_167878 is not in your count table. Please correct.
The line in the mysesqs.trim.unique.good.filter.unique.pick.dist file that contains the problematic ID has 4 columns instead of 3: “SS.3.1.1019571_167878 SH33SED.10SS.3.1.1019571_167878 SH33SED.1020220_67075 0.1919”.
The extra column consists of two IDs that somehow got concatenated.
Searching my dist file for lines with more than 3 columns, I found 12 such lines, usually with a concatenated ID between two normal IDs.
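For anyone who wants to run the same check, here is a minimal sketch of how to find the malformed lines. It is not a mothur tool; it assumes the .dist file is in the whitespace-separated column format (two sequence IDs followed by a distance on each line) and flags any line that does not have exactly 3 fields or whose last field is not a number. The function name find_malformed_lines is just a hypothetical helper for illustration.

```python
import sys


def find_malformed_lines(path, expected_fields=3):
    """Return (line_number, line) pairs that do not look like 'idA idB distance'.

    Assumes a whitespace-separated column-format distance file
    (hypothetical helper, not part of mothur).
    """
    bad = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            # Wrong number of columns, e.g. an extra concatenated ID.
            if len(fields) != expected_fields:
                bad.append((line_number, line.rstrip("\n")))
                continue
            # A well-formed line should end with a numeric distance.
            try:
                float(fields[2])
            except ValueError:
                bad.append((line_number, line.rstrip("\n")))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_malformed.py mysesqs.trim.unique.good.filter.unique.pick.dist
    for line_number, line in find_malformed_lines(sys.argv[1]):
        print(f"{line_number}: {line}")
```

The printed line numbers make it easy to inspect and fix each bad line by hand before rerunning cluster().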
It is OK for me to check and correct them manually, but I wanted to let you know about the problem.
The command I ran to generate the dist file was:
dist.seqs(fasta=mysesqs.trim.unique.good.filter.unique.pick.fasta, cutoff=0.2, processors=32)
Maybe it was a problem with writing the output file? I was using the full CPU capacity of my computer (processors=32).
Cheers,