.dist file with more than 3 columns

Hi all,

I ran into a problem while processing my data that I think is a bug in the dist.seqs() step.

When I ran the command cluster(column=mysesqs.trim.unique.good.filter.unique.pick.dist, count=mysesqs.trim.unique.good.filter.uchime.pick.count_table, cutoff=0.2), I got this error message: [ERROR]: SH33SED.10SS.3.1.1019571_167878 is not in your count table. Please correct.

The line in mysesqs.trim.unique.good.filter.unique.pick.dist that contains the problematic ID has 4 columns instead of 3: “SS.3.1.1019571_167878 SH33SED.10SS.3.1.1019571_167878 SH33SED.1020220_67075 0.1919”.
The extra column is made of 2 IDs that somehow got concatenated.
When I searched my dist file for lines with more than 3 columns, I found 12 of them, usually with a concatenated ID between two normal IDs.
I can check and correct them manually, but I wanted to let you know about the problem.
The command I ran to generate the dist file was:
dist.seqs(fasta=mysesqs.trim.unique.good.filter.unique.pick.fasta, cutoff=0.2, processors=32)
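A quick way to find such lines is to count the whitespace-separated fields on each line, since a column-format .dist file should have exactly three (first ID, second ID, distance). Below is a minimal Python sketch, assuming the sequence IDs contain no spaces; the function name `find_malformed_lines` is hypothetical. It also flags lines whose third field does not parse as a number.

```python
import sys
from typing import Iterable, List, Tuple


def find_malformed_lines(lines: Iterable[str]) -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, fields) for every non-blank line that is not
    exactly 'seqA seqB distance' with a numeric distance."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            bad.append((lineno, fields))  # wrong column count, e.g. a fused ID
            continue
        try:
            float(fields[2])  # third column should be the distance
        except ValueError:
            bad.append((lineno, fields))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_dist.py mysesqs...pick.dist
    with open(sys.argv[1]) as handle:
        for lineno, fields in find_malformed_lines(handle):
            print(f"line {lineno}: {len(fields)} columns: {' '.join(fields)}")
```

For a one-off check of the column count alone, `awk 'NF != 3' file.dist` prints the same offending lines.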

Could it be a problem with writing the file? I was using the full CPU capacity of my computer.
Cheers,

Hmmm… that is odd. With multiple processors, mothur divides the distance calculations between the processors and then appends the resulting files. It looks like an error in the appending step. I am not able to reproduce the error on our test machines. Can you try with fewer processors?
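To illustrate how an append step can produce the fused IDs reported above, here is a hypothetical Python sketch (not mothur's actual C++ code) that assumes the per-processor output files are concatenated byte-for-byte. If one chunk ends partway through a line, for example because a write failed, its partial last line runs straight into the first line of the next chunk. The chunk contents are made up to mirror the reported line, and both function names are hypothetical.

```python
from typing import List


def append_chunks(chunks: List[str]) -> str:
    """Naive append: concatenate per-processor chunks exactly as written."""
    return "".join(chunks)


def guarded_append(chunks: List[str]) -> str:
    """Append, but first make sure each non-empty chunk ends with a newline,
    so a truncated last line cannot fuse with the next chunk's first line."""
    fixed = [c if c == "" or c.endswith("\n") else c + "\n" for c in chunks]
    return "".join(fixed)


# Hypothetical chunks: chunk1 was cut off mid-line, chunk2 is complete.
chunk1 = "seq1 seq2 0.12\nSS.3.1.1019571_167878 SH33SED.10"
chunk2 = "SS.3.1.1019571_167878 SH33SED.1020220_67075 0.1919\n"

for line in append_chunks([chunk1, chunk2]).splitlines():
    # The second line comes out with 4 fields, like the one in the report.
    print(len(line.split()), line)
```

The guard does not recover the lost data; it only confines the damage to one short, obviously incomplete line that a column-count check catches, instead of a line with a spurious extra ID.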

Hi,

Yes, I am trying that right now; it is running with 30 processors. Curiously, it is no longer using 100% of each processor.
I will let you know whether this solves the problem.
Thanks,
Lucas

Hi,

I solved the problem 🙂 It was my mistake: the cutoff in the dist.seqs() step was too high, so the distance file was huge. It filled up my hard drive, so the concatenated lines were probably caused by the program trying to write to the file after the disk had run out of space.
I lowered the cutoff to 0.05 and that solved everything.
Thanks for the help 🙂
Lucas
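Because distances above the cutoff are not written, the size of a column-format .dist file scales with the number of sequence pairs that fall below the cutoff, and the number of pairs grows as n(n-1)/2 with the number of unique sequences n. Below is a rough pre-flight check in Python, assuming each pair is written once per line and that you supply your own guess for the fraction of pairs below the cutoff; the function names and the example numbers are hypothetical. `shutil.disk_usage` is the standard-library call for free space.

```python
import shutil


def estimate_dist_bytes(n_seqs: int, frac_below_cutoff: float,
                        avg_id_len: float, dist_chars: int = 6) -> int:
    """Rough size of a column-format .dist file ('seqA seqB distance\\n'),
    assuming each pair appears once and frac_below_cutoff of pairs are kept."""
    n_pairs = n_seqs * (n_seqs - 1) // 2
    bytes_per_line = 2 * avg_id_len + dist_chars + 3  # two spaces + newline
    return int(n_pairs * frac_below_cutoff * bytes_per_line)


def enough_space(path: str, needed_bytes: int, margin: float = 1.2) -> bool:
    """True if the filesystem holding `path` has needed_bytes * margin free."""
    return shutil.disk_usage(path).free >= needed_bytes * margin


if __name__ == "__main__":
    # Hypothetical inputs: 200,000 unique sequences, 10% of pairs under the
    # cutoff, IDs about 22 characters long.
    needed = estimate_dist_bytes(200_000, 0.10, 22)
    print(f"estimated .dist size: {needed / 1e9:.1f} GB")
    print("enough free space here:", enough_space(".", needed))
```

With those made-up inputs the estimate is roughly 106 GB. Lowering the cutoff shrinks `frac_below_cutoff`, which is why dropping it to 0.05 kept the file within the available disk space.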