For each read (including redundant reads), get the corresponding OTU: deunique.seqs()?

Hello,

My aim is to get a file listing the name of each read and the OTU it has been clustered into. I want this even for redundant reads, and I want the original read names.

My idea is to use deunique.seqs(fasta=, name=), so I've been trying to keep my name file at each step. But at the pre.cluster(fasta=, name=) step I can't keep it. I've seen GitHub issue #616 in mothur/mothur ("How to generate precluster.names file in v1.42.0?"), which describes the same problem but gives no way to end up with a correct name file. Is it impossible to get a name file after the pre.cluster step?
If so, is there another way to achieve my aim?
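For reference, a mothur name file has two tab-separated columns: a representative (unique) sequence name, then a comma-separated list of every read it stands for, including itself. Inverting that file gives each original read's representative, which is roughly the information deunique.seqs() uses to restore redundant sequences. A minimal Python sketch; `read_names_file` is a hypothetical helper, not part of mothur:

```python
def read_names_file(lines):
    """Map every read name to its representative (unique) sequence name.

    Expects mothur name-file lines: representative<TAB>read1,read2,...
    where the comma-separated list includes the representative itself.
    """
    read_to_rep = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        rep, members = line.split("\t")
        for read in members.split(","):
            read_to_rep[read] = rep
    return read_to_rep


# Usage: pass an open file handle or, as here, a list of lines.
example = ["readA\treadA,readB,readC\n", "readD\treadD\n"]
print(read_names_file(example))
# {'readA': 'readA', 'readB': 'readA', 'readC': 'readA', 'readD': 'readD'}
```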

Best,
Valentine

Hi,
I'm not sure I have the answer, but aren't the names of the sequences clustered into each OTU listed in the list file? To see what is going on at the various steps, I find the 454 SOP helpful.
Sigmund

Thank you for your answer! The sequence names and their OTUs are indeed given in the list file, but only for the unique sequences (those chosen as representatives at the unique.seqs() and/or pre.cluster() steps).

So my problem is really getting this same information for all sequences.
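Each data row of a list file holds, for one distance label, one column per OTU with the comma-separated names of the (unique) sequences in it. Turning that into a lookup from each name to its OTU is the first half of the problem. A sketch, assuming the header line written by recent mothur versions (`label`, `numOtus`, then `Otu001`, `Otu002`, ...); without a header it returns an empty mapping. `read_list_file` is a hypothetical helper:

```python
def read_list_file(lines, label=None):
    """Map each sequence name in a mothur list file to its OTU.

    Header lines start with 'label' and name the OTUs (Otu001, Otu002, ...);
    data lines are: label<TAB>numOtus<TAB>otu1_names<TAB>otu2_names...
    If label is None, the first data line is used.
    """
    otu_names = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        if fields[0] == "label":
            otu_names = fields[2:]  # OTU column names from the header
            continue
        if label is not None and fields[0] != label:
            continue  # a different distance label
        rep_to_otu = {}
        for otu, members in zip(otu_names, fields[2:]):
            for name in members.split(","):
                rep_to_otu[name] = otu
        return rep_to_otu
    raise ValueError(f"label {label!r} not found in list file")


example = [
    "label\tnumOtus\tOtu001\tOtu002\n",
    "0.03\t2\treadA,readD\treadE\n",
]
print(read_list_file(example, "0.03"))
# {'readA': 'Otu001', 'readD': 'Otu001', 'readE': 'Otu002'}
```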

Hi,
For the names behind each unique sequence, look in the name file created by unique.seqs; I find mine there. It's probably the same if you use a count file.
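Given a read-to-representative map from a name file and a representative-to-OTU map from a list file, assigning every read is a simple join. A sketch with hypothetical helper names, assuming both inputs are already parsed into dictionaries. It only works if the representatives in the name file are the same names that appear in the list file, i.e. no later step such as pre.cluster() has merged them further:

```python
def reads_to_otus(read_to_rep, rep_to_otu):
    """Assign every original read to the OTU of its representative.

    read_to_rep: {read name: representative}, e.g. from a name file
    rep_to_otu:  {representative: OTU}, e.g. from a list file
    Returns (assignments, missing): reads whose representative has no OTU
    (for example, sequences removed before clustering) go to 'missing'.
    """
    assignments = {}
    missing = []
    for read, rep in read_to_rep.items():
        otu = rep_to_otu.get(rep)
        if otu is None:
            missing.append(read)
        else:
            assignments[read] = otu
    return assignments, missing


def to_tsv_lines(assignments):
    """Render the read-to-OTU table as tab-separated lines, sorted by read name."""
    return [f"{read}\t{otu}" for read, otu in sorted(assignments.items())]


assignments, missing = reads_to_otus(
    {"r1": "A", "r2": "A", "r3": "B", "r4": "C"},
    {"A": "Otu001", "B": "Otu002"},
)
print(to_tsv_lines(assignments))  # ['r1\tOtu001', 'r2\tOtu001', 'r3\tOtu002']
print(missing)                    # ['r4']
```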
Sigmund

Indeed, if I manage to keep the name file throughout my analysis, I'll be able to find the information I need! That's what I was trying to do, but at the pre.cluster() step I've only managed to create a count_table file.
A count_table file tells you, for each unique sequence, how many sequences it represents, but unfortunately not their names.
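Because a count_table only stores abundances, it cannot name the reads, but its `total` column is still a useful check on any name file you reconstruct: each representative should have exactly that many member reads. A sketch, assuming a tab-separated count table whose first two columns are `Representative_Sequence` and `total`, and skipping any lines that start with `#`; both helper names are hypothetical:

```python
from collections import Counter


def read_count_totals(lines):
    """Return {representative: total} from a tab-separated mothur count_table."""
    totals = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "Representative_Sequence":
            continue  # header row
        totals[fields[0]] = int(fields[1])
    return totals


def mismatched_totals(count_totals, read_to_rep):
    """Representatives whose number of member reads differs from the count table."""
    sizes = Counter(read_to_rep.values())
    reps = set(count_totals) | set(sizes)
    return sorted(r for r in reps if count_totals.get(r, 0) != sizes.get(r, 0))


counts = [
    "Representative_Sequence\ttotal\tS1\tS2\n",
    "readA\t3\t2\t1\n",
    "readD\t1\t0\t1\n",
]
names = {"readA": "readA", "readB": "readA", "readC": "readA", "readD": "readD"}
print(mismatched_totals(read_count_totals(counts), names))  # [] means consistent
```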

In our latest releases we have been phasing out the name file: compared with the count file, it makes commands take more memory and time to run.

The pre.cluster command does not output a name file by design. We made this change for several reasons, primarily the speed and memory costs imposed by the name file.

OK, thank you for your reply! If there's another workaround for getting the OTU of every read, I'm very interested!

There's no workaround within mothur, but you could write a script to extract the data. The first two columns of the *.map files tell you which reads were combined in each sample (all names are the unique names, i.e. the first column of the names file). You can use this information to create a names file for each sample and then merge those into a complete names file.
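Following that suggestion, here is a minimal Python sketch. The map-file layout it parses is an assumption (the reply only says the first columns identify which unique reads were combined), so check it against your own *.map files before relying on it. It also assumes a mothur group file (read, tab, sample) to know which sample each read came from, because pre.cluster() works per sample and the same unique sequence can be merged into different representatives in different samples. All function names are hypothetical; parse the unique.seqs name file into `{unique name: [reads]}` and the group file into `{read: sample}` before calling `build_full_names`:

```python
from collections import defaultdict


def parse_precluster_map(lines):
    """Return {unique name: pre.cluster representative} for one sample's map file.

    ASSUMED layout: each cluster starts with a line whose first field
    begins with 'ideal_seq_', followed by the representative's row and
    then one row per sequence merged into it; the first field of each
    row is a unique sequence name.
    """
    merged_into = {}
    rep = None
    expect_rep = False
    for line in lines:
        if not line.strip():
            continue
        name = line.split("\t", 1)[0].strip()
        if name.startswith("ideal_seq_"):
            expect_rep = True  # the next row is this cluster's representative
            continue
        if expect_rep:
            rep = name
            expect_rep = False
        merged_into[name] = rep  # the representative maps to itself
    return merged_into


def build_full_names(unique_names, groups, sample_maps):
    """Combine the unique.seqs name file, a group file and per-sample maps.

    unique_names: {unique name: [all read names]} from the unique.seqs name file
    groups:       {read name: sample} from the group file
    sample_maps:  {sample: output of parse_precluster_map for that sample}
    Returns {pre.cluster representative: [read names]}, merged across samples.
    """
    full = defaultdict(list)
    for unique, reads in unique_names.items():
        for read in reads:
            sample_map = sample_maps.get(groups[read], {})
            # A unique name missing from the sample's map keeps its own name.
            full[sample_map.get(unique, unique)].append(read)
    return dict(full)


def format_names_file(full):
    """Render {representative: reads} as name-file lines: rep<TAB>read1,read2,..."""
    return [f"{rep}\t{','.join(reads)}" for rep, reads in sorted(full.items())]


s1_map = parse_precluster_map(["ideal_seq_1\t3\n", "u1\t2\t0\tACGT\n", "u2\t1\t1\tACGA\n"])
full = build_full_names(
    {"u1": ["u1", "r2"], "u2": ["u2", "r3"]},
    {"u1": "S1", "r2": "S2", "u2": "S1", "r3": "S2"},
    {"S1": s1_map, "S2": {}},
)
print(format_names_file(full))  # ['u1\tu1,r2,u2', 'u2\tr3']
```

The merged name file keys reads by the names that survive pre.cluster(), so it can then be joined against the list file in the usual way.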

