For each read (with redundancy), get the OTU corresponding: deunique.seqs()?

vgilbart · October 13, 2021, 9:30am

Hello,

My aim is to get a file listing the name of each read and the OTU it has been clustered into. I want this even for redundant reads, and I want the original read names.

My idea is to use deunique.seqs(fasta=, name=), so I’ve been trying to keep my name file at each step. But at the pre.cluster(fasta=, name=) step I can’t keep it. I’ve seen the GitHub issue “How to generate precluster.names file in v1.42.0?” (Issue #616, mothur/mothur), which describes the same problem, but it gives no solution that ends with a correct name file. Is it impossible to get a name file after the pre.cluster step?
If so, is there another way I can achieve my aim?

Best,
Valentine

sje062 · October 16, 2021, 5:58am

Hi,
I am not sure I have the answer. Aren’t the names of the sequences clustered into each OTU found in the list file? To see what is going on in the various steps, I find this SOP helpful: 454 SOP
Sigmund

vgilbart · October 18, 2021, 6:47am

Thank you for your answer! The sequence names and their corresponding OTUs are indeed given in the list file, but only for the unique sequences (those kept after the unique.seqs() and/or pre.cluster() steps).

So my problem is really to get this same information but for all sequences.
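To illustrate what is being asked for: when a names file that matches the list file does exist, the expansion from unique sequences to all reads is a simple lookup. The sketch below is illustrative only and is not part of mothur. It assumes a tab-separated names file (representative name, then a comma-separated list of every read it stands for) and a list file whose data rows are a distance label, the number of OTUs, then one comma-separated column per OTU. The function names `parse_names` and `reads_to_otus` are hypothetical.

```python
def parse_names(lines):
    """Return {representative: [all read names it stands for]}."""
    names = {}
    for line in lines:
        if not line.strip():
            continue
        rep, members = line.rstrip("\n").split("\t")[:2]
        names[rep] = members.split(",")
    return names


def reads_to_otus(list_lines, names, label=None):
    """Return {read name: OTU label} for one row of a list file.

    If `label` is None the first data row is used; otherwise the row
    whose first column equals `label` (for example "0.03").
    """
    otu_labels = None
    for line in list_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "label":  # header row: label, numOtus, OTU names
            otu_labels = fields[2:]
            continue
        if label is not None and fields[0] != label:
            continue
        otus = fields[2:]
        if otu_labels is None:  # assumption: no header row, so number the OTUs
            otu_labels = [f"Otu{i + 1}" for i in range(len(otus))]
        assignment = {}
        for otu_label, members in zip(otu_labels, otus):
            if not members:  # tolerate an empty trailing column
                continue
            for rep in members.split(","):
                # A representative missing from the names file stands for itself.
                for read in names.get(rep, [rep]):
                    assignment[read] = otu_label
        return assignment
    return {}
```

With real data, the two arguments would be open file handles for the names and list files. The difficulty in this thread is precisely that no names file matching the post-pre.cluster list file is produced.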

sje062 · October 18, 2021, 7:30am

Hi,
For the names behind each unique sequence, look in the name file created by the unique.seqs command. I find mine there. It is probably the same if you use a count file.
Sigmund

vgilbart · October 18, 2021, 7:38am

Indeed, if I manage to keep the name file throughout my analysis, I’ll be able to find the information I need! That’s what I was trying to do, but at the pre.cluster() step I’ve only managed to create a count_table file.
And the count_table file tells, for each unique sequence, how many sequences it represents (but unfortunately without their names)…

westcott · October 19, 2021, 3:30pm

In our latest releases we have been phasing out the name file. The name file requires more memory and time to run commands when compared to the count file.

The pre.cluster command does not output a name file by design. We did this for several reasons, but the primary reason for the change in pre.cluster is the speed and memory constraints caused by the name file.

vgilbart · October 20, 2021, 9:19am

OK, thank you for your reply! If there’s another workaround to get the OTU correspondence for every read, I’m very much interested!

westcott · October 20, 2021, 5:07pm

There’s not a workaround within mothur, but you could script something to extract the data. The first two columns of the *.map files tell you which reads were combined in each sample (all names are the unique names, i.e., the first column in the names file). You can use this information to create a names file for each sample, and then merge those into a complete names file.
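One way such a script might look is sketched below. It is not part of mothur, and the exact *.map layout should be checked against your own files: the sketch assumes each data row is tab-separated, with the unique name that was kept in the first column and a unique name merged into it in the second. The helper names `read_pairs`, `merge_preclustered` and `format_names` are hypothetical.

```python
def read_pairs(map_lines):
    """Yield (kept, absorbed) from the first two tab-separated columns.

    Assumption: column 1 is the unique name that survives pre.cluster and
    column 2 is a unique name merged into it. Verify this on a real file.
    """
    for line in map_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] and fields[1]:
            yield fields[0], fields[1]


def merge_preclustered(names, pairs):
    """Fold absorbed unique sequences into the ones that were kept.

    `names` maps each unique name to its reads (as in the unique.seqs
    names file); the result is a new mapping and `names` is not modified.
    """
    merged = {rep: list(reads) for rep, reads in names.items()}
    for kept, absorbed in pairs:
        if kept == absorbed:
            continue
        # An absorbed name missing from `names` stands only for itself.
        reads = merged.pop(absorbed, [absorbed])
        merged.setdefault(kept, [kept]).extend(reads)
    return merged


def format_names(names):
    """Render a mapping as names-file rows: representative, TAB, reads."""
    return [f"{rep}\t{','.join(reads)}" for rep, reads in names.items()]
```

Running `merge_preclustered` once per sample's map file, feeding each result into the next call, would give one complete mapping; the rows from `format_names` can then be written out and passed to deunique.seqs or used to expand the list file.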
