I want to create a FASTA file with all of the sequences from a particular OTU (or set of OTUs). I can’t figure out how the order of OTUs in the list file relates to the named OTUs in the cons.taxonomy file. Please help.
Thanks, I looked at bin.seqs, but I have 275k OTUs and don’t want to generate that many FASTA files. There doesn’t seem to be an option to select a specific OTU.
I don’t know of any specific function in mothur to do this, but you could try this workaround. Be warned, it’s pretty ugly…
If you’ve used get.oturep() on your data previously, you might be able to extract them by using the name of the representative sequence as a search string in grep/find on the names file then convert it to an accnos and use it in get.seqs(). Something like:
get.oturep(fasta=yourfasta, name=yournames, list=yourlist, column=yourcolumn, label=0.03)
Then open the resulting yourfasta.an.0.03.rep.fasta file and search for the OTU you want to get the name of its representative sequence (the *.rep.fasta file contains metadata detailing which OTU each sequence came from). You could do this in any text editor or with grep/find. That name can then be used as the search string on the names file:
grep "seqname" yournames.an.0.03.rep.names > sequences.accnos
Finally, replace the commas with line breaks and remove the duplicate first entry, then use this as the accnos file for get.seqs():
get.seqs(fasta=yourfasta, name=yournames, accnos=sequences.accnos)
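If you would rather not do the comma replacement by hand, here is a rough Python sketch of that step. It assumes the usual two-column, tab-separated names file layout (representative name, then a comma-separated list of all names). The function name names_line_to_accnos is made up for illustration and is not part of mothur.

```python
def names_line_to_accnos(line):
    """Turn one line of a names file into a list of unique sequence names.

    Assumed layout: representative name, a tab, then a comma-separated
    list of every sequence that representative stands for.
    """
    columns = line.rstrip("\n").split("\t")
    representative = columns[0]
    members = columns[1].split(",") if len(columns) > 1 else []

    # The representative normally appears again as the first member,
    # so drop duplicates while keeping the original order.
    unique_names = []
    for name in [representative] + members:
        if name and name not in unique_names:
            unique_names.append(name)
    return unique_names


# Example with an in-memory line, shaped like the grep output above.
example_line = "seq1\tseq1,seq2,seq3\n"
accnos_text = "\n".join(names_line_to_accnos(example_line)) + "\n"
print(accnos_text, end="")
```

Writing accnos_text to sequences.accnos gives one sequence name per line, which is the layout get.seqs() expects for an accnos file.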
Mothur generates the OTU labels for the cons.taxonomy file from the list file, in the order the OTUs appear in that file. For example:
list file:
0.03 4 seq1,seq2,seq3 seq4,seq5 seq6,seq7,seq8 seq9,seq10
OTU1 contains seq1,seq2,seq3
OTU2 contains seq4,seq5
OTU3 contains seq6,seq7,seq8
OTU4 contains seq9,seq10
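The same mapping can be sketched in a few lines of Python if you want to pull the sequence names for specific OTUs straight out of the list file. This is only an illustration: it assumes a list file line laid out like the example above (label, number of OTUs, then one comma-separated bin per OTU) and names the OTUs OTU1, OTU2, and so on. The labels in your own cons.taxonomy file may be formatted differently (for example, zero-padded), so adjust the naming to match. The helper names are hypothetical, not mothur commands.

```python
def otus_from_list_line(line):
    """Map positional OTU labels to the sequence names in each bin."""
    fields = line.split()          # works for tab- or space-separated lines
    num_otus = int(fields[1])      # second field is the number of OTUs
    bins = fields[2:2 + num_otus]  # remaining fields are the OTU bins
    return {
        "OTU%d" % position: otu_bin.split(",")
        for position, otu_bin in enumerate(bins, start=1)
    }


def accnos_names(otus, wanted_labels):
    """Collect the sequence names for the requested OTU labels, in order."""
    return [name for label in wanted_labels for name in otus[label]]


list_line = "0.03 4 seq1,seq2,seq3 seq4,seq5 seq6,seq7,seq8 seq9,seq10"
otus = otus_from_list_line(list_line)
selected = accnos_names(otus, ["OTU2", "OTU4"])
print("\n".join(selected))
```

Saving the printed names to a file, one per line, gives an accnos file that can be passed to get.seqs().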
There is not an easy way to do what you are looking to do. Both workarounds would work. Mothur does have get.otulabels and remove.otulabels commands, but currently those commands only take .cons.taxonomy, .corr.axes and otu.corr files. We will add the list file to both commands in the next release so you could run something like this:
get.otulabels(list=yourListFile, accnos=fileContainingYourDesiredLabels) - generates a list file with only the labels you asked for
list.seqs(list=current)
get.seqs(fasta=yourFastaFile, name=yourNameFile, accnos=current)
Kindly,
Sarah
Thanks Sarah, we found a workaround but it was kind of ugly. Looking forward to the list option.
Hi Sarah,
I am trying to get all sequences from several OTUs so I can feed them into the oligotyping pipeline. I tried the approach you recommended, but with a count file instead of a name file. However, when I run all the commands I get a FASTA file with only 17 sequences (the same as with the get.oturep command) and a count file. Maybe the problem is the use of a count file instead of a name file?
Thanks for your help.