How to get all raw sequences of particular OTUs?

Hi everybody,

I’d like to extract all the raw sequences that belong to a specific group of OTUs so I can analyse them with Oligotyping. Is there a way to go back to the original data, perhaps just after the denoising step, and pull out only the raw sequences that belong to the OTUs of interest?


Or, even better, would it be possible to classify the raw sequences and then keep only the ones that received the classification of interest?

For your second question, you could just run get.lineage() after classification (classify.seqs()) to keep only the sequences assigned to the taxon you want.
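Inside mothur, get.lineage() is the command to use. Just to illustrate what that filtering amounts to, here is a rough shell sketch. It assumes a mothur-style two-column .taxonomy file (name, a tab, then a semicolon-separated lineage with bootstrap values in parentheses). The function name select_taxon and the example taxon are hypothetical, and this is not a replacement for get.lineage().

```shell
# Hypothetical helper, not part of mothur: print the names of sequences
# whose lineage contains TAXON as a complete level (or run of levels).
# Assumed input line format:  name<TAB>Bacteria(100);Bacteroidetes(99);...
# Usage: select_taxon TAXON TAXONOMY_FILE   (TAXON without a trailing ";")
select_taxon() {
  awk -F'\t' -v taxon="$1" '{
    tax = ";" $2 ";"                 # pad so the first and last levels can match too
    gsub(/\([0-9.]+\)/, "", tax)     # drop bootstrap values such as (100)
    if (index(tax, ";" taxon ";") > 0) print $1
  }' "$2"
}
```

For example, `select_taxon Bacteroidetes your.taxonomy > taxon.accnos` would list the matching sequence names, one per line. Matching on whole levels (";Bacteroidetes;") rather than a bare substring avoids picking up longer names that merely contain the taxon.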

For your first question, someone else might have a cleaner way of doing this, but I would use get.oturep() to get the representative fasta and names files. Find the sequence in the representative fasta file that corresponds to your OTU of interest (the last page I linked shows the format of the output fasta file and how the OTU IDs are linked). Then take that representative sequence's name and pull out the line of the names file it appears on. If you’re on a Linux/Mac computer you could do this step on the console using

awk -F'\t' '$1 == "your_sequence" { print $2 }' rep.names | tr ',' '\n' > sequences.accnos

This should give you a text file (sequences.accnos) that lists the names of all the sequences in that OTU, one per line.
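Since you're after a group of OTUs rather than a single one, a variant of the same idea might save some repetition. Put the representative names of all the OTUs you care about in a file, one per line, and expand them in one pass. This is a sketch that assumes the names file has two tab-separated columns (representative name, then a comma-separated list of member names). The function name expand_otus and the file reps_of_interest.txt are hypothetical.

```shell
# Hypothetical helper: list every sequence name in the OTUs whose
# representative names are given (one per line) in REPS_FILE.
# Assumed names-file layout:  rep_name<TAB>name1,name2,name3
# Usage: expand_otus REPS_FILE NAMES_FILE
expand_otus() {
  awk -F'\t' '
    NR == FNR { want[$1] = 1; next }     # first file: representatives of interest
    ($1 in want) {                       # second file: only the matching rows
      n = split($2, members, ",")
      for (i = 1; i <= n; i++) print members[i]
    }
  ' "$1" "$2"
}
```

For example, `expand_otus reps_of_interest.txt rep.names > sequences.accnos`. The exact comparison on the first column means a name like seq1 won't accidentally match seq10.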

You could then try get.seqs() with that .accnos file and your fasta file to pull out the sequences themselves. Alternatively, if you leave out the pre.cluster step, you’ll probably get the unique OTUs you want.
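Within mothur this would be something like get.seqs(accnos=sequences.accnos, fasta=your.fasta). If you'd rather stay on the console, here is a rough shell equivalent. It is a sketch, not mothur's implementation, and the function name pick_seqs is hypothetical. To get every raw read back, you would presumably run it against a fasta file that still contains all reads (e.g. one from before unique.seqs), because later fasta files keep only one copy of each unique sequence.

```shell
# Hypothetical helper: keep only the FASTA records whose names are listed
# in ACCNOS_FILE (one name per line). Handles multi-line sequences.
# Usage: pick_seqs ACCNOS_FILE FASTA_FILE
pick_seqs() {
  awk '
    NR == FNR { keep[$1] = 1; next }   # first file: names to keep
    /^>/ {                             # header line: decide once per record
      name = substr($1, 2)             # strip ">" (anything after whitespace is ignored)
      printing = (name in keep)
    }
    printing                           # print header and sequence lines of kept records
  ' "$1" "$2"
}
```

For example, `pick_seqs sequences.accnos your.fasta > your.pick.fasta`.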