I’m stuck on my 16S rRNA pyrosequencing data. So far I’ve successfully trimmed my data using the qual files and the chimera slayer on Mothur. I aligned my data using SILVA.
Next, I ran my trimmed data through the RDP classifier. Based on the classifier results, I would like to remove some of my sequences from the fasta file (those unrelated to the phylum I’m interested in). Is this where I would use “get.otus”? I’m wondering if I can create a list file (perhaps a tab-delimited file) that includes the names of the sequences I would like to remove and then run a command that would remove these sequences from my fasta file.
Am I on the right track? Any guidance would be HIGHLY appreciated. After this, I intend to cluster my data by sample and by sequence, perhaps using unifrac. But I am open to any user-friendly options you guys know of.
get.lineage() or remove.lineage() will be your friends here. They’re used in exactly the same way but do the opposite of each other. Just make sure you use the right taxonomy name for the reference you classified against. I’ve been caught out jumping between RDP (Chloroplast) and Greengenes (p__Cyanobacteria) before.
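To show why the exact taxon name matters, here is a minimal Python sketch of lineage matching on taxonomy lines. It is only an illustration, not how mothur implements get.lineage() or remove.lineage(): the function names, the exact-match rule, and the example records are all made up, and it assumes one line per sequence in the form name, tab, then semicolon-separated taxa with confidence scores in parentheses.

```python
def split_lineage(taxonomy):
    """Return the taxon names from a taxonomy string, without confidence scores."""
    names = []
    for part in taxonomy.strip().strip(";").split(";"):
        # Drop a trailing "(100)"-style confidence score if present.
        name = part.split("(")[0].strip()
        if name:
            names.append(name)
    return names


def names_matching_taxon(taxonomy_lines, taxon):
    """Return sequence names whose lineage contains `taxon` as an exact level name."""
    matched = []
    for line in taxonomy_lines:
        if not line.strip():
            continue
        seq_name, taxonomy = line.rstrip("\n").split("\t", 1)
        if taxon in split_lineage(taxonomy):
            matched.append(seq_name)
    return matched


# Made-up example records (hypothetical names and scores, not real data).
example = [
    "seq1\tBacteria(100);Cyanobacteria(98);Chloroplast(95);",
    "seq2\tBacteria(100);Firmicutes(99);Clostridia(97);",
]
print(names_matching_taxon(example, "Chloroplast"))       # ['seq1']
print(names_matching_taxon(example, "p__Cyanobacteria"))  # []
```

The second call finds nothing because the Greengenes-style name never appears in these RDP-style lineages, which is the mistake described above.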
Thank you! I’ve looked into the “get.lineage” and “remove.lineage” functions.
Sorry if this is a very basic question (I just started using Mothur). Where would I get the taxonomy file to run alongside the FASTA file from which I wish to remove sequences? Can I include the classifier file that RDP generated for me? Help is much appreciated.
Oh right, I misunderstood. I thought you’d used the classifier in mothur. If you can get a per-sequence classification out of the RDP website then you will be able to convert that into a format that mothur can read. I don’t use the website classifier at all, but playing around with it now it looks like you can get the details from the “Assignment Detail” view of your data.
If you want to use the website output specifically, you’ll need to download and reformat the output slightly to be consistent with the mothur taxonomy system. This should be a trivial task; you can do it in Notepad or similar with a few “Replace All” commands. The format for taxonomies is here. Alternatively, you could just re-classify your data through mothur using the classify.seqs() command. There’s an RDP9 database available on the mothur wiki and if you use that you’ll be able to move straight into the lineage removal without having to re-format your data.
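If you would rather script the reformatting than do it by hand, something like the following Python sketch might work. It rests on assumptions you should check against your own download: that the file is tab-delimited with the sequence name first, an optional orientation column next, and then repeating taxon / rank / confidence columns with confidence as a fraction between 0 and 1. The website download may well be laid out differently. The function name is hypothetical, and the target layout (name, tab, taxa with integer confidence in parentheses, separated and ended by semicolons) is my reading of the mothur taxonomy format.

```python
def rdp_line_to_mothur(line):
    """Convert one assumed RDP tab-delimited line into a mothur-style taxonomy line."""
    fields = line.rstrip("\n").split("\t")
    seq_name, rest = fields[0], fields[1:]
    # Assumption: an optional orientation column ("", "+" or "-") may follow the name.
    if rest and rest[0] in ("", "+", "-"):
        rest = rest[1:]
    levels = []
    # Walk the remaining columns three at a time: taxon, rank, confidence.
    for i in range(0, len(rest) - 2, 3):
        taxon, rank, conf = rest[i], rest[i + 1], float(rest[i + 2])
        # Assumption: the "Root" level is not wanted in the output.
        if rank == "rootrank":
            continue
        # Remove quotes and replace spaces so each taxon is a single token.
        taxon = taxon.strip().strip('"').replace(" ", "_")
        levels.append(f"{taxon}({round(conf * 100)})")
    return seq_name + "\t" + ";".join(levels) + ";"


# Made-up input line for illustration only.
sample = "seq1\t\tRoot\trootrank\t1.0\tBacteria\tdomain\t1.0\tFirmicutes\tphylum\t0.98"
print(rdp_line_to_mothur(sample))  # seq1<TAB>Bacteria(100);Firmicutes(98);
```

Run it over every line of the download and write the results to a new file, then spot-check a few lines against the taxonomy format page before passing the file to the lineage commands.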
Hope that helps!