bin.seqs subset

I have one group of fungi that I’m interested in exploring in detail, so I’ve pulled out all the unique sequences that are classified to that taxon. Now I want to know which OTU each of those sequences was binned to at 0.05. It seems that bin.seqs should work for this, but I’m comparing against the list file for the whole dataset, so bin.seqs fails when it finds a sequence in the list file that isn’t in the fasta. Is there a way to force bin.seqs to just skip all sequences that aren’t found in the fasta?

OK, I figured out a pretty easy workaround.

Run bin.seqs with the whole list file and the whole fasta, which creates bin.seqs.fasta, then:

grep '^>' bin.seqs.fasta > bin.seqs.fasta.identifier
grep -f selected.seqs bin.seqs.fasta.identifier > selected.seqs.otus
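
One caveat with plain grep -f: each name in selected.seqs is treated as a pattern and matched anywhere in the line, so a name like seq1 would also pull in seq10. Below is a minimal sketch of a stricter version using -F (literal strings) and -w (whole-word matches). The sample files and the ">name<TAB>OTU" header layout are made up for illustration, so check the headers in your own bin.seqs.fasta before relying on it.

```shell
# Hypothetical sample data standing in for real mothur output.
# The ">name<TAB>OTU" header layout is an assumption for this sketch.
workdir=$(mktemp -d)
printf '>seq1\t1\nACGT\n>seq10\t2\nACGA\n>seq2\t2\nACGG\n' > "$workdir/bin.seqs.fasta"
printf 'seq1\nseq2\n' > "$workdir/selected.seqs"

# Keep only the header lines from the binned fasta.
grep '^>' "$workdir/bin.seqs.fasta" > "$workdir/bin.seqs.fasta.identifier"

# -F treats each name as a literal string and -w requires a whole-word match,
# so seq1 does not also select seq10.
grep -F -w -f "$workdir/selected.seqs" "$workdir/bin.seqs.fasta.identifier" > "$workdir/selected.seqs.otus"

cat "$workdir/selected.seqs.otus"
```

Note that -w only counts letters, digits, and underscores as part of a word, so names that differ only after a character like "-" or "." could still match each other.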

The get.lineage command, http://www.mothur.org/wiki/Get.lineage, may be helpful for you. You can run something like:

get.lineage(fasta=yourFasta, list=yourList, name=yourName, taxon=TaxaYouAreInterestedIn, taxonomy=yourTaxonomyFile) - selects the sequences classified to taxon from your files
bin.seqs(fasta=current, list=current, name=current) - lists OTU info on the selected sequences.

Ah, thanks Sarah. I didn’t know get.lineage could take a list file too. That’s how I generated the fasta, names, and group files.