unique.seqs 16S Sanger + Pyro reads

Hi all,

I have full-length (FL) 16S reads and ~400 bp 16S pyro reads from each of 15 samples. The FL reads are processed through the normal mothur pipeline. What I would like to do for the pyro reads is, first, use unique.seqs to deconvolute the dataset and, second, “deconvolute” it against my FL dataset. In other words, I want to get rid of all pyro reads that match the FL reads exactly. The goal is to identify the diversity “missed” by the FL analysis.

So:

  1. is this an appropriate/useful analysis?
  2. can anyone suggest a way of performing such a task?

thanks for the time

jarrod

Yup, you can do it -

  1. Merge your FL and pyrotag sequence datasets
  2. Align the merged dataset to a reference database
  3. Use screen.seqs to make sure that all of the sequences overlap in the same region
  4. Use filter.seqs to trim the sequences so that the new alignment only considers the overlapping region
  5. Proceed with dist.seqs, cluster, etc. (see the sketch below)
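
For instance, with made-up file names (fl.fasta and pyro.fasta for the two datasets, silva.bacteria.fasta as the reference alignment), the commands would look roughly like this - I've added unique.seqs and summary.seqs so the heavy steps only run on unique sequences and you can pick sensible coordinates for screen.seqs; substitute the output file names mothur actually reports at each step:

  mothur > merge.files(input=fl.fasta-pyro.fasta, output=combined.fasta)
  mothur > unique.seqs(fasta=combined.fasta)
  mothur > align.seqs(fasta=combined.unique.fasta, reference=silva.bacteria.fasta)
  mothur > summary.seqs(fasta=combined.unique.align, name=combined.names)
  mothur > screen.seqs(fasta=combined.unique.align, name=combined.names, start=START, end=END)
  mothur > filter.seqs(fasta=combined.unique.good.align, vertical=T, trump=.)
  mothur > dist.seqs(fasta=combined.unique.good.filter.fasta, cutoff=0.10)
  mothur > cluster(column=combined.unique.good.filter.dist, name=combined.good.names)

START and END are placeholders - take them from the summary.seqs output so that every FL and pyro read covers the same stretch of the alignment.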

Right, very simple, as I suspected. Thanks, Pat

I also have problems similar to those of jarrod_s…
In practice I have 15 samples, each containing >15k pyro-reads from the 16S V5-V6 region, with an average length of ~350 bp.
I was wondering whether I should simply remove sequences that are perfectly contained within other sequences. I don’t think this would affect my OTUs in any way, but I’m somewhat uncertain about how to proceed. Consider that I have already tried this “hard removal” with a Perl script, and I found I can remove up to 2/3 of the sequences, far more than with the unique.seqs approach, leaving a global file containing “only” 60k sequences instead of the initial >225k.

Can someone help clear up my doubts, please?

matram

So I wouldn’t remove any sequences - just make one big fasta and group file and proceed as we do in the Costello example analysis on the wiki (roughly as sketched below). If 2/3 of the sequences are redundant, the unique.seqs command will figure that out, so the hard steps of aligning, classifying, calculating distances, and clustering are only done on the unique sequences, and the redundant ones are then mapped back in.
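
Something like this (hypothetical file names, and only three of your 15 samples shown):

  mothur > merge.files(input=s1.fasta-s2.fasta-s3.fasta, output=all.fasta)
  mothur > make.group(fasta=s1.fasta-s2.fasta-s3.fasta, groups=s1-s2-s3)
  mothur > unique.seqs(fasta=all.fasta)

Then carry the resulting .names and .groups files through align.seqs, dist.seqs, cluster, etc. as in the Costello example, and the redundant reads are counted again when you build the shared file.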

Is this what you’re asking about?