I have full-length (FL) 16S reads and ~400 bp 16S pyro reads from each of 15 samples. The FL reads are processed through the normal mothur pipeline. For the pyro reads I would like to first use unique.seqs to deconvolute the dataset, and then "deconvolute" against my FL dataset. In other words, I want to get rid of all pyro reads that match exactly to FL reads. The goal is to identify the diversity "missed" by the FL analysis.
So:
Is this an appropriate/useful analysis?
Can anyone suggest a way of performing such a task?
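One way to attempt the second "deconvolution" outside mothur is a short script. The sketch below (filenames and function names are my own, purely illustrative) keeps only the pyro reads that do not occur verbatim inside any FL read; a real analysis would probably also want to check the reverse complement and tolerate sequencing error, which this does not.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def filter_novel_pyro(pyro_records, fl_seqs):
    """Keep pyro reads that are NOT an exact substring of any FL read."""
    return [(name, seq) for name, seq in pyro_records
            if not any(seq in fl for fl in fl_seqs)]

if __name__ == "__main__":
    # hypothetical filenames
    fl = [seq for _, seq in read_fasta("fl_reads.fasta")]
    with open("pyro_not_in_fl.fasta", "w") as out:
        for name, seq in filter_novel_pyro(read_fasta("pyro_reads.fasta"), fl):
            out.write(">%s\n%s\n" % (name, seq))
```

The naive all-against-all substring scan is quadratic; for 15k reads per sample against a FL dataset it may be tolerable, but an index (e.g. hashing FL k-mers) would scale better.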
I also have problems similar to those of jarrod_s…
In practice I have 15 samples, each containing >15k pyro reads covering the 16S V5-V6 region, with an average length of ~350 bp.
I was wondering whether I should simply remove sequences perfectly contained in others. I don't think this will affect my OTUs in any way, but I'm somewhat uncertain about how to proceed. Consider that I already evaluated this "hard removal" with a Perl script: it can remove up to 2/3 of the sequences, much more than the unique.seqs approach, leaving a global file of "only" 60k sequences instead of the initial >225k.
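For reference, the "hard removal" step described above can be sketched in a few lines of Python (my own illustration, not the original Perl script). The quadratic scan is slow on >225k reads, but the logic is the same:

```python
def collapse_contained(seqs):
    """Drop every sequence that is an exact substring of another one.
    Identical duplicates count too: only the first copy survives."""
    # Longest first, so a potential container is always examined
    # before anything it could contain.
    ordered = sorted(seqs, key=len, reverse=True)
    kept = []
    for s in ordered:
        if not any(s in k for k in kept):
            kept.append(s)
    return kept
```

Unlike unique.seqs, this also folds a short read into any longer read that spans it, which is why it removes far more sequences; the cost is that the per-read abundance information normally preserved in the name file is lost.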
So I wouldn't remove any sequences. Just make one big fasta and group file and proceed as we do in the Costello example analysis on the wiki. If 2/3 of the sequences are redundant, the unique.seqs command will figure that out, so the hard steps of aligning, classifying, distance calculation, and clustering are only done on the uniques, and the redundants are then mapped back in.
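That combined route might look roughly like the following mothur batch sketch (filenames and the reference alignment are placeholders; see the Costello example on the wiki for the exact parameters used there):

```
unique.seqs(fasta=all_samples.fasta)
align.seqs(fasta=all_samples.unique.fasta, reference=silva.bacteria.fasta)
filter.seqs(fasta=all_samples.unique.align)
dist.seqs(fasta=all_samples.unique.filter.fasta, cutoff=0.25)
cluster(column=all_samples.unique.filter.dist, name=all_samples.names)
```

The name file written by unique.seqs carries the redundant-to-unique mapping, so downstream OTU counts still reflect every original read even though only the unique sequences go through the expensive steps.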