filtering or screening or getting sequences with a mask

Leptothrix · February 28, 2011, 9:59pm

Hi Pat and mothur aficionados,

I have been trying to figure this out for a little while but I keep hitting a wall. Is there an easy way to screen, filter or get aligned sequences that fit a user-defined mask or a user-defined consensus sequence? I can think of several ways to make the mask or consensus sequence (e.g. generate a consensus sequence in mothur or generate a mask in ARB) but I can’t figure out how to pull out the sequences that fit this mask. I want to screen a pyrosequencing library with 69,000 sequences.

Any ideas how to figure this out would be greatly appreciated,
Emily

pschloss · March 2, 2011, 11:02am

In filter.seqs there is an option to use a “hard” mask, which is a user supplied mask consisting of 0’s and 1’s to indicate which columns to chuck and which to keep. We provide versions of the Lane mask for the greengenes and SILVA alignments, but you could “easily” make your own. Is this what you mean?
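
To make the idea of a hard mask concrete, here is a minimal Python sketch of what such a mask does to an alignment: every column marked 1 is kept and every column marked 0 is dropped. It only illustrates the concept and is not mothur's implementation; the function name `apply_hard_mask`, the toy mask and the toy sequences are all made up for the example.

```python
def apply_hard_mask(aligned_seq: str, mask: str) -> str:
    """Keep only the alignment columns where the mask has a '1'."""
    if len(aligned_seq) != len(mask):
        raise ValueError("mask and aligned sequence must have the same length")
    return "".join(base for base, keep in zip(aligned_seq, mask) if keep == "1")


# Toy data (hypothetical): a 10-column alignment and a mask over those columns.
mask = "0011110011"
alignment = {
    "seqA": "..ACGT--GG",
    "seqB": "..AC-T--GA",
}

# Apply the same mask to every aligned sequence.
filtered = {name: apply_hard_mask(seq, mask) for name, seq in alignment.items()}

for name, seq in filtered.items():
    print(name, seq)
```

Note that the mask only removes columns from every sequence; it does not by itself select or reject any sequence.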

Leptothrix · March 2, 2011, 6:34pm

Hi Pat,
ahh, okay, thank you, I finally understand what a mask is. I was thinking of it a little differently, maybe more like a consensus sequence.

My idea was

1. I have a FISH probe that hits a particular population of cells
2. I also have a clone library from that environment
3. From the clone library I select the sequences that match my probe
4. I then trim the sequences to just include a variable region
5. obtain a consensus sequence from the variable region of the trimmed clones
6. get the sequences in my pyrosequencing library that match that consensus sequence
7. follow the rest of the Costello stool sample pipeline to look at how those sequences differ throughout my pyrosequencing dataset (various sample sites, sampling dates etc.).

1-5 seem pretty straightforward but I am stuck on number 6. I think there must be a way to do it in mothur but I just can’t figure out the best command.
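
The consensus step in the list above can be sketched in Python as a simple per-column majority vote over the trimmed, aligned clones. This is an assumption about what "consensus" means here: mothur and ARB may apply different rules (for example cutoffs or ambiguity codes), and the function name `majority_consensus` and the toy clone sequences are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_seqs: list[str]) -> str:
    """Return the most common character in each alignment column."""
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(c.upper() for c in column)
        # most_common(1) returns the single most frequent character;
        # ties are broken by first appearance in the column.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy data (hypothetical): three trimmed clones covering the same columns.
clones = ["ACGT-A", "ACGTTA", "ACCT-A"]
consensus = majority_consensus(clones)
print(consensus)
```

A gap can win a column under this rule, so the consensus stays in alignment coordinates and can be compared column by column with other aligned sequences.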

Thank you,
Emily

pschloss · March 3, 2011, 12:45pm

Emily -
Hmmm. For #6 - could you align your probe sequence? If so, you could use classify.seqs(method=knn, numwanted=1, search=distance) and give it your probe and sequences. You would then get an output which would be the distance between your sequences and your probe. Those that are dead on or within a given threshold would be what you’d want (I think). This is something we’ve been thinking about for removing “contaminants”, but at this point don’t have a better approach yet. Let me know what you think.
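
The thresholding idea described above can be sketched in Python as follows: compute a distance between each aligned sequence and one aligned reference (the probe or a consensus), then keep the names that fall at or below a cutoff. This is not what classify.seqs does internally. The distance used here (fraction of mismatches over columns where both sequences have a base) is an assumption, and the names `aligned_distance` and `names_within`, the toy reads and the cutoff are hypothetical.

```python
GAP_CHARS = set("-.")


def aligned_distance(seq: str, ref: str) -> float:
    """Fraction of mismatches over columns where both sequences have a base."""
    if len(seq) != len(ref):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    mismatches = 0
    for a, b in zip(seq.upper(), ref.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            mismatches += 1
    # With no comparable columns, treat the pair as maximally distant.
    return mismatches / compared if compared else 1.0


def names_within(seqs: dict[str, str], ref: str, threshold: float) -> list[str]:
    """Names of sequences whose distance to the reference is <= threshold."""
    return [name for name, s in seqs.items() if aligned_distance(s, ref) <= threshold]


# Toy data (hypothetical): one aligned reference and three aligned reads.
reference = "ACGTACGT--"
reads = {
    "read1": "ACGTACGT--",
    "read2": "ACGTACGA--",
    "read3": "TTGTACAA--",
}
keep = names_within(reads, reference, threshold=0.13)
print(keep)
```

A threshold of 0 keeps only the reads that match the reference exactly; a looser threshold also admits near matches such as `read2`.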

Pat

Leptothrix · March 4, 2011, 4:38pm

Hi Pat,

Thanks. Hmm, the probe sequence isn’t in the variable region, so maybe the mask is the route to go then. So I take the clones, align them, import them into an alignment editor (in this case ARB), trim the alignment down to the variable region targeted in my pyrosequencing library, generate a consensus sequence for this region, and export it.

Then I add the trimmed clone sequences to the pyrosequencing dataset (as taxa controls), make the mask and mask over all of the variable base pairs, group the sequences in the pyrosequencing dataset, obtain the names for each grouping, find my known clones, and then run get.seqs using the accnos file.
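
The last part of that plan needs a list of sequence names in an accnos file, which is plain text with one sequence name per line. A minimal Python sketch for producing that text from a list of names is below; the function name `accnos_text`, the read names and the output file name are hypothetical.

```python
from pathlib import Path


def accnos_text(names: list[str]) -> str:
    """Format sequence names as accnos text: one unique name per line."""
    unique = list(dict.fromkeys(names))  # drop duplicates, keep first-seen order
    return "".join(f"{name}\n" for name in unique)


# Toy data (hypothetical): names from the group that contains the known clones.
selected = ["read1", "read2", "read1", "cloneA"]
text = accnos_text(selected)
print(text, end="")

# To save the list for use with get.seqs, write it to a file, e.g.:
# Path("selected.accnos").write_text(text)
```

The saved file could then be passed to get.seqs together with the fasta file, along the lines of `get.seqs(accnos=selected.accnos, fasta=...)`, with the file names adjusted to the actual dataset.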

Okay maybe this is what I will try. It will probably take some more tweaking but I can let you know how it goes.

Thanks,
Emily
