lane1349.gg.filter - Pat Schloss's transcription of the mask

Sam_Lambrechts · July 16, 2013, 11:37am

Dear mothur and beloved children,

I was wondering why there is a third greengenes-compatible Lane mask to use with filter.seqs. I notice that the Lane mask currently provided on the GG website is identical to the second one on “http://www.mothur.org/wiki/Lane_mask” (1287), but I wonder what the third one is.

In your opinion, is the third one better (to use before phylogenetic reconstruction), or does it depend on what you want to do with it?

For your information, I have added a small number of 454 reads to the total Greengenes reference alignment using align.seqs, and I want to calculate a phylogenetic tree of the resulting merged alignment using FastTree. Therefore, I want to mask the variable regions of the alignment that could interfere with correct phylogenetic reconstruction.

Thank you for any information you are able to provide.

Sam

pschloss · July 16, 2013, 1:40pm

I think the 3rd one may have been lifted from the greengenes ARB database, but I’m not 100% sure. Regardless, unless one is doing phylogenetics, I would strongly encourage people to stay away from using these types of masks, as they mute the genetic diversity between sequences and make things look more similar than they really are. This is appropriate for a broad-scale phylogeny, but not for fine-scale OTU-based analyses.
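To illustrate the point about masks muting diversity, here is a minimal Python sketch, not mothur's own code. It assumes a mask is a string of 0s and 1s with one character per alignment column, where 1 means "keep". The sequences, the mask, and the function names are made up for the example.

```python
def apply_mask(aligned_seq, mask):
    """Keep only the alignment columns whose mask character is '1'."""
    if len(aligned_seq) != len(mask):
        raise ValueError("sequence and mask must have the same length")
    return "".join(base for base, keep in zip(aligned_seq, mask) if keep == "1")


def mismatches(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical toy data: two aligned sequences that differ at two columns.
seq_a = "ACGTTGCA"
seq_b = "ACGATGGA"
# Hypothetical mask that happens to drop exactly those two variable columns.
mask = "11101101"

before = mismatches(seq_a, seq_b)
after = mismatches(apply_mask(seq_a, mask), apply_mask(seq_b, mask))
print(before, after)
```

In this toy case the two sequences differ before masking and become identical afterwards, which is harmless for a broad-scale tree but would merge them in a fine-scale OTU analysis.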

Sam_Lambrechts · July 16, 2013, 2:10pm

Thank you for your answer.

Broad-scale phylogeny is what I’m trying to do here. I’m just curious about what the differences are between the three greengenes-compatible Lane masks you provide:

lane1241.gg.filter - A Lane mask that comes with the greengenes arb database
lane1287.gg.filter - A Lane mask that comes with the greengenes arb database
lane1349.gg.filter - Pat Schloss’s transcription of the mask from the Lane paper

When or why should one use, for example, lane1287 instead of lane1349? I have a hard time choosing one of these three because I am a little in the dark about what the choice should be based on.

Kind Regards,

Sam

pschloss · July 16, 2013, 4:12pm

I believe that the 4-digit number is the number of columns that will come out of the filtering of full-length sequences. A better approach might be to use the soft filter option to remove any columns where the most common base in a position occurs in less than 50% of the sequences.
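The soft-filter rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not mothur's implementation. It assumes that only A, C, G, T and U count as bases (gaps are ignored) and that the fraction is taken over all sequences; the function name and the toy alignment are made up.

```python
from collections import Counter


def soft_filter_mask(aligned_seqs, threshold=0.5):
    """Build a 0/1 mask that keeps a column only if its most common base
    occurs in at least `threshold` of the sequences."""
    n = len(aligned_seqs)
    mask = []
    for column in zip(*aligned_seqs):
        # Assumption: gaps and other symbols are not counted as bases.
        counts = Counter(base for base in column if base in "ACGTU")
        top = counts.most_common(1)[0][1] if counts else 0
        mask.append("1" if top / n >= threshold else "0")
    return "".join(mask)


# Hypothetical toy alignment of four sequences, six columns each.
seqs = ["ACGT-A", "ACTT-C", "ACAT-G", "ACCTAT"]
soft_mask = soft_filter_mask(seqs)
kept_columns = soft_mask.count("1")
print(soft_mask, kept_columns)
```

Counting the 1s in a mask gives the number of columns that survive filtering, which is what the number in a name like lane1349 appears to refer to. Unlike a fixed Lane mask, this mask is derived from whatever sequences are actually in the alignment.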

Sam_Lambrechts · July 16, 2013, 5:05pm

Thanks for the advice! I’m not sure I understand why a Lane mask is not advisable in my case, though:

Is the Lane mask not advisable because the alignment also contains short 454 sequences? The Greengenes consortium does, however, use a Lane mask before constructing their large phylogenetic trees, and the only difference between their alignment and mine is the extra 454 sequences I added. So I’m guessing that is why you are suggesting a soft mask?

Or did I misunderstand, and is there another reason not to use the Lane mask for broad-scale phylogeny here?

Kind Regards,

Sam

pschloss · July 16, 2013, 7:29pm

Well, the Lane mask was developed in 1991ish to make a phylogeny between the three domains. If you can find the original paper (good luck!), it’s based on an alignment of about 10 sequences. My understanding is that the soft mask is to be preferred because it will do a better job of “fitting” the actual data you have.

Sam_Lambrechts · July 16, 2013, 7:49pm

OK, this especially explains a lot; I didn’t know that. Thank you for the clarification.
