align.seqs

lvmingji · August 23, 2010, 3:11pm

Hi, I am wondering whether the lengths of the sequences will have any impact on the results of alignment or of further commands such as clustering and rarefaction. The sequences in my primary files (before aligning) are of different lengths but have similar beginnings (a 16S rRNA primer). The shortest is about 800 bp and the longest about 1200 bp. If the original sequences have different lengths, and after aligning I calculate pairwise distances, cluster them, and construct a phylogenetic tree, are these results reliable? Should I chop the sequences to the same length before I run the align command?
Also, if I want to identify the bacterial sequences at the genus level, what length would be appropriate?
Thank you very much.

pschloss · August 23, 2010, 6:04pm

It won’t affect aligning, but it will affect the distances you calculate and the results from classification. Because of this, I emphatically encourage people to use the filter.seqs(trump=.) command to blunt all sequences so they overlap the same region. Otherwise you’re comparing evolutionary apples to oranges.
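To make the trump=. behaviour concrete, here is a minimal Python sketch (not mothur's code). It assumes the mothur convention in which '.' marks alignment positions outside a read and '-' marks internal gaps. The function name trump_filter is hypothetical; it drops every column in which any sequence carries the trump character, so only the region covered by all reads survives.

```python
def trump_filter(aligned, trump="."):
    """Drop every alignment column in which any sequence has the trump character.

    Sketch of the idea behind filter.seqs(trump=.): because '.' marks positions
    outside a read's ends, removing those columns keeps only the region that
    every sequence covers.
    """
    lengths = {len(s) for s in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    width = lengths.pop()
    keep = [i for i in range(width) if all(s[i] != trump for s in aligned)]
    return ["".join(s[i] for i in keep) for s in aligned]


# Toy alignment: '.' = outside the read, '-' = internal gap.
seqs = [
    "..ACG-TAC..",
    "TTACGATACGG",
    "...CG-TACG.",
]
for s in trump_filter(seqs):
    print(s)
```

Every read is cut back to the six columns that all three cover, which is exactly why one very short read can shrink the whole data set.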

laalaa99stl · August 23, 2010, 7:18pm

…with the caveat that the screen.seqs command should ALWAYS be run before filter.seqs (with the potentially dangerous trump option) to remove the shortest sequences. Otherwise you run the risk of trimming all of your sequences to the length of the shortest input sequence. It's generally desirable to sacrifice a few sequences to ensure the best coverage. In fact, I can think of a simple algorithm mothur could use to automatically suggest good start and end values to the user during screening. It would also be good human interface design if mothur nagged the user when more than half of the real bases were being thrown away during filtering.

Robin
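Robin's suggestion can be sketched in Python. Everything here is hypothetical and not part of mothur: suggest_start_end picks start/end cutoffs so that a chosen fraction of reads (keep_fraction, an assumed parameter) passes each one, screen mimics the start/end criteria of screen.seqs (drop reads that start after start or end before end), and fraction_bases_kept estimates how many real bases remain once the survivors are trimmed to their common region. That region matches what trump=. leaves only under the assumption that '.' marks read ends and nothing else.

```python
import math

GAP_CHARS = set(".-")  # assumed mothur convention: '.' = outside the read, '-' = internal gap


def span(seq):
    """Return the (first, last) 0-based alignment columns that hold a real base."""
    idx = [i for i, c in enumerate(seq) if c not in GAP_CHARS]
    if not idx:
        raise ValueError("sequence contains no bases")
    return idx[0], idx[-1]


def suggest_start_end(aligned, keep_fraction=0.9):
    """Hypothetical helper: pick start/end cutoffs from the data.

    start = smallest column at or before which at least keep_fraction of reads begin;
    end   = largest column at or after which at least keep_fraction of reads end.
    Both cutoffs are applied together, so slightly more reads may be lost overall.
    """
    k = max(1, math.ceil(keep_fraction * len(aligned)))
    starts = sorted(span(s)[0] for s in aligned)
    ends = sorted((span(s)[1] for s in aligned), reverse=True)
    return starts[k - 1], ends[k - 1]


def screen(aligned, start, end):
    """Keep reads that begin at or before `start` and end at or after `end`."""
    return [s for s in aligned if span(s)[0] <= start and span(s)[1] >= end]


def fraction_bases_kept(original, survivors):
    """Share of the original real bases left after trimming the survivors
    to the columns they all cover (assumes '.' only marks read ends)."""
    lo = max(span(s)[0] for s in survivors)
    hi = min(span(s)[1] for s in survivors)
    total = sum(c not in GAP_CHARS for s in original for c in s)
    kept = sum(c not in GAP_CHARS for s in survivors for c in s[lo:hi + 1])
    return kept / total


# Toy aligned reads; the last one is short and would dictate the trimmed length.
reads = [
    "..ACGTACGTAC..",
    "ACGTACGTACGTAC",
    ".CGTACGTACGT..",
    "......GTAC....",
]
start, end = suggest_start_end(reads, keep_fraction=0.75)
kept_reads = screen(reads, start, end)
print("suggested start/end:", start, end, "| reads kept:", len(kept_reads))
for label, subset in (("screened", kept_reads), ("unscreened", reads)):
    frac = fraction_bases_kept(reads, subset)
    note = "  <- more than half of the real bases discarded" if frac < 0.5 else ""
    print(f"{label}: {frac:.0%} of bases kept{note}")
```

In this toy case, screening out the one short read keeps about 77% of the bases, while filtering all four reads keeps only about 41%, which is the situation the proposed warning is meant to catch.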

lvmingji · August 24, 2010, 12:16pm

Actually, I have used filter.seqs, but without the "trump" option. Is it OK to do the further calculations? I mean, is there any difference with or without "trump" for the further calculations, such as "cluster", "rarefaction", and "phylogenetic tree"? If I filtered the sequences without "trump", are the results reliable? I ask because I have already done a series of calculations based on filtering without "trump" in the command.

pschloss · August 24, 2010, 3:06pm

I would say no, it's not OK. What you're doing is essentially assuming that the 16S rRNA gene evolves uniformly across its length. We know this is not true (for more evidence see my recent PLoS Comp. Biol. paper). So you need to use trump=. and then redo everything downstream of that step.
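A toy Python illustration of why mixed regions are a problem (not mothur's dist.seqs): if each pairwise distance is computed over whatever columns two reads happen to share, different pairs are compared over different parts of the gene, and the resulting distances need not even be mutually consistent. The function pairwise_p_distance and the reads are hypothetical, and the function treats internal '-' gaps as ordinary characters, which is a simplification.

```python
from itertools import combinations


def pairwise_p_distance(a, b):
    """Proportion of differing columns among those where neither read has '.'.

    Simplification: internal '-' gaps are compared like ordinary characters.
    Returns (distance, number_of_columns_compared).
    """
    cols = [(x, y) for x, y in zip(a, b) if x != "." and y != "."]
    if not cols:
        raise ValueError("reads do not overlap")
    diffs = sum(x != y for x, y in cols)
    return diffs / len(cols), len(cols)


def report(reads):
    """Print the distance and overlap length for every pair of reads."""
    for x, y in combinations(sorted(reads), 2):
        d, n = pairwise_p_distance(reads[x], reads[y])
        print(f"  {x}-{y}: d={d:.2f} over {n} columns")


# Toy reads covering different parts of the same aligned region.
reads = {
    "A": "ACGTACGTACGTACGT",  # full length
    "B": "ACGTACGTAC......",  # front part only
    "C": "......GTACGAACGA",  # back part only
}
print("untrimmed:")
report(reads)  # d(A,C) = 0.20 exceeds d(A,B) + d(B,C) = 0.00

# Keep only the columns every read covers (the effect of trump=.).
shared = [i for i in range(len(reads["A"])) if all(r[i] != "." for r in reads.values())]
trimmed = {k: "".join(r[i] for i in shared) for k, r in reads.items()}
print("trimmed to shared columns:")
report(trimmed)
```

Untrimmed, A and C look different while B looks identical to both, because each pair is measured on a different stretch of the gene; after trimming, all three pairs are measured on the same four columns.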

lvmingji · August 28, 2010, 3:20pm

Thanks very much. Is running the "screen.seqs" command first to remove short sequences by setting "start" and "end", and then running the "filter.seqs" command, a better strategy for getting better results when it comes to "cluster", "taxonomy", or "rarefaction"?
Another question: how can I draw the results of "libshuff" as a figure, as is done in many papers that compare clone libraries?

pschloss · August 28, 2010, 6:19pm

That’s correct - check out the costello example analysis to see how to do this. As for libshuff, all you really need to present are the p-values - if either is significant, the two libraries are different.
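Reduced to code, the libshuff decision rule above looks like this minimal Python sketch. libraries_differ, the 0.05 threshold, and the library names and p-values in the example are hypothetical placeholders, not mothur output; the sketch only assumes that each pairwise comparison yields one p-value per direction (X against Y and Y against X).

```python
def libraries_differ(p_xy, p_yx, alpha=0.05):
    """True if either directional libshuff comparison is significant at alpha (assumed 0.05)."""
    return p_xy < alpha or p_yx < alpha


# Hypothetical p-values, for illustration only.
results = {
    ("libA", "libB"): (0.001, 0.240),
    ("libA", "libC"): (0.310, 0.520),
}
for (x, y), (p_xy, p_yx) in results.items():
    verdict = "different" if libraries_differ(p_xy, p_yx) else "not different"
    print(f"{x} vs {y}: p = {p_xy:.3f}, {p_yx:.3f} -> {verdict}")
```

A small table of these two p-values per pair of libraries is usually all that needs to be reported.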
