Choosing a pipeline

I am trying to identify the best pipeline that would work for the analyses I have to do.
So far, I have been looking at the Esophageal and Marine community analysis examples. I am a bit confused about why the two examples use different pipelines. For example, in the Marine community analysis, the alignment was performed on the Greengenes website, and the distance matrix was also generated there rather than with the dist.seqs command in Mothur. Why does this differ from the Esophageal community analysis pipeline?


Thanks for your question, a couple of things…

  1. Those examples were made by different people, and some of them prefer(red) to use an alternative tool for certain steps. This is fine, and it does a great job of showing how flexible mothur is: you can do as much or as little as you want within mothur.

  2. They were also made at different points in the development of mothur. So some of them may have been written before we incorporated the aligner or the distance calculator. You’ll also notice that none of the examples have used trim.seqs with barcoded sequence data. None of the examples include phylotyping approaches because they were just incorporated in the last release.

  3. For what it’s worth, the examples done by me (“Pat Schloss”) are indicated on the examples page. This is how I do the analysis and how I teach people to do it in my workshops. Take it or leave it.

  4. Please feel free to contribute your own example analysis to show us how you analyze your data! Many people have found these to be very helpful as they navigate the many options available within mothur.

Thanks for your reply, that makes a lot of sense!
I actually liked the Esophageal community example and I would like to follow that pipeline. However, in that case a .TREE file was already provided. So my main difficulty now is: how can I build a tree and generate a Newick file that I can use for downstream analysis in Mothur?
Until now, I have been using MEGA 4 for my bioinformatics analyses. With that software, however, it is impossible to perform OTU-based analyses and many other things that are possible in Mothur. On the other hand, with MEGA 4 I could build a tree. Are you familiar with MEGA 4?
One possible pipeline I am considering: follow the Esophageal community analysis pipeline and use MEGA 4 to display the tree and generate the Newick file. I am just wondering how compatible the two programs are. I have already tried to load aligned sequences from Mothur into MEGA 4 and I received an error message. Maybe I could instead generate a Newick file with Mothur and then load it into MEGA 4 to display the tree?
If you think it would be smarter to use a different program for the tree, I am open to that possibility too. You have much more experience than I do with these things, and I imagine you have a much broader view of the possibilities that are out there. I look forward to hearing your thoughts and advice!

Thanks again

Yes, you can save newick-formatted trees in mega. It doesn’t really matter what the file extension is. I like to give these text files meaningful names so that I know what they are.
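Since a newick file is just plain text, it is easy to peek inside one. As a rough sketch (not anything mothur or mega does for you), the Python below pulls the leaf names out of a newick string, which can be handy for confirming that the exported tree contains the sequences you expect. The function name and the sample tree are made up for illustration, and it assumes simple unquoted names with no spaces.

```python
import re


def leaf_names(newick):
    """Return leaf labels from a Newick string, in order of appearance.

    Assumption: labels are unquoted and contain no whitespace, commas,
    colons, semicolons, or parentheses. Internal node labels (which
    follow a closing parenthesis) are ignored.
    """
    # A leaf label starts right after "(" or "," and runs until the next
    # structural character.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


# Hypothetical three-sequence tree with branch lengths.
tree = "((seqA:0.1,seqB:0.2):0.3,seqC:0.4);"
print(leaf_names(tree))
```

Comparing that list against the sequence names in your alignment is a quick sanity check before moving on.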

Just to make sure I understand: I can build a tree with Mega 4 and then save the tree as a Newick file. I can then use this file to run downstream analyses in Mothur. Correct?
If so, this means that for the downstream OTU-based analyses I will run the alignment in Mothur, but to build a tree I will first run a separate alignment in Mega 4 and then use the resulting Newick file for the downstream hypothesis testing analyses in Mothur. Should I be worried about inconsistencies with this approach? I imagine that the alignments produced by the two programs will give slightly different results (in Mega 4 the alignment relies more on visual aids). What do you think? Does this issue really matter?


So here’s what I’d suggest doing…

  1. Generate the alignment and distance matrix in mothur
  2. Import either of those into mega: the alignment is in fasta format, and the distance matrix can be output from mothur in lower-triangle phylip format (assuming you set that option in dist.seqs)
  3. Build your tree in mega
  4. Export the tree as a newick-formatted file
  5. Proceed with mothur using that tree
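
To make the lower-triangle phylip layout in step 2 a bit more concrete: the file starts with the number of sequences, followed by one row per sequence holding its name and the distances to the sequences listed before it. Here is a minimal Python sketch that reads that layout. The function name and the sample data are invented for illustration, and it assumes whitespace-separated fields and names without spaces.

```python
def read_lower_triangle(text):
    """Parse a lower-triangle PHYLIP distance matrix.

    Returns (names, dist), where dist maps frozenset({name1, name2})
    to the distance between those two sequences.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])  # first line: number of sequences
    names = []
    dist = {}
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split()
        name, values = fields[0], fields[1:]
        # Row i (0-based) carries distances to the i sequences before it.
        if len(values) != i:
            raise ValueError(
                f"row {i + 1} should have {i} distances, found {len(values)}"
            )
        for other, value in zip(names, values):
            dist[frozenset((name, other))] = float(value)
        names.append(name)
    if len(names) != n:
        raise ValueError(f"expected {n} rows, found {len(names)}")
    return names, dist


# Made-up three-sequence matrix.
sample = """3
seqA
seqB 0.10
seqC 0.25 0.30
"""
names, dist = read_lower_triangle(sample)
print(names)
print(dist[frozenset(("seqA", "seqC"))])
```
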

Hope that helps…

Thanks, I will try the approach you described, which is the one that makes the most sense. However, MEGA 4 does not let me load already-aligned sequences, even though the file is in FASTA format. I get an error message every time and I have no idea why.
Have you ever been able to actually do what you just described?

I think I figured that out: Mega 4 needed aligned sequences with the strings of periods (“.”) filtered out, which I did with the filter.seqs command and its trump option.
It seems it is working now. I created my Newick file and used it for my hypothesis testing analyses.
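
In case it helps anyone else, here is a rough Python sketch of what that filtering step amounts to, as I understand it: every alignment column in which any sequence has a period is dropped. This only illustrates the idea and is not mothur's implementation; the function names and the sample sequences are made up, and in practice filter.seqs does this for you.

```python
def read_fasta(text):
    """Read FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_columns_with(records, char="."):
    """Remove every column where at least one sequence contains `char`."""
    if not records:
        return []
    seqs = [seq for _, seq in records]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(len(seqs[0])) if all(s[i] != char for s in seqs)]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


# Made-up two-sequence alignment padded with periods at the ends.
aligned = ">seq1\n..AC-GT.\n>seq2\n.TACGGT.\n"
for name, seq in drop_columns_with(read_fasta(aligned)):
    print(f">{name}\n{seq}")
```

Note that ordinary gap characters ("-") are left alone; only columns containing a period are removed.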

Splendid, glad it’s working! Feel free to post your pipeline on the Example Analysis page of the wiki.

Yes I will gladly do it as soon as I can.
Thanks for all your help!