Processing 454, Illumina, and Sanger Data


I’m currently taking a computer science bioinformatics class at a state university. The class has been split into groups with different goals. Our group was tasked with comparing two microbial community analysis software packages, one of which is mothur. With mothur, our job is to take sequence read data produced by 454, Sanger, and Illumina machines and process it through the steps needed for OTU picking, alpha diversity, and finally beta diversity.

This data has already been demultiplexed.

Our group has tried several times to follow the workflows in the analysis examples, but we have had no luck with any of these data sets.

From our understanding, the workflow for OTU picking is:


Then, to perform the alpha diversity analysis, we would use:


Finally, we would perform the beta diversity analysis using:


Is this the correct workflow? If not, could someone explain what the correct workflow should be and briefly give the reasoning behind it?

We believe one reason for our problems is that the sequences haven’t been cleaned up to account for differing read lengths. Looking into this, we found commands such as screen.seqs and chop.seqs. However, in our preliminary testing, applying them did not change our results.
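To check that we understand what these steps are supposed to accomplish, here is a minimal Python sketch of the idea as we understand it. It is not mothur code. `screen_by_length` loosely mirrors screening reads by a minimum and maximum length, and `chop_to_length` loosely mirrors keeping only the first N bases of each read. Both function names and the toy reads are hypothetical.

```python
def screen_by_length(reads, min_length, max_length):
    """Keep only reads whose length falls within [min_length, max_length]."""
    return {name: seq for name, seq in reads.items()
            if min_length <= len(seq) <= max_length}


def chop_to_length(reads, num_bases):
    """Keep the first num_bases of each read; shorter reads come back unchanged."""
    return {name: seq[:num_bases] for name, seq in reads.items()}


# Hypothetical toy reads of differing lengths.
reads = {
    "read1": "ACGTACGTAC",    # 10 bp
    "read2": "ACGTAC",        # 6 bp, shorter than the minimum
    "read3": "ACGTACGTACGT",  # 12 bp
}

# Screen first so every surviving read is at least as long as the chop length.
screened = screen_by_length(reads, min_length=8, max_length=12)
trimmed = chop_to_length(screened, num_bases=8)
print(trimmed)  # {'read1': 'ACGTACGT', 'read3': 'ACGTACGT'}
```

In this sketch, the order matters. Chopping alone does not make reads uniform, because any read shorter than the chop length stays short. Only screening with a minimum length at least as large as the chop length first leaves every read ending at the same length.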

Is there a specific nuance to these steps that we may be forgetting to account for?

Lastly, we’ve seen that several commands can also take a name file. Is there a step whose output provides a name file? As it stands, we haven’t created anything that represents one.
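Our assumption is that a name file comes from collapsing identical sequences (we believe mothur’s unique.seqs does this), with one line per unique sequence. Each line would hold a representative read name, a tab, and then a comma-separated list of every read name sharing that sequence. The Python sketch below shows that idea. It is not mothur code: `dereplicate` and the toy reads are hypothetical, and the exact file format should be checked against the mothur documentation for your version.

```python
def dereplicate(reads):
    """Collapse identical sequences and build name-file-style lines.

    Returns (unique, name_lines): unique maps each representative read name
    to its sequence; each line in name_lines is
    "representative<TAB>name1,name2,...".
    """
    groups = {}  # sequence -> read names sharing it, in input order
    for name, seq in reads.items():
        groups.setdefault(seq, []).append(name)

    unique = {}
    name_lines = []
    for seq, names in groups.items():
        representative = names[0]  # first read seen becomes the representative
        unique[representative] = seq
        name_lines.append(f"{representative}\t{','.join(names)}")
    return unique, name_lines


# Hypothetical toy reads: r1, r3, and r4 are identical.
reads = {"r1": "ACGT", "r2": "ACGA", "r3": "ACGT", "r4": "ACGT"}
unique, name_lines = dereplicate(reads)
print("\n".join(name_lines))
```

In this sketch, the unique sequences plus the name lines preserve the original read abundances. That seems to be why downstream steps ask for the name file alongside the deduplicated FASTA.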

We appreciate any input that anyone has for our questions.

You may want to take a look at