Cleaning second-generation sequencing data

I have been looking around the web and testing the current tools for pre-processing sequencing reads, and it just seems like no solution really cuts it – including mothur. I was wondering if there are any plans to make mothur better in this respect, and I also wanted to know what other tools you guys are using. If you feel like sharing your opinion on the matter after reading the short review I wrote on the subject, that would be great.

Yeah we’re on it. Your review is very much out of date…

Other nuisances include that it doesn’t support the FASTQ format and only takes a combination of FASTA and QUAL files. As usual, nothing is said about the expected standard to be used in the QUAL file.

Ummm… there’s the command.

There appears to be no paper associated, only an old poster

Have you run mothur before? The first thing that comes up is the citation to a 2009 manuscript. You can also get references by running a command with the keyword citation in the parentheses, e.g. align.seqs(citation).


I wasn’t expecting to get such a quick response from the lead developer. That’s great. Sorry about not spotting the paper. I looked on the main page and quickly on the wiki, but didn’t think of looking at the standard output of the trim command for that information. I have updated my post and added the link to the paper.

Is there any other aspect that you think is wrong or out of date? I wrote this post only a few days ago, trying to vent my frustration and to capture the reactions of a new user coming to the field of DNA sequence analysis, starting with the pre-treatment of the data.

I can imagine that mothur provides commands for going back and forth between FASTA, QUAL and FASTQ. For that fairly common task, I have developed my own tested programs. But the fact is that, for trimming reads, it doesn’t take FASTQ or SFF as input, so it doesn’t use the apparent standard file formats and adds a step to the processing. That’s all. It’s a detail.
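
To make the extra step concrete, here is a minimal Python sketch of a FASTQ to FASTA + QUAL conversion. It is not mothur's code and not the program I actually use; the function names (read_fastq, fastq_to_fasta_qual) are made up for illustration. It assumes strict four-line FASTQ records and a Phred+33 quality offset (Sanger / Illumina 1.8+); older Illumina files use an offset of 64, which is exactly the kind of "expected standard" question raised above.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) tuples.

    Assumes strict 4-line records (no wrapped sequence lines).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def fastq_to_fasta_qual(fastq, fasta, qual, offset=33):
    """Write FASTA and QUAL records for every FASTQ record; return the count.

    offset is the ASCII offset of the quality encoding (assumed 33 here).
    """
    count = 0
    for name, seq, quality in read_fastq(fastq):
        scores = [ord(char) - offset for char in quality]
        if scores and min(scores) < 0:
            raise ValueError(f"negative score in {name!r}: wrong offset?")
        fasta.write(f">{name}\n{seq}\n")
        qual.write(f">{name}\n{' '.join(str(s) for s in scores)}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles instead.
    fastq_in = io.StringIO("@read1\nACGT\n+\nII5!\n")
    fasta_out, qual_out = io.StringIO(), io.StringIO()
    fastq_to_fasta_qual(fastq_in, fasta_out, qual_out)
    print(fasta_out.getvalue(), end="")
    print(qual_out.getvalue(), end="")
```

With real files you would open the three handles with open() and pass them in. The negative-score check is only a cheap guard against using offset 33 on data that is not Phred+33; it cannot catch an offset-64 file read with offset 33, because those scores simply come out too high.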

If I had to point to the thing that bugged me the most, it was the use of an interactive mode as the principal interface for the software. That was probably a bad design idea, in my opinion.


The thing I would suggest is to read anything that comes up in PubMed for the following search: “Schloss PD[au]”. Then you might have a better idea of what’s going on.

Well if you have a few hundred thousand dollars to throw at us, we’ll make a much nicer interface. It’s our philosophy that scientists should concentrate on science, not design.