Hate to be greedy when there are already so many fantastic features, but… A pairwise aligner/clusterer would be great for binning OTUs from diverse lineages or from regions which are not amenable to multiple sequence alignments (such as fungal ITS). Implementing a program like CD-HIT-EST, UCLUST, BLASTCLUST or, some day, PyroNoise inside MOTHUR would be slick!
PyroNoise is close to being in mothur. As for your idea for pairwise alignments and distances, we have the code lying around and could easily pop it in for the next release. It probably wouldn’t be as fast as those you mentioned, but perhaps we could come up with wrappers for the other programs. Thanks for the suggestion.
Very close (as in, it’s already there), but we’re still testing it and putting it through its paces. If you contact me directly I can help you out… Not to be a wet blanket, but the trim.seqs options we suggest in the Costello example still provide much better output than PyroNoise/AmpliconNoise. Regardless, our version is much faster than Quince’s.
It’s nice to hear. I hoped I could stick to mothur and the Costello example, because, as a novice in Linux and command-line programs, I have spent a lot of time getting to know mothur. However, I would very much like to test PyroNoise in mothur. I’ll send you an e-mail.
I wondered how far along you were with adding AmpliconNoise to mothur? I’d like to try it out and compare the results with what we’re currently doing.
Hello,
I just used the pairwise.seqs command to align 1500 sequences. I used the countends=F option because I have some partial sequences that I didn’t want to discard. However, I noticed that for a couple of sequences, the overlap was very short (<50 bases) and fell in a very conserved region. The distance value I got for these two sequences was 0.00, but really they aren’t closely related.
Would it be possible to implement a minimum overlap parameter for pairwise.seqs? For example, you could set it to make sure the two sequences are aligned across a minimum of 150 bases. I think that would eliminate this problem.
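To make the request concrete, here is a minimal Python sketch of the idea, not mothur's actual distance calculation. It assumes the two sequences are already aligned to the same length, treats `-` and `.` as gap characters, ignores terminal gaps (in the spirit of countends=F), and returns `None` when the overlap is below a threshold. The function name `end_gap_free_distance`, the `min_overlap` parameter, and the choice to return `None` rather than some distance value are all hypothetical; the toy sequences are invented for illustration.

```python
GAP_CHARS = "-."


def _covered_span(aligned_seq):
    """Return (start, end) of the region between the first and last base."""
    positions = [i for i, ch in enumerate(aligned_seq) if ch not in GAP_CHARS]
    if not positions:
        return 0, 0
    return positions[0], positions[-1] + 1


def end_gap_free_distance(seq_a, seq_b, min_overlap=0):
    """Mismatch fraction over the overlapping region of two aligned sequences.

    Terminal gaps are not counted. Returns None if the sequences overlap
    by fewer than min_overlap columns (or do not overlap at all).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    start_a, end_a = _covered_span(seq_a)
    start_b, end_b = _covered_span(seq_b)
    start, end = max(start_a, start_b), min(end_a, end_b)

    # Skip columns where both sequences have a gap.
    columns = [
        (x, y)
        for x, y in zip(seq_a[start:end], seq_b[start:end])
        if not (x in GAP_CHARS and y in GAP_CHARS)
    ]
    if not columns or len(columns) < min_overlap:
        return None

    mismatches = sum(1 for x, y in columns if x != y)
    return mismatches / len(columns)


# Toy forward and reverse reads that share only a 4-base stretch.
forward = "ACGTTGCAAGGCTA------"
reverse = "----------GCTAGGTCCA"

no_threshold = end_gap_free_distance(forward, reverse)
with_threshold = end_gap_free_distance(forward, reverse, min_overlap=10)
```

Without a threshold the two toy reads get a distance of 0.0 from their 4 shared columns; with `min_overlap=10` the pair is reported as not comparable. What a real implementation should report in that case is an open question.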
Jana, mothur seems to be doing what it’s supposed to do, unless I’m missing something. I’d rather not implement the minimum overlap “feature” because I feel pretty strongly that if sequences don’t overlap their full length, then they shouldn’t be compared this way. For example, what distance would the sequences get if they don’t have a minimum level of overlap?
Hi Pat,
Perhaps I wasn’t clear about the issue we’re seeing. We have an 1100 bp amplicon that includes ITS1, 5.8S, ITS2 and 500 bp of the LSU, and we sequence it bi-directionally. Sometimes only the forward or only the reverse read works, so the sequence is only ~500 bp instead of the whole 1100 bp. The issue I was seeing is that when we have a partial forward read and a partial reverse read that only overlap for 25 bp on their ends (which happen to be in a very conserved region of 5.8S), Mothur puts them in the same 0.00 OTU.
A cartoon example looks like this:
That’s why I was thinking that a minimum overlap criterion would be useful to tell Mothur not to align these two sequences.
Alternatively, we can remove them from our dataset before doing the pairwise alignment.
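For anyone who wants to try the pre-filtering route, a rough Python sketch follows. It assumes plain FASTA input and that "partial" can be approximated by a length cutoff; the helper names and the 1000-base threshold are hypothetical and would need tuning for a real dataset.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_length(records, min_length):
    """Separate records into (kept, removed) by sequence length."""
    kept, removed = [], []
    for name, seq in records:
        if len(seq) >= min_length:
            kept.append((name, seq))
        else:
            removed.append((name, seq))
    return kept, removed


# Toy data: one full-length read and one partial read (invented lengths).
example_lines = [
    ">full_read",
    "A" * 600,
    "C" * 500,
    ">partial_forward",
    "G" * 480,
]

kept, removed = split_by_length(read_fasta(example_lines), min_length=1000)
kept_names = [name for name, _ in kept]
removed_names = [name for name, _ in removed]
```

Keeping the removed records in their own list makes it easy to write them to a separate file and handle the partial reads another way instead of discarding them.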