Phylotype vs OTUs


I am currently analysing pig faecal samples using mothur. Could somebody summarise the advantages and disadvantages of binning sequences into phylotypes versus OTUs? I am trying to decide which would be the best choice for my project.




First, some definitions…

Phylotype - your sequence is compared to a reference database and binned into a group based on its similarity to sequences in that database

OTU - your sequence is compared to the other sequences in your dataset and binned into a group based on its similarity to them
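To make the two definitions concrete, here is a minimal Python sketch on toy data. Everything in it is hypothetical: the four pre-aligned 40-base reads, their lineage strings (assumed here to be semicolon-delimited with one rank per field), and the helper names. Phylotype binning simply groups reads whose reference-based lineage agrees down to a chosen rank. The OTU side uses plain single-linkage (nearest-neighbour) clustering on p-distances at a 0.03 cutoff, which is a stand-in for illustration and not mothur's own clustering algorithm.

```python
from collections import defaultdict

# Hypothetical toy data: four pre-aligned reads of equal length.
BASE = "ACGT" * 10  # 40 bases


def mutate(seq, positions):
    """Return a copy of seq with a different base at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[b] if i in positions else b for i, b in enumerate(seq))


READS = {
    "r1": BASE,
    "r2": mutate(BASE, {0}),                  # 1 mismatch from r1 (0.025)
    "r3": mutate(BASE, {10, 20, 30, 39}),     # 4 mismatches from r1 (0.10)
    "r4": mutate(BASE, {5, 10, 20, 30, 39}),  # 1 mismatch from r3 (0.025)
}

# Hypothetical reference-based classifications, one rank per field.
GENUS = "Bacteria;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;Ruminococcus;"
TAXONOMY = {
    "r1": GENUS,
    "r2": GENUS,
    "r3": GENUS,
    "r4": "Bacteria;Firmicutes;Clostridia;Clostridiales;Ruminococcaceae;unclassified;",
}


def phylotype_bins(taxonomy, level):
    """Phylotypes: group reads whose lineage agrees down to `level` ranks."""
    bins = defaultdict(list)
    for read, lineage in taxonomy.items():
        ranks = lineage.rstrip(";").split(";")
        bins[";".join(ranks[:level])].append(read)
    return dict(bins)


def p_distance(a, b):
    """Fraction of mismatched positions between two aligned reads."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def otu_bins(reads, cutoff=0.03):
    """OTUs: single-linkage clustering on pairwise distances (union-find)."""
    names = sorted(reads)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Join any two reads within the cutoff; their groups merge transitively.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(reads[a], reads[b]) <= cutoff:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in names:
        groups[find(n)].append(n)
    return sorted(groups.values())


print(phylotype_bins(TAXONOMY, level=6))  # genus-level phylotypes
print(otu_bins(READS))                    # [['r1', 'r2'], ['r3', 'r4']]
```

With these made-up lineages, the genus-level phylotypes put r3 together with r1 and r2 (the "database" says they share a genus) and leave r4 on its own because it is unclassified at that rank, whereas the OTUs follow sequence similarity and give {r1, r2} and {r3, r4}.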

So the primary disadvantage of phylotypes is that they are database-dependent: people call the same thing by multiple names, some sequences aren't in the database, some genes have really bad databases (e.g. nifH), and usually you can't classify all the way to the genus or species level. The advantages are that it is very fast, it is forgiving of sequencing errors, and you get a name directly. Names give people warm fuzzy feelings :)

The primary disadvantage of OTUs is that clustering is slow and computationally “hard”. It is also sensitive to sequencing error rates: if you have a high error rate, you can easily get a gigantic distance matrix that will never cluster. The advantages are that you don’t have to worry about a database and you can tag names onto OTUs later. You also tend to get greater resolution: we frequently have many OTUs with the same genus name because they represent some “sub-genus” taxonomic level.
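The sensitivity to error rates is largely a matter of arithmetic: a full distance matrix over n unique sequences has n(n - 1)/2 entries, so every error that creates a new unique sequence adds roughly n more comparisons. The sketch below only does that arithmetic. The unique-sequence counts are hypothetical, and the 12 bytes per stored pair is an illustrative assumption, not how mothur actually stores its distances.

```python
def pairwise_distances(n_unique):
    """Entries in the lower triangle of an n x n distance matrix."""
    return n_unique * (n_unique - 1) // 2


def rough_gigabytes(n_pairs, bytes_per_pair=12):
    """Storage estimate; bytes_per_pair is an assumed, illustrative figure."""
    return n_pairs * bytes_per_pair / 1e9


# Hypothetical unique-sequence counts: the same community sequenced with a
# low error rate versus a high one (errors inflate the number of uniques).
for label, n in [("low error", 10_000), ("high error", 100_000)]:
    pairs = pairwise_distances(n)
    print(f"{label}: {n:,} uniques -> {pairs:,} distances, "
          f"~{rough_gigabytes(pairs):.1f} GB")
```

A tenfold increase in unique sequences gives roughly a hundredfold increase in the number of distances to compute and store.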

We’re biased towards the OTU-based approaches around here.


Dear Pat,

Thank you very much for your clear reply, much appreciated.

Yes, OTU-based analysis does appear to be more advantageous. As you say, though, people like to see pretty names!

So, I guess I am getting a lot more resolution with the OTU-based analysis (2300 OTUs) than with the phylotype-based analysis (180 phylotypes).



Hi All,

I just want to jump on this topic, to dig a little deeper.

Patrick, you already convinced me to use OTUs. I’m not very inclined to go towards phylotyping.

But in your 2013 paper in AEM, aren’t you proposing a heuristic that basically does phylotyping before building OTUs?

I understand this would speed things up, so it seems valid enough.

But if you are a huge fan of OTUs, how can you live with basing your OTUs on a phylotype input, and thereby dragging the disadvantages of phylotyping into the rest of your OTU analysis? I don’t understand how this approach can be even remotely comparable to ‘real’ OTU clustering. This way, you can never come up with an OTU that groups sequences from different phyla that might be very much alike, which I thought was one of the strengths of OTU clustering.

Or am I reading this paper the wrong way?

I would very much appreciate some insight into this proposed analysis pathway!



Great question. If you look at the figure in the paper, you’ll see that our F statistic for a 0.03 cutoff doesn’t really differ between the heuristic and non-heuristic approaches. This gives me confidence that we get a speed-up without a loss in clustering quality. To be safe, I generally first split the sequences at the class or family level and then cluster them into OTUs. If the dataset is huge and gnarly, we’ll split at the genus level.
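The split-then-cluster idea can be sketched as a count of how many pairwise distances have to be computed. This is a minimal illustration assuming 1,000 hypothetical reads spread evenly over ten made-up families (`Family0` to `Family9`), with lineages stored as semicolon-delimited strings; the helper names are invented here. In mothur, the `cluster.split` command is the one that performs this kind of taxonomy-based split before clustering.

```python
from collections import Counter


def split_by_taxon(taxonomy, level):
    """Count reads per group after truncating each lineage to `level` ranks."""
    return Counter(
        ";".join(lineage.rstrip(";").split(";")[:level])
        for lineage in taxonomy.values()
    )


def pairs(n):
    """Pairwise distances needed to cluster n sequences with a full matrix."""
    return n * (n - 1) // 2


def comparisons(taxonomy, level=None):
    """Distances computed without a split (level=None) or within each taxon.

    With a split, reads that land in different groups are never compared,
    so they can never end up in the same OTU.
    """
    if level is None:
        return pairs(len(taxonomy))
    return sum(pairs(k) for k in split_by_taxon(taxonomy, level).values())


# Hypothetical data: 1,000 reads spread evenly over ten made-up families.
TAXONOMY = {
    f"read{i}": f"Bacteria;Firmicutes;Clostridia;Clostridiales;Family{i % 10};"
    for i in range(1000)
}

print(comparisons(TAXONOMY))           # no split: 499500
print(comparisons(TAXONOMY, level=4))  # order level, one group: 499500
print(comparisons(TAXONOMY, level=5))  # family level, ten groups: 49500
```

Splitting at the family level cuts the work tenfold in this toy case, while the docstring of `comparisons` restates the trade-off raised in the question: reads in different groups are never compared, so the split rank should be coarse enough that genuinely similar sequences are unlikely to be separated.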