I’m analysing fungal ITS region 2 (ITS2) sequences and classifying them against the UNITE database.
As ITS sequences can’t really be aligned, I have been going with a phylotype approach.
However, that leaves me with a problem for sequences that can’t be classified with much resolution.
e.g. I have two major ‘unique’ sequences in different samples, that classify as:
In a phylotype approach, these are a single phylotype.
However, the two representative sequences with this classification are very different: they have very different nearest named isolates in BLAST, and a quick ClustalW alignment followed by a distance calculation between the two gives a distance of only 52.1%…
So, I feel that it would be misleading to put these sequences into a single phylotype and show this phylotype as a ‘shared’ major phylotype between my two samples…
I realise that this is a problem with the phylotype approach in general, but I am not sure of a way around it for fungal ITS.
Does anyone have a suggestion? Thanks.
I wouldn’t do phylotype for ITS for exactly that reason. In the past I’ve done the processing up through chimera checking in mothur, then clustered with an external greedy pairwise clustering tool (I’ve used crunchclust; there’s also cd-hit, uclust, etc.). Once the sequences are clustered, convert the output to mothur’s list format and proceed with the diversity stats. I’m working on a wiki for this but no promises for when it’ll be done.
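The conversion step could look roughly like the sketch below, which assumes the clusters come from cd-hit in its usual .clstr layout (a ">Cluster N" line followed by one member line per sequence, with the name between ">" and "..."). The function name, the toy input, and the "0.05" label are made up for illustration. The output is a single list-file row (label, number of OTUs, then one comma-separated column per OTU); recent mothur versions may also expect a header line, so compare against a list file written by your own version before relying on this.

```python
import re


def clstr_to_mothur_list(clstr_text: str, label: str) -> str:
    """Turn cd-hit .clstr text into one tab-separated mothur list-file row.

    Hypothetical helper: assumes member lines look like
    '0<TAB>250nt, >seqA... *' and that names contain no '...'.
    """
    clusters = []
    for line in clstr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])  # start a new cluster
            continue
        match = re.search(r">(.+?)\.\.\.", line)
        if match and clusters:
            clusters[-1].append(match.group(1))
    # One column per non-empty cluster, members joined by commas.
    otus = [",".join(members) for members in clusters if members]
    return "\t".join([label, str(len(otus))] + otus)


# Toy .clstr content (illustrative only).
example = "\n".join([
    ">Cluster 0",
    "0\t250nt, >seqA... *",
    "1\t248nt, >seqB... at +/97.60%",
    ">Cluster 1",
    "0\t251nt, >seqC... *",
])
print(clstr_to_mothur_list(example, "0.05"))
```

If the sequences were dereplicated before clustering, the matching names or count file still has to go along with the list file so that abundances are counted correctly.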
Thanks for the reply. So, do you have a threshold for clustering that you would recommend for ITS2? Or can you refer me to a publication?
So far I’ve found 85% for genus level http://microbiomejournal.biomedcentral.com/articles/10.1186/2049-2618-1-6
I tried 3, 5, and 10% (or 8?? It’s been a few years). I went with 5% based entirely on a few papers (probably from Nilsson’s group) and my downstream analyses. At 10%, seqs that were identified as very different fungal groups fell into the same cluster, and at 3% I saw no patterns in my beta diversity. I think something like CROP, which doesn’t use a strict cutoff but rather clusters based on the density of your data, would be interesting, but I was never able to get it to work.
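To make the effect of the cutoff concrete, here is a toy sketch of greedy centroid clustering run at 3%, 5%, and 10%. It is not the actual algorithm of crunchclust, cd-hit, or uclust, only the general idea: take sequences in order of decreasing abundance, and let each one either join the first existing centroid within the cutoff or become a new centroid. The sequence names and distances are invented for illustration.

```python
def greedy_cluster(names, distance, cutoff):
    """Greedy centroid clustering.

    names    -- sequence names, assumed sorted by decreasing abundance
    distance -- function (a, b) -> float in [0, 1]
    cutoff   -- maximum distance to a centroid for joining its cluster
    """
    centroids = []
    clusters = {}
    for name in names:
        for centroid in centroids:
            if distance(name, centroid) <= cutoff:
                clusters[centroid].append(name)
                break
        else:
            # No centroid was close enough: this sequence seeds a new cluster.
            centroids.append(name)
            clusters[name] = [name]
    return [clusters[c] for c in centroids]


# Invented pairwise distances between four toy sequences.
toy = {
    ("s1", "s2"): 0.02, ("s1", "s3"): 0.04, ("s1", "s4"): 0.09,
    ("s2", "s3"): 0.05, ("s2", "s4"): 0.10, ("s3", "s4"): 0.08,
}


def toy_distance(a, b):
    if a == b:
        return 0.0
    return toy[(a, b)] if (a, b) in toy else toy[(b, a)]


for cutoff in (0.03, 0.05, 0.10):
    print(cutoff, greedy_cluster(["s1", "s2", "s3", "s4"], toy_distance, cutoff))
```

On this toy data the four sequences form three clusters at 3%, two at 5%, and a single cluster at 10%, which is the same kind of lumping described above for the widest cutoff.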
I’d be very suspicious of any linkage between a distance threshold and a taxonomic level. If you want a taxonomic level, then I’d suggest the classification route. I think the UNITE database is on the wiki and has been formatted for classification.
Yes thanks Pat, I am using UNITE, but the classification approach isn’t great for sequences that can’t be classified with much resolution.
e.g. I have 2 major ‘unique’ sequences that both classify by UNITE as Fungi, Ascomycota, unclassified… And then I have 6 other major unique sequences that are all classified to species level as Circinaria contorta…
So, if I only look at ‘unique’ sequences, it seems like there are 6 different C. contorta sequences, when these probably originate from 1 (maybe 2) organisms.
But if I go with the phylotype approach, then the 2 unclassified Ascomycota, which are very distant from each other (~52%), seem to be one group, which also isn’t true…
You are correct, there is definitely no appropriate % distance for ITS sequences at any level… But I was hoping for some kind of middle ground between the ‘unique’ and ‘phylotype’ approaches, which both have limitations for my dataset. Thanks for your help.