Analysing fungal ITS with the pre.cluster function

Hello everybody,
I know this has been asked several times on the forum, but I think it is worth posting again given the new version of mothur. I’ve learnt that with the new version the pre.cluster function can be used without aligning, so it could be quite helpful when analysing fungal ITS.
So I’ve followed the different recommended steps up to pre.cluster (with align=needleman, diffs=2). Then I did some cleaning using the chimera.uchime function. Then I ran classify.seqs, using the UNITE db. Then I did some more cleaning to eliminate everything that couldn’t be identified as fungi (by passing unknown as the taxon to the remove.lineage function).
Everything went OK up to here, but I am not sure how to go on to assign OTUs to my samples and carry on with diversity analyses. Whenever I try to use dist.seqs with the fasta file generated by the last step (remove.lineage), I get the following error: your sequences are not the same length, aborting.
One last piece of information: I’ve used a count file throughout the analysis.
I would be very grateful if somebody could give me a hand.
Thanks
Jaime

Hi Jaime

I’m not sure how to do OTU clustering in mothur, but you could also use the QIIME pipeline for that. Just search for the QIIME fungal tutorial.

Hi Jaime,

You can’t use dist.seqs() because it requires an aligned .fasta file (see here: http://www.mothur.org/wiki/Dist.seqs ).
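A quick way to confirm this before calling dist.seqs() is to look at the sequence lengths in the file. The sketch below is plain Python and not part of mothur; the function names and the three example records are made up for illustration.

```python
from collections import Counter
from io import StringIO


def read_fasta_lengths(handle):
    """Return {sequence name: length} for a FASTA stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: keep the first whitespace-delimited token as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            lengths[name] += len(line)
    return lengths


def looks_aligned(lengths):
    """True when every sequence has the same length (gap characters included)."""
    return len(set(lengths.values())) <= 1


# Hypothetical unaligned reads of different lengths.
example = StringIO(">seq1\nACGTACGTAC\n>seq2\nACGTAC\n>seq3\nACGTACGTACGT\n")
lengths = read_fasta_lengths(example)
length_counts = Counter(lengths.values())  # how many sequences have each length
aligned = looks_aligned(lengths)
print(length_counts, aligned)
```

If more than one length shows up, the file is not an alignment and dist.seqs() will refuse it. To check a real file, pass `open("your_file.fasta")` instead of the `StringIO` example.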

However, Mothur can calculate pairwise distances without a global alignment using the pairwise.seqs() command.
http://www.mothur.org/wiki/Pairwise.seqs

Hello,
Thanks a lot for the answers.
I’ve been using mothur for almost a year now, with good results. That is why I would like to keep using it. Anyway, if I have no choice, I will go and see what QIIME offers.
So, if we want to cluster ITS we still need to run pairwise.seqs. That is what I have always done with my data. The thing is that until now I was working with small sequence files. My new datasets are bigger and it takes quite long. I thought that this new feature of the pre.cluster function might let us do the clustering another way (in any case it works quite well for ITS, for taxonomic assignment and as a first clustering tool!). Well, pairwise.seqs is already running; I hope it will finish by the end of next week.
Thanks a lot again!
Greetings,
Jaime

You might also try using pre.cluster with unaligned sequences. This option is available in the latest version of mothur.

Pat

How do you decide what diffs should be with pre.cluster and ITS sequences? They are different lengths (between 150 and 350 in my dataset).
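One way to frame the question: a fixed diffs value is a larger share of a short sequence than of a long one. The arithmetic below is only an illustration in Python; the helper name is hypothetical, diffs=2 is the value from the first post, and nothing here is a recommendation for what diffs should be.

```python
def diffs_as_percent(diffs, length):
    """Percent of a sequence that `diffs` mismatches represent."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * diffs / length


# diffs=2 comes from the first post; 150 and 350 are the length extremes
# quoted above, and 250 is just a midpoint added for comparison.
for length in (150, 250, 350):
    print(length, round(diffs_as_percent(2, length), 2))
```

With diffs=2 this works out to about 1.33% of the shortest sequences and about 0.57% of the longest, so the same setting is more than twice as permissive at one end of the length range as at the other.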

Dunno :)

Fail! Pre.clustering didn’t really get off the ground for me. It seemed to hang very early in the first sample… oh well.

I extracted the ITS2 region using ITSx.

Would it help to keep some of the conserved regions at the start and end for better alignment? This is an option in ITSx (it says it may help for a multiple sequence alignment… but the alignment in pre.cluster is pairwise, isn’t it?)

You could try pairwise.seqs, but I suspect it will take forever to run.
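Part of the reason is that an all-against-all comparison grows quadratically with the number of unique sequences. A back-of-the-envelope sketch in Python (the function name and the example dataset sizes are made up, and this says nothing about how long mothur takes per alignment):

```python
def n_pairs(n_seqs):
    """Number of distinct sequence pairs in an all-against-all comparison."""
    return n_seqs * (n_seqs - 1) // 2


# Hypothetical numbers of unique sequences.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} unique sequences -> {n_pairs(n):,} pairwise alignments")
```

Going from 1,000 to 10,000 unique sequences multiplies the number of pairs by roughly 100, which is why anything that reduces the number of uniques beforehand pays off so strongly.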

Hi all,
I did the same with my fungal ITS sequences and ran pairwise.seqs with countends=F.

But then I went to cluster.split and I only get the ‘unique’ label…
The logfile says cutoff was changed to 0…
From the FAQ, I imagine this is something to do with the reads being of different lengths…
But I thought countends=F would help reduce the severity of this issue?

Someone else has suggested external clustering programs (e.g. UClust), but the SOP there involves writing a script to pad the sequences with Ns to make them all the same length, and I don’t like that idea…

And the phylotype approach isn’t appropriate for my dataset, as I have a large chunk of sequences that couldn’t be classified at high resolution and so cluster together based on classification, even though they share only ~52% identity with each other…

Hi,

I also have a problem analysing my ITS data. I finally succeeded in classifying it with my UNITE database. However, I’m now unable to run cluster.split:

cluster.split(fasta=Echantillons_its12.trim.contigs.good.unique.precluster.pick.pick.fasta, count=Echantillons_its12.trim.contigs.good.unique.precluster.denovo.uchime.pick.pick.count_table, taxonomy=Echantillons_its12.trim.contigs.good.unique.precluster.pick.UNITEv6_sh_99_s.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=8)

Using 8 processors.
Using splitmethod fasta.
Splitting the file…
/******************************************/
Running command: dist.seqs(fasta=Echantillons_its12.trim.contigs.good.unique.precluster.pick.pick.fasta.0.temp, processors=8, cutoff=0.155)

Using 8 processors.
/******************************************/
[ERROR]: your sequences are not the same length, aborting.

What would the result be if I ran pairwise.seqs prior to cluster?

Thanks,
Julien