Analysing fungal ITS with the pre.cluster function

jcaguayo · September 10, 2015, 3:30pm

Hello everybody,
I know this has been asked several times on the forum, but I think it is worth posting again given the new version of Mothur. I’ve learnt that in the new version the pre.cluster function can be used without aligning, so it could be quite helpful when analyzing fungal ITS.
So I’ve followed the different recommended steps up to pre.cluster (with align=needleman, diffs=2). Then I did some cleaning using the chimera.uchime function. Then I ran classify.seqs using the UNITE db. Then I did some more cleaning to eliminate everything that couldn’t be identified as fungi (using the parameter=unknown of the remove.lineage function).
Everything went OK up to here, but I am not sure how to continue to assign OTUs to my samples and then go on to diversity analyses. Whenever I try to use dist.seqs with the fasta file generated by the last step (remove.lineage) I get the following error: your sequences are not the same length, aborting.
One last piece of information: I’ve used a count file throughout the analyses.
I would be very grateful if somebody could give me a hand.
Thanks
Jaime

Herna · September 11, 2015, 1:36pm

Hi Jaime

I’m not sure how to do OTU clustering in Mothur, but you can also use the QIIME pipeline for that. Just search for QIIME Fungal tutorial.

BMott · September 11, 2015, 6:30pm

Hi Jaime,

You can’t use dist.seqs() because it requires an aligned .fasta file (see here: http://www.mothur.org/wiki/Dist.seqs ).

However, Mothur can calculate pairwise distances without a global alignment using the pairwise.seqs() command.
http://www.mothur.org/wiki/Pairwise.seqs
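
To see whether a fasta file would trip the "not the same length" check before handing it to dist.seqs, a quick length tally can help. This is only a minimal sketch in plain Python, not part of mothur; the helper names `read_fasta` and `length_profile` and the toy reads are made up for illustration. An aligned file should show a single length, while unaligned ITS reads will show many.

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_profile(handle):
    """Count how many sequences have each length."""
    return Counter(len(seq) for _, seq in read_fasta(handle))


# Toy, made-up reads of unequal length (hypothetical data, not real ITS).
toy = StringIO(">seq1\nACGTACGTAC\n>seq2\nACGTAC\n>seq3\nACGTACGTAC\n")
profile = length_profile(toy)
print(profile)
print("looks aligned" if len(profile) == 1 else "not aligned: lengths differ")
```

For a real file, replace the `StringIO` object with `open("your_file.fasta")`.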

jcaguayo · September 12, 2015, 9:23am

Hello,
Thanks a lot for the answers.
I’ve been using Mothur for almost a year now, with good results. That is why I would like to keep using it. Anyway, if I have no choice I will go and see what QIIME offers.
So, if we want to cluster ITS we still need to run pairwise.seqs. That is what I’ve done systematically with my data. The thing is that before I was working with small sequence files. My new datasets are bigger and it takes quite long. I thought that this new feature of the pre.cluster function might allow us to do the clustering in another way (it does work quite well for ITS for taxonomic assignment and as a first clustering tool, though!!!). Well, pairwise.seqs is already running; I hope it will finish by the end of next week.
Thanks a lot again!
Greetings,
Jaime

pschloss · September 18, 2015, 4:36pm

You might also try using pre.cluster with unaligned sequences. This option is now available in the latest version of mothur.

Pat

Shaunson26 · May 11, 2016, 2:39am

How do you define what the diffs should be with pre.cluster and ITS sequences? They are different lengths (between 150 and 350 in my dataset).
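
To make the concern concrete: a fixed diffs value means a different relative divergence depending on read length. The snippet below is just arithmetic, not a recommendation for what diffs should be; `diffs_as_fraction` is a made-up helper, and diffs=2 is taken from the first post in this thread.

```python
def diffs_as_fraction(diffs, length):
    """Fraction of positions that `diffs` mismatches represent in a read."""
    return diffs / length


# Lengths spanning the 150-350 bp range mentioned above.
for length in (150, 250, 350):
    frac = diffs_as_fraction(2, length)
    print(f"{length} bp: diffs=2 is {frac:.2%} of the read")
```

So the same diffs setting is more than twice as permissive, in relative terms, for the shortest reads as for the longest ones.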

pschloss · May 12, 2016, 8:12pm

Dunno

Shaunson26 · May 16, 2016, 12:30am

fail! Pre.clustering for me didn’t really get off the ground? Seemed to hang really early in the first sample… oh well.

I extracted the ITS2 region using ITSx.

Would it help to keep some conserved regions at the start and end for better alignment? This is an option in ITSx (says it may help for a multiple seq alignment… it is pairwise in pre.cluster?)

pschloss · May 16, 2016, 9:52am

You could try pairwise.seqs, but I suspect it will take forever to run.

emmagagen · May 19, 2016, 4:28am

Hi all,
I did the same with my fungal ITS sequences and I ran pairwise.seqs with countends=F

But then I went to cluster.split and I only get the ‘unique’ label…
The logfile says cutoff was changed to 0…
From the FAQ, I imagine this is something to do with the reads being of different lengths…
But I thought countends=F would help reduce the severity of this issue?
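
As an illustration of what ignoring end gaps does to a distance between reads of different lengths, here is a rough sketch. It assumes the two sequences have already been pairwise aligned with '-' for gaps, and it counts every gap column as one difference, which is simpler than mothur's own distance calculators. The function name `pairwise_distance` and the toy sequences are hypothetical, and this does not by itself explain the cutoff being changed to 0.

```python
def pairwise_distance(a, b, count_ends=True):
    """Proportion of differing columns between two aligned sequences.

    a, b: equal-length strings from a pairwise alignment, '-' marks a gap.
    count_ends: if False, leading and trailing gap columns are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    start, end = 0, len(a)
    if not count_ends:
        # Skip columns at either end where one sequence has not started
        # yet or has already finished.
        while start < end and (a[start] == "-" or b[start] == "-"):
            start += 1
        while end > start and (a[end - 1] == "-" or b[end - 1] == "-"):
            end -= 1
    # Drop columns that are gaps in both sequences.
    columns = [
        (x, y)
        for x, y in zip(a[start:end], b[start:end])
        if not (x == "-" and y == "-")
    ]
    if not columns:
        return 1.0  # nothing comparable; treat as maximally distant
    return sum(x != y for x, y in columns) / len(columns)


# Toy alignment: a short read that matches the middle of a longer one.
long_read = "ACGTTGCAACGT"
short_read = "----TGCAAC--"
print(pairwise_distance(long_read, short_read, count_ends=True))
print(pairwise_distance(long_read, short_read, count_ends=False))
```

With end gaps counted, the length difference alone makes the two reads look far apart; with them ignored, only the overlapping region contributes.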

Someone else has suggested external clustering programs (e.g. UClust), but the SOP there involves writing a script to pad sequences with Ns to make them all the same length, and I don’t like that idea…

And the phylotype approach isn’t appropriate for my dataset, as I have a large chunk of sequences that couldn’t be classified with high resolution and so cluster together based on classification, even though they are only ~52% identical to each other…
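
For anyone wondering what such a padding script amounts to, a minimal sketch could look like the following. It is not taken from any particular SOP; padding on the right-hand end and the helper name `pad_with_n` are assumptions made for illustration.

```python
def pad_with_n(records, pad_char="N"):
    """Right-pad every sequence to the length of the longest one.

    records: iterable of (name, sequence) pairs.
    Returns a new list of (name, padded_sequence) pairs.
    """
    records = list(records)
    if not records:
        return []
    target = max(len(seq) for _, seq in records)
    return [(name, seq.ljust(target, pad_char)) for name, seq in records]


# Toy, made-up reads (hypothetical data).
reads = [("seq1", "ACGTACGT"), ("seq2", "ACGT"), ("seq3", "ACGTAC")]
for name, seq in pad_with_n(reads):
    print(f">{name}\n{seq}")
```

Note that padding only equalises lengths; it does not line up homologous positions the way an alignment would.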

julcham1 · July 19, 2016, 6:10pm

Hi,

I also have a problem analysing my ITS data. I finally succeeded in classifying it with my UNITE database. However, now I’m not able to run cluster.split:

cluster.split(fasta=Echantillons_its12.trim.contigs.good.unique.precluster.pick.pick.fasta, count=Echantillons_its12.trim.contigs.good.unique.precluster.denovo.uchime.pick.pick.count_table, taxonomy=Echantillons_its12.trim.contigs.good.unique.precluster.pick.UNITEv6_sh_99_s.wang.pick.taxonomy, splitmethod=classify, taxlevel=4, cutoff=0.15, processors=8)

Using 8 processors.
Using splitmethod fasta.
Splitting the file…
/******************************************/
Running command: dist.seqs(fasta=Echantillons_its12.trim.contigs.good.unique.precluster.pick.pick.fasta.0.temp, processors=8, cutoff=0.155)

Using 8 processors.
/******************************************/
[ERROR]: your sequences are not the same length, aborting.

What would be the result if I run pairwise.seqs prior to cluster?

Thanks,
Julien
