Towards an SOP for fungal ITS sequences

Would anyone like to work together on developing an SOP for analyzing fungal ITS sequences? There seems to be a lot of confusion on how to adapt the MiSeq 16S protocol, particularly when it comes to alignment (you can’t / you shouldn’t) and clustering methods (de novo vs reference-based). Ideally, we would have an ITS protocol that reflects the unique challenges of working with fungal communities, and that can be easily adapted for different workflows or research objectives.

As another outcome, we might identify new features to request (no pressure, Pat and Sarah) to make it easier and more efficient to analyze ITS data. For example, I would like to figure out a way to calculate distance matrices without melting my laptop. I think this must be possible.

We would love to work with you on this. Our biggest hang-up at the moment is a lack of data. Do you have fastq files that you’d be comfortable making public for people to use in an SOP?

Also, since it’s a bit of the wild west out there with ITS sequences, if we pull something together I’d be in favor of publishing it and making anyone that helps a co-author.

Thanks,
Pat

That sounds great. I have a couple of MiSeq datasets that we could play around with. One is from my MSc project comparing highly diverse soil fungal communities from different islands on Haida Gwaii. I previously used PIPITS to analyze it. Another is from a pilot study looking at fungal endophytes in sword fern (Polystichum munitum) rhizomes, in the hopes of identifying a pathogen. We used QIIME and mothur for that one. I’ll email you directly.

I am also starting to analyze some fungal metabarcoding data and I would love to hear opinions… so far I have mostly gone ahead and done everything without alignment, but it would be great to hear other voices.

Hey there! I’m curious about the approach that you took and which steps were most challenging. What is your overall research objective?

I am “just” doing community ecology of fungal plankton, using ITS metabarcoding. I took a very simple approach, partly because the amplified region was too long for the reads to overlap; this was just preliminary data. So I used the forward (ITS1) and reverse (ITS2) reads independently. I truncated the reads where quality dropped below Q30, and then loaded the forward read and an in silico reverse read (just a reverse complement made with a Python script; I know Pat will LOVE this step, ha!). The challenges are the huge variability in length and the “unalignability” (if such a word exists). I did a very basic first pass:

make.file(inputdir=./)
make.contigs(processors=20, file=stability.files)
summary.seqs(fasta=current)
screen.seqs(fasta=current, group=current, maxambig=0, maxlength=300)
summary.seqs(fasta=current)
unique.seqs(fasta=current, group=current)
summary.seqs(fasta=current, name=current)
chimera.vsearch(fasta=current, name=current, group=current, dereplicate=t)
remove.seqs(fasta=current, accnos=current, name=current)
count.seqs(name=current, group=current)
pre.cluster(fasta=current, diffs=1, count=current, method=unoise)
summary.seqs(fasta=current, count=current, processors=20)
screen.seqs(fasta=current, count=current, minlength=200)
summary.seqs(fasta=current, count=current)
count.seqs(count=current, compress=f)
classify.seqs(fasta=current, reference=/home/FCAM/leocadio/BIOSSCOPE/fungal/Unite_ITS_s_02/UNITEv6_sh_99_s.fasta, taxonomy=/home/FCAM/leocadio/BIOSSCOPE/fungal/Unite_ITS_s_02/UNITEv6_sh_99_s.tax, cutoff=60)
summary.tax(taxonomy=current, count=current)
quit()
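
For anyone wanting to reproduce the trimming and in silico reverse read described before the batch file, here is a minimal Python sketch. It is not the script actually used: the function names are hypothetical, Phred+33 quality encoding is assumed, and "cut when Q<30" is interpreted as truncating at the first base below Q30 (a sliding-window rule would behave differently).

```python
# Hypothetical helpers; Phred+33 encoding and "cut at first base < Q30" are assumptions.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def truncate_at_quality(seq, qual, min_q=30, offset=33):
    """Cut the read just before the first base whose Phred score is below min_q."""
    for i, ch in enumerate(qual):
        if ord(ch) - offset < min_q:
            return seq[:i], qual[:i]
    return seq, qual

def reverse_complement(seq, qual):
    """Reverse-complement the bases and reverse the quality string to match."""
    return seq.translate(COMPLEMENT)[::-1], qual[::-1]

# Toy read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33
seq, qual = "ACGTTGCA", "IIIII#II"
trimmed = truncate_at_quality(seq, qual)
in_silico_reverse = reverse_complement(*trimmed)
print(trimmed, in_silico_reverse)
```

Reversing the quality string along with the bases keeps each score attached to its base, which matters if the reads are later passed to a step that uses quality scores.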

Problems I see here: since I have not aligned the reads, they might cover very different regions of the ITS (due to indels). But I am not sure aligning is a great idea either. I am trying to move to primers that target shorter regions, but… that is not ideal either. And on the other platforms (e.g. Sequel) the error rate is too high even with consensus reads (all pipelines throw in a 99% similarity step to remove the noise). So I am interested in what we think could be the best way.

Thanks for sharing. I’ll take a look at how it compares with my workflow. In the meantime, here’s a link to Kendra’s batch file for processing ITS2 sequences.

Just to clarify, you used a forward primer to capture the ITS1 region and a reverse primer for the ITS2 region? Assuming we’re talking about the MiSeq v3 platform (2 x 300 bp), those forward and reverse sequences would rarely overlap with each other, especially if you truncate them when the quality scores drop below 30. So it sounds like you’re using forward reads only (?)
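
The overlap arithmetic behind that question can be sketched in a few lines of Python. The amplicon lengths and the truncated read length below are hypothetical values chosen only for illustration, not measurements from anyone's data, and the function name is made up.

```python
def expected_overlap(amplicon_len, fwd_len=300, rev_len=300):
    """Bases covered by both reads; zero or negative means the reads do not overlap."""
    # A read cannot cover more than the amplicon itself
    fwd = min(fwd_len, amplicon_len)
    rev = min(rev_len, amplicon_len)
    return fwd + rev - amplicon_len

# Hypothetical amplicon and read lengths, for illustration only
for amplicon_len, read_len in [(450, 300), (650, 300), (650, 220)]:
    ov = expected_overlap(amplicon_len, read_len, read_len)
    status = "overlap" if ov > 0 else "no overlap"
    print(f"amplicon {amplicon_len} bp, reads {read_len} bp: {ov:+d} bp ({status})")
```

Quality truncation shortens the effective read length, so an amplicon that barely overlaps at the full 2 x 300 bp can end up with a gap once the low-quality tails are removed.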

I was just testing those primers, since they are the most commonly used for fungal plankton. I did analyze them separately, yes. To be honest, I am still trying to decide which primers to use from now on!

I had checked Kendra’s pipeline, and mine is actually very similar - maybe somewhat simpler?

I truncated at Q30 since the reads rarely overlap, and unless you set a very low minimum-overlap threshold, you lose many reads there. And the resulting dataset is VERY noisy. I was also at the OSM fungal session, and people are moving to even LONGER amplicons, then using PacBio for sequencing. That is still very noisy unless you do a 99% clustering, which then drops the resolution when trying to assign species.

Hmm well I’m not sure about fungal plankton, but in the soil world, it’s common to use primers targeting either the ITS1 region OR the ITS2 region, for fungal communities dominated by Ascomycota and Basidiomycota. Really depends on the research objectives, the constraints of the sequencing technology and our understanding of the ITS region for the taxa of interest. Here are some references on fungal primers that I found helpful:

In fungal plankton those are the most used, at least until recently, sometimes with modifications to the reverse primer to improve coverage of Basidiomycota. But those are the ones most people used.
Now most people are moving to PacBio (good luck there) or MinION (let there be dragons), and targeting regions on the order of several kilobases.