I would be happy to get some help. I have to analyse bacterial and fungal 454 sequences; for the fungi, the ITS region (primers ITS1F and ITS4) has been sequenced. Now I don't know which database I should align my sequences to. This question has come up before, but I couldn't find an answer that helps me, which makes me worry that there isn't one. Can't I just align my sequences, filter them, and blast the OTUs later? If yes, which program should I use for that? MEGAN?
Possibly somebody has experience with that and would be so kind to give me advice! That would be soooo great!
I'd suggest using the UNITE database; it has recently been implemented in QIIME. Of course, you can cluster and align your sequences and blast later; that should not be a problem whatever the programme. If you wish, I'm sure you could use MEGAN for that. I myself have predominantly used MOTHUR, and I think it's a good guess that most people here will be more familiar with the latter, too (in case you need further help/suggestions). Hope this is of some use!
Thank you very much for your answer! I tried aligning to the UNITE database, and I also tried it without aligning my sequences. I can trim them normally, but after the filter.seqs command I get the error message
“Sequences are not all the same length, please correct” in both cases.
I didn't have any problems with my 16S sequences during this step.
Oh, that is right: you may not actually be doing anything wrong, it's merely that the UNITE sequences are not aligned. That is a problem, indeed, and I have heard others complain about it in MOTHUR. The diversity in fungal ITS is so high, also with respect to length variation, that it is practically unalignable across the entire kingdom. I am not sufficiently familiar with MOTHUR to suggest a best way out of this, unfortunately. You could try mailing the people in Tartu (Leho Tedersoo, Mohamed Bahram) to inquire about it; they have rich experience both in the analysis of 454 data and in the maintenance of the UNITE database. Sorry not to be of more help.
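For anyone who wants to check this on their own files: the filter.seqs error above is what you would expect when the sequences have different lengths, i.e. are not aligned. Below is a minimal Python sketch that tallies sequence lengths in FASTA text. The function names and the toy sequences are made up for illustration; this is not part of mothur.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the name.
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def length_distribution(seqs):
    """Count how many sequences have each length."""
    return Counter(len(s) for s in seqs.values())


def looks_aligned(seqs):
    """An alignment needs every sequence (gaps included) to have the same length."""
    return len(length_distribution(seqs)) <= 1


# Toy data (hypothetical): two gapped sequences of length 10, one raw read of length 7.
example = """>ref1
ACGT--ACGT
>ref2
ACGTTTACGT
>ref3
ACGTACG
"""

seqs = read_fasta(example)
print(dict(length_distribution(seqs)))
print(looks_aligned(seqs))
```

If more than one length shows up, the file is not an alignment and a column-based step such as filter.seqs cannot work on it.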
So here’s a theory question for ITS folks… If I sequence the ITS from a single fungus, how many different ITS sequences will I get? At least in the Bacteria, if there are multiple copies of the 16S rRNA gene (e.g. E. coli), you’ll get several different 16S-23S fragment types that vary in length and sequence.
So the basic question is:
Given that fungal ITS is not appropriate for alignment, can we use MOTHUR without aligning?
There are a lot of different threads floating around the forums that touch on this question, but I can’t find any definitive answer.
Note, 454 is history, so most of us are talking Illumina, and thus, potentially, a lot more sequences (millions, not hundreds of thousands).
I would be most appreciative of any answers.
You might try running pre.cluster with unaligned sequences and use align=needleman and diffs=2 (or 3). This will generate pairwise alignments, and should go a bit faster. Also, be sure that you are using a group or count file when you run this. This is a new feature within the latest release of mothur.
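To make the idea of "differences in a pairwise alignment" concrete, here is a small Python sketch of a Needleman-Wunsch global alignment that counts mismatched and gapped columns between two unaligned reads. It only illustrates the concept: the scoring values and the way gaps are counted are assumptions chosen for the example, and mothur's own implementation and defaults may differ.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; returns the two gapped strings.

    Scores are illustrative assumptions (linear gap penalty).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def count_diffs(a, b):
    """Number of alignment columns where the two sequences differ (gaps count)."""
    x, y = needleman_wunsch(a, b)
    return sum(1 for p, q in zip(x, y) if p != q)


def within_diffs(a, b, diffs=2):
    """True if the pair would fall within the chosen difference threshold."""
    return count_diffs(a, b) <= diffs


print(count_diffs("ACGTACGT", "ACGAACGT"))  # one substitution
print(count_diffs("ACGTACGT", "ACGTCGT"))   # one deleted base
print(within_diffs("ACGTACGT", "TTTTTTTT"))
```

Because each pair is aligned on its own, the reads do not need to share a common alignment, which is why this approach can work for ITS where a kingdom-wide alignment does not.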
I haven’t come up with an ITS script for Illumina data yet. For 454, I’d use mothur to run shhh.flows and pre.cluster, then use CrunchClust (or uclust) to cluster, do a little text editing to convert the output to a shared file, and bring it back into mothur for the diversity measures.
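The "little text editing" step could be scripted. Here is a rough Python sketch that turns read-to-OTU and read-to-sample assignments into a tab-delimited table laid out like a mothur shared file (label, Group, numOtus, then one column per OTU). The input dictionaries, sample names, and the 0.03 label are hypothetical; parsing the actual CrunchClust or uclust output is left out, and the layout should be checked against a shared file produced by your own mothur version.

```python
from collections import defaultdict


def build_shared(read_to_otu, read_to_group, label="0.03"):
    """Build shared-file-style text from per-read OTU and sample assignments.

    read_to_otu:   {read name: cluster id from the external clusterer}
    read_to_group: {read name: sample (group) name}
    label:         assumed distance label for the single row set
    """
    otus = sorted(set(read_to_otu.values()))
    groups = sorted({read_to_group[read] for read in read_to_otu})

    # Count reads per (sample, OTU) pair.
    counts = defaultdict(int)
    for read, otu in read_to_otu.items():
        counts[(read_to_group[read], otu)] += 1

    # Name OTU columns Otu001, Otu002, ... in sorted cluster-id order.
    width = max(3, len(str(len(otus))))
    otu_names = ["Otu" + str(i + 1).zfill(width) for i in range(len(otus))]

    lines = ["\t".join(["label", "Group", "numOtus"] + otu_names)]
    for group in groups:
        row = [label, group, str(len(otus))]
        row += [str(counts[(group, otu)]) for otu in otus]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical assignments for five reads from two samples.
read_to_otu = {"r1": "c0", "r2": "c0", "r3": "c1", "r4": "c1", "r5": "c1"}
read_to_group = {"r1": "soilA", "r2": "soilB", "r3": "soilA",
                 "r4": "soilA", "r5": "soilB"}

shared_text = build_shared(read_to_otu, read_to_group)
print(shared_text)
```

Note that the columns here follow the sorted cluster ids rather than OTU abundance; reorder them first if you want the most abundant OTU to come first.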
Pat, your question about number of ITS per organism is important but I haven’t found an answer. Have you?