I work with fungal ITS sequences. I have worked with a number of 454 datasets and am now working with Illumina MiSeq data, which means many more sequences (50-100 times more per run).
Can we use mothur to take an Illumina dataset from demultiplexed FASTQ files to an OTU x sample table without aligning?
If so, how? (The recommended Illumina SOP assumes we are aligning our sequences, and it seems many of the commands require aligned sequences.)
If we need to align at certain steps for the mothur workflow to work, can we do it without using an aligned ITS database?
You can get a distance matrix for clustering using the pairwise.seqs command.
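pairwise.seqs gets around the need for a reference alignment because it aligns each pair of sequences on its own. The minimal Python sketch below illustrates that idea with a plain Needleman-Wunsch global alignment and a simple distance: mismatched or gapped columns divided by alignment length. The scoring values (match 1, mismatch -1, gap -2), the distance definition, and the function names are assumptions for illustration; mothur's own defaults and gap-handling options may differ, so treat this as a model of the concept, not of mothur's implementation.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences and return the two gapped strings."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i - 1] vs b[j - 1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to rebuild the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def pairwise_distance(a, b):
    """Fraction of alignment columns that are mismatches or gaps."""
    aln_a, aln_b = needleman_wunsch(a, b)
    if not aln_a:
        return 0.0
    diffs = sum(1 for x, y in zip(aln_a, aln_b) if x != y)
    return diffs / len(aln_a)


def distance_matrix(seqs):
    """Return {(name1, name2): distance} for every unordered pair of sequences."""
    return {
        (n1, n2): pairwise_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(seqs.items(), 2)
    }


seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "ACGAACG"}
for pair, d in distance_matrix(seqs).items():
    print(pair, round(d, 3))
```

Because every pair is aligned independently, sequences of different lengths (common with ITS) can still be compared without a shared alignment.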
I don’t know if there’s an official mothur SOP for ITS sequences, but I think the following steps would give you the gist of the 16S rRNA SOPs without the need for alignment:
trim.seqs()
summary.seqs()
screen.seqs() #Remove short/bad quality sequences
chimera.uchime()
classify.seqs()
remove.lineage() #Get rid of anything you consider junk.
pairwise.seqs()
cluster()
make.shared()
For classification, there’s a mothur-formatted copy of the UNITE database here.
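The last three commands in the list above (pairwise.seqs, cluster, make.shared) boil down to turning pairwise distances into OTUs and then counting OTU members per sample, which is what a shared file holds. The Python sketch below models that with single-linkage clustering (what mothur's cluster calls the nearest-neighbor method) at a 0.03 cutoff. The cutoff, the input dictionaries, the sample names, and the function names are hypothetical; mothur's cluster also offers other methods, such as furthest and average neighbor, which can give different OTUs from the same matrix.

```python
from collections import Counter, defaultdict


def single_linkage_otus(names, distances, cutoff=0.03):
    """Join sequences whose distance is <= cutoff, transitively (union-find)."""
    parent = {name: name for name in names}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= cutoff:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for name in names:
        groups[find(name)].append(name)
    # Label OTUs Otu1, Otu2, ... in order of first appearance in `names`.
    seq_to_otu = {}
    for number, members in enumerate(groups.values(), start=1):
        for member in members:
            seq_to_otu[member] = f"Otu{number}"
    return seq_to_otu


def otu_by_sample(seq_to_otu, seq_to_sample):
    """Count sequences per sample and OTU: the content of an OTU x sample table."""
    table = defaultdict(Counter)
    for seq, otu in seq_to_otu.items():
        table[seq_to_sample[seq]][otu] += 1
    return {sample: dict(counts) for sample, counts in table.items()}


names = ["r1", "r2", "r3", "r4"]
distances = {("r1", "r2"): 0.01, ("r1", "r3"): 0.20, ("r1", "r4"): 0.22,
             ("r2", "r3"): 0.19, ("r2", "r4"): 0.21, ("r3", "r4"): 0.02}
samples = {"r1": "soilA", "r2": "soilB", "r3": "soilA", "r4": "soilA"}
otus = single_linkage_otus(names, distances)
print(otu_by_sample(otus, samples))
```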
Thanks a lot for the reply.
It looks like, though, with the number of sequences I have, pairwise.seqs is going to take weeks.
I don’t know what the columns in the verbose output correspond to (the one on the right might be the total number of sequences, but the one on the left moves around a great deal), so I can’t tell how far it has progressed, but either way it seems prohibitively slow.
Do you know what they mean?
-Thanks
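A quick back-of-envelope calculation shows why pairwise.seqs slows down so sharply: it has to compare every pair of sequences, so the work grows with n(n-1)/2, roughly the square of the number of sequences in the input. A run with 50-100 times more sequences therefore needs on the order of 2,500-10,000 times more comparisons. The starting count of 10,000 sequences below is a made-up example, not a figure from this thread.

```python
def pair_count(n):
    """Number of distinct pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def comparison_ratio(n_old, factor):
    """How many times more comparisons when the sequence count grows by `factor`."""
    return pair_count(n_old * factor) / pair_count(n_old)


# Hypothetical example: 10,000 sequences in a 454 run vs 50x and 100x as many.
for factor in (50, 100):
    print(factor, pair_count(10_000 * factor), round(comparison_ratio(10_000, factor)))
```

The ratio is close to factor squared, so cutting the number of sequences that go into pairwise.seqs has a much larger payoff than any per-comparison speedup.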
If you’re worried about run time, you could try an alternative clustering approach such as UPARSE.
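Part of why greedy, centroid-based methods such as UPARSE scale better is that each sequence is compared only against the current OTU centroids rather than against every other sequence. The Python sketch below shows that general idea: sequences are visited from most to least abundant, and each one either joins the first centroid within 97% similarity or founds a new OTU. It is not UPARSE itself, which adds steps such as chimera filtering and uses its own alignment and identity definitions. The 97% threshold, the abundance input, and the function names are assumptions, and difflib's similarity ratio is used only as a crude, self-contained stand-in for alignment-based identity.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    # autojunk=False matters for DNA: with only four letters, the default
    # heuristic would treat every base as "junk" in sequences >= 200 long.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_centroid_otus(abundances, seqs, min_identity=0.97):
    """Assign each sequence to the first centroid within min_identity.

    abundances: {name: read count}; seqs: {name: sequence}.
    Abundant sequences are visited first, so they become centroids first.
    """
    centroids = []  # names of OTU centroid sequences, in creation order
    assignment = {}
    for name in sorted(abundances, key=abundances.get, reverse=True):
        for centroid in centroids:
            if identity(seqs[name], seqs[centroid]) >= min_identity:
                assignment[name] = centroid
                break
        else:
            centroids.append(name)  # nothing close enough: start a new OTU
            assignment[name] = name
    return assignment


base = "ACGT" * 25  # 100 bp toy sequence
seqs = {
    "a": base,
    "b": base[:-1] + "A",  # one substitution, so 99% similar to "a"
    "c": "TTGCA" * 20,     # unrelated toy sequence
}
abund = {"a": 50, "b": 3, "c": 10}
otus = greedy_centroid_otus(abund, seqs)
print(otus)
```

The number of comparisons now depends on the number of OTUs rather than the number of sequences, at the cost of results that depend on visiting order.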
I do wonder about this approach a bit, though. People on these forums (including Pat, I think) have expressed concern that, because the ITS regions are fundamentally different from the 16S gene in terms of evolutionary pressure and functional importance, it’s not appropriate to apply established 16S pipelines to them. With that in mind, you could cut the analysis time down significantly by simply classifying your amplicons and building your OTU table from phylotypes instead of sequence identity (classify.seqs, phylotype, make.shared).
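The phylotype route replaces the distance calculation with a lookup: sequences that share the same classification down to a chosen rank land in the same bin, so the work grows linearly with the number of sequences. The Python sketch below models that binning-and-counting step. The taxonomy string format (semicolon-separated ranks with confidence values in parentheses), the `depth` parameter, and the example names are assumptions for illustration; mothur's phylotype command has its own conventions for labeling taxonomic levels.

```python
import re
from collections import Counter, defaultdict


def taxon_at_depth(taxonomy, depth):
    """Truncate a 'k__Fungi(100);p__Ascomycota(98);...;' string to `depth` ranks."""
    ranks = [r for r in taxonomy.strip().split(";") if r]
    # Drop confidence values such as "(98)" so names compare cleanly.
    ranks = [re.sub(r"\([\d.]+\)$", "", r) for r in ranks]
    return ";".join(ranks[:depth])


def phylotype_table(seq_taxonomy, seq_to_sample, depth):
    """Bin sequences by their truncated taxonomy and count them per sample."""
    table = defaultdict(Counter)
    for seq, taxonomy in seq_taxonomy.items():
        table[seq_to_sample[seq]][taxon_at_depth(taxonomy, depth)] += 1
    return {sample: dict(counts) for sample, counts in table.items()}


taxonomy = {
    "r1": "k__Fungi(100);p__Ascomycota(99);c__Dothideomycetes(90);",
    "r2": "k__Fungi(100);p__Ascomycota(97);c__Sordariomycetes(85);",
    "r3": "k__Fungi(100);p__Basidiomycota(100);c__Agaricomycetes(99);",
}
samples = {"r1": "leaf1", "r2": "leaf1", "r3": "leaf2"}
print(phylotype_table(taxonomy, samples, depth=2))
```

Choosing a deeper `depth` splits bins more finely, but only as far as the reference database (here, UNITE) can confidently classify the reads.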