Processing amplicon datasets without aligning

nrosenstock · August 8, 2015, 7:53am

I work with fungal ITS sequences. I have worked with a number of 454 datasets and am now working with Illumina MiSeq data, and thus many more sequences (50-100 times more per run).
Can we use mothur to take an Illumina dataset from demultiplexed fastq files to an OTU x sample table without aligning?
If so, how? (The recommended Illumina SOP assumes we are aligning our sequences, and it seems many of the commands require aligned sequences.)

If we need to align at certain steps for the mothur workflow to work, can we do it without using an aligned ITS database?

Thanks very much in advance for any assistance.

dwaite · August 9, 2015, 8:51am

You can get a distance matrix for clustering using the pairwise.seqs command.
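To make the idea of an alignment-free distance matrix concrete, here is a toy Python sketch. It is not what pairwise.seqs does internally: it swaps in a plain edit distance, normalised by the longer sequence length, and the sequence names and toy reads are made up for illustration. The point is only that each pair of sequences is compared directly, with no reference alignment involved.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (substitutions and indels cost 1)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def pairwise_distances(seqs):
    """Return (name_a, name_b, distance) rows for every unordered pair."""
    rows = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(seqs.items(), 2):
        dist = edit_distance(seq_a, seq_b) / max(len(seq_a), len(seq_b))
        rows.append((name_a, name_b, dist))
    return rows


# Hypothetical toy reads, far shorter than real ITS amplicons.
toy_seqs = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "TTGTACGT"}
for name_a, name_b, dist in pairwise_distances(toy_seqs):
    print(name_a, name_b, round(dist, 3))
```

The three-column rows (two names and a distance) are the kind of input a distance-based clustering step works from.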

I don’t know if there’s an official mothur SOP for ITS sequences, but I think that the following steps would give you the gist of the 16S rRNA SOPs, without the need for alignment:

trim.seqs()
summary.seqs()
screen.seqs() #Remove short/bad quality sequences
chimera.uchime()
classify.seqs()
remove.lineage() #Get rid of anything you consider junk.
pairwise.seqs()
cluster()
make.shared()

For classification, there’s a mothur-formatted copy of the UNITE database here.

nrosenstock · August 11, 2015, 7:53am

Thanks a lot for the reply.
It looks like, though, with the number of sequences I have, pairwise.seqs is going to take weeks.
I don’t know what the columns in the verbose output correspond to (the one on the right might be the total number of seqs? But the one on the left moves around a great deal), so I can’t tell how far it has progressed, but it seems prohibitively slow.
Do you know what they mean?
-Thanks

dwaite · August 11, 2015, 8:24pm

If you’re worried about run time you could try an alternate clustering approach like UPARSE.
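For a rough sense of why the run time grows so quickly, the sketch below counts the comparisons needed if every unique sequence is compared against every other one. That all-against-all assumption and the baseline of 10,000 unique sequences are illustrative guesses, not figures from this dataset; the 50x and 100x factors are the increase mentioned in the original post.

```python
def pairwise_comparisons(n_seqs):
    """Number of unordered pairs among n_seqs sequences: n * (n - 1) / 2."""
    return n_seqs * (n_seqs - 1) // 2


baseline = 10_000  # hypothetical unique-sequence count for a 454 run
for fold in (1, 50, 100):
    n = baseline * fold
    print(f"{n:>9,} sequences -> {pairwise_comparisons(n):,} comparisons")
```

Because the count grows with the square of the number of sequences, 50-100 times more sequences means roughly 2,500-10,000 times more comparisons, so reducing the number of unique sequences before calculating distances matters a great deal.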

I do wonder about this a bit though. I know people on these forums (including Pat, I think) have expressed concern that because the ITS regions are fundamentally different to the 16S gene (in terms of evolutionary pressure and functional importance) it’s not appropriate to apply established 16S pipelines to them. With that in mind, you could cut the analysis time down significantly by just classifying your amplicons and building your OTU table from phylotypes instead of sequence identity (classify.seqs, phylotype, make.shared).
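As a toy illustration of the phylotype idea (not mothur's implementation), the Python sketch below bins sequences by their assigned taxon and counts them per sample. The sequence names, sample names and genus assignments are all hypothetical; in practice they would come from the classification step.

```python
from collections import Counter, defaultdict

# Hypothetical classifications: (sequence name, sample, assigned genus).
classified = [
    ("seq1", "soilA", "Russula"),
    ("seq2", "soilA", "Russula"),
    ("seq3", "soilA", "Cortinarius"),
    ("seq4", "soilB", "Russula"),
    ("seq5", "soilB", "Amanita"),
]


def phylotype_table(records):
    """Count sequences per taxon and sample; no distances are calculated."""
    table = defaultdict(Counter)
    for _name, sample, taxon in records:
        table[taxon][sample] += 1
    return table


table = phylotype_table(classified)
samples = sorted({sample for _name, sample, _taxon in classified})

# Print a tab-separated phylotype x sample table.
print("phylotype\t" + "\t".join(samples))
for taxon in sorted(table):
    print(taxon + "\t" + "\t".join(str(table[taxon][s]) for s in samples))
```

Each sequence is handled once, so the work grows linearly with the number of sequences rather than with the number of sequence pairs.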
