Keeping Illumina headers unchanged for non-Mothur use

dylansmith17 · May 15, 2013, 7:38pm

After commands like trim.seqs, screen.seqs, make.contigs, etc., Mothur always tells you “I changed your : to - for ease in downstream analyses”. The only problem is that other programs, e.g. Pandaseq (for pairing R1 and R2 Illumina sequences) won’t read the sequences in the format Mothur changes to. Is there a command to pass to bypass the Illumina sequence header change that Mothur automatically likes to do? Thanks.

pschloss · May 16, 2013, 12:12pm

We could add that as a feature to make.contigs, but… Pandaseq is a program to make contigs, which is what make.contigs does. So I’m not sure why you would need to do this… FWIW, we change the : to _ for people that like to make trees.

dylansmith17 · May 17, 2013, 12:19am

I think it would be a worthwhile option/command to have in Mothur, as there are so many programs and scripts that require specific headers to read. I wanted to use Pandaseq after some sequence truncating/editing in Mothur to compare Mothur’s read-pairing algorithm to Pandaseq’s, as I know they are built slightly differently.

In case you’re interested why, I’ve been pairing reads from fungal ITS amplicons in Mothur, and trying different levels of quality filtering/truncating of individual reads prior to pairing. I’m getting worried because when I truncate reads more and more (e.g. increasing qwindowaverage, playing with qwindowsize), Mothur is still pairing all my reads but just making them shorter on average. Shouldn’t I get to a point where Mothur can’t pair the reads any more because the region of overlap has been cut off?

Thanks for your help.

pschloss · May 17, 2013, 11:49am

Do the number of reads with N’s go up as the contigs get shorter?

westcott · May 17, 2013, 12:53pm

Thanks for the suggestion. I added a modifynames parameter to the set.dir command. In the next release you will be able to run the following:

set.dir(modifynames=F)

to override the changes to sequence names in mothur’s commands.

dylansmith17 · May 17, 2013, 5:10pm

Interestingly, the number of Ns seem to go down as the reads get shorter. Looking only at the average paired-read length, the average decreases as quality trimming becomes more rigorous, but only slightly (e.g. 214bp, 212bp, 211.9bp, etc.). However, the shortest read in the dataset becomes smaller and smaller, e.g. 172bp, 75bp, 71bp, 31bp, and this sequence never contains any Ns. As a result of the lower proportion of N-containing sequences with each round of filtering, the total amount of retained sequences after screen.seqs( maxambig=0) is larger each time.

For what it’s worth, I doubt that this will seriously affect the ecological conclusions of my study down the road, as I also filter my OTU table fairly stringently, subsample/rarefy, etc., but it is somewhat puzzling at this stage…

Thanks for your help.

Topic		Replies	Views
make.contigs vs trim.seqs using illumina Commands in mothur	8	7447	February 12, 2014
Sequence name modification Feature requests granted	0	5113	May 21, 2013
Analysis of Illumina data - problem with make.contigs Commands in mothur	10	9976	September 10, 2014
Illumina single-read with index in 2nd sequencing run Commands in mothur	6	9993	May 8, 2014
Trimming ambiguous ends16S Commands in mothur	3	94	December 1, 2025

Keeping Illumina headers unchanged for non-Mothur use

Related topics