Combining HiSeq 2000 and 2500

pad · February 12, 2016, 4:27pm

Hi Mothur team,

I have amplicon data generated on the HiSeq 2000 and 2500 (same V region in both cases) that I’d like to analyze together. Given the different read lengths they produce, should sequences be trimmed after alignment or after make.contigs?

Many thanks.

pschloss · February 16, 2016, 11:40am

Hi and thanks for your question. I’m not a big fan of using the HiSeq to process 16S rRNA gene sequences since it is not currently possible to generate fully overlapping reads for our popular regions (e.g. V4). I think you’d have to process the first and second reads separately. Either way, if you have data with two different read lengths from the same region, you need to trim them to the same length. You’d also want to make sure (using a mock community) that the error rates are the same. Even then, you’re not going to be able to make OTUs and will need to use the phylotype approach described in the SOP. Here’s some information on the effects of high error rates:

http://blog.mothur.org/2014/09/11/Why-such-a-large-distance-matrix%3F/

Pat

pad · February 17, 2016, 4:14pm

Thank you, Pat; that’s very helpful. (FYI, we are sequencing the V3 region, so the HiSeq may work…)

pschloss · February 18, 2016, 11:26am

It would be great to see some mock community data from a HiSeq to see whether its error profile is similar to that of the MiSeq.

Pat

Shaunson26 · May 16, 2016, 12:26am

I only have some preliminary data here, but we had data from HiSeq 100 bp (EMP), HiSeq 150 bp (EMP) and in-house MiSeq (2x250) runs. Unfortunately, the data came from a time series, so time point is confounded with sequencing run, which makes it hard to isolate sequencing effects. However, there were 5 time points (1 for 100 bp, 1 for 150 bp and 3 for the 250 bp), and we included 2 samples from time 1 with the 250 bp run.

I simply used the forward reads from every run, and only 100 bp of each.

A quick view of the data showed that all 3 sequencing runs clustered separately from each other. So although the MiSeq run covered 3 of the time points, those samples all clustered together, and the 2 samples from time 1 included in that run also clustered with them.

I lost all hope in analysing the data together.

Could be different in your case. Maybe if all the sequencing was prepared by the same center?

Good luck!

pschloss · May 16, 2016, 9:50am

Wow, that… really sucks.

Shaunson26 · May 17, 2016, 2:27am

I just had a look at this data again. I actually had 3 other samples in the MiSeq run that could be compared with the HiSeq 2000 (1 x 150 bp) run, and they clustered together. So I guess the initial HiSeq 2000 (1 x 100 bp) run seemed most dodgy.

Looking at my pipeline, I simply:

Trim sequences

trim.seqs(fasta=ts.fasta, qfile=ts.qual, qwindowsize=5, qwindowaverage=25, minlength=75, keepfirst=105, maxambig=0, maxhomop=8, processors=8)
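
As a rough illustration of what the length, ambiguity and homopolymer settings in that command do, here is a minimal Python sketch. It is not mothur's implementation: the function name is made up, the quality-window options (qwindowsize, qwindowaverage) are ignored, only N is counted as an ambiguous base, and it assumes keepfirst is applied before the other checks.

```python
from itertools import groupby


def passes_filters(seq, keepfirst=105, minlength=75, maxambig=0, maxhomop=8):
    """Hypothetical helper: return the truncated read if it passes, else None."""
    # Keep only the first `keepfirst` bases (assumed to happen before the checks).
    seq = seq.upper()[:keepfirst]

    # Discard reads that are too short after truncation.
    if len(seq) < minlength:
        return None

    # Discard reads with more ambiguous bases than allowed (only N is counted here).
    if seq.count("N") > maxambig:
        return None

    # Discard reads whose longest run of a single base exceeds `maxhomop`.
    longest_run = max((len(list(run)) for _, run in groupby(seq)), default=0)
    if longest_run > maxhomop:
        return None

    return seq
```

With these defaults a clean 120-base read comes back truncated to 105 bases, while a 40-base read, a read with an N in its first 105 bases, or a read containing a run of 9 or more identical bases is dropped.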

Extract trimmed sequences from the groups file (no oligos were used to generate a groups file, so the long way around)

list.seqs(fasta=ts.trim.fasta)
get.seqs(accnos=ts.trim.accnos, group=ts.groups)
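
Conceptually, those two commands keep only the rows of the groups file whose read names are still present in the trimmed fasta file. A minimal Python sketch of that idea follows; the function names and the toy read names are hypothetical, and it assumes a groups file with one whitespace-separated "read name, group" pair per line.

```python
def fasta_names(fasta_text):
    """Collect read names from FASTA headers (text after '>' up to the first whitespace)."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def pick_groups(groups_text, keep):
    """Return the groups-file lines whose read name is in `keep`."""
    kept = []
    for line in groups_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name = line.split()[0]  # first column is the read name
        if name in keep:
            kept.append(line)
    return kept


# Toy data: read2 was removed during trimming, so its group assignment is dropped.
trimmed_fasta = ">read1\nACGT\n>read3\nGGCC\n"
groups = "read1\ttime1\nread2\ttime1\nread3\ttime2\n"
picked = pick_groups(groups, fasta_names(trimmed_fasta))
```

Here `picked` holds the rows for read1 and read3 only, which is the same subset the list.seqs and get.seqs pair is meant to produce.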

Have a go and good luck!
