SOP query

oralmolecol · January 8, 2014, 4:45pm

I am analysing a MiSeq dataset following the SOP and have got as far as generating the shared file, so I have now moved over to the 454 SOP.

I am here:
“First we need to subsample the sequences from each group and then construct a phylip-formatted distance matrix, which we calculate with dist.seqs”. But the command which follows does not seem to involve a subsampling step as mentioned above?

A secondary issue is that if I run dist.seqs on the non-subsampled fasta file, it takes days to run. I have c. 490,000 sequences of c. 500 bp each. Is there any way to speed it up?

Many thanks for your help,

William

pschloss · January 8, 2014, 5:24pm

“First we need to subsample the sequences from each group and then construct a phylip-formatted distance matrix, which we calculate with dist.seqs”. But the command which follows does not seem to involve a subsampling step as mentioned above?

Sorry about that, that was an old typo that has been corrected. You will subsample if/when you run the unifrac (unifrac.weighted, unifrac.unweighted) or phylo.diversity commands.

A secondary issue is that if I run dist.seqs on the non-subsampled fasta file, it takes days to run. I have c. 490,000 sequences of c. 500 bp each. Is there any way to speed it up?

Yeah, don’t use MiSeq to generate 500 bp contigs. The problem is that your error rate is much higher than you would get with 454 or with fully overlapping reads, and those errors leave you with far more unique sequences to compare. If you look at Kozich et al. (2013) you’ll see that as you decrease the overlap between the reads, your error rate skyrockets. If you are unable to get OTUs to work, I generally suggest using a phylotype-based approach. Sorry!

Pat
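
To see why the run time climbs so quickly with the number of sequences, here is a rough sketch in Python. It only counts pairs, assuming one distance per unordered pair of sequences; it is not how dist.seqs is implemented, and the function name is made up for illustration. The 490,000 figure is the approximate count quoted above.

```python
def pairwise_comparisons(n_sequences: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n_sequences * (n_sequences - 1) // 2


# Approximate sequence count quoted in the thread (assumed all are compared).
n = 490_000

print(f"{n:,} sequences -> {pairwise_comparisons(n):,} pairs")

# Halving the number of sequences cuts the number of pairs to roughly a quarter.
half = n // 2
print(f"{half:,} sequences -> {pairwise_comparisons(half):,} pairs")
```

Because the pair count grows with the square of the number of sequences, anything that reduces the number of unique sequences has an outsized effect on the size of the distance calculation.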

oralmolecol · January 9, 2014, 11:27am

Thanks. These were 2 x 300 bp reads, so we were hoping for a sufficiently good overlap, and we did quality filter the reads too. However, I guess there is still too much error, as seen in the relatively high ratio of unique to total sequences.

William
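
For reference, a back-of-the-envelope sketch of the overlap these numbers imply. It assumes the simplest model, where contig length = read 1 + read 2 - overlap, and ignores primer and quality trimming; the function name is hypothetical.

```python
def expected_overlap(read_length: int, contig_length: int) -> int:
    """Overlap between two equal-length paired reads spanning one contig.

    Assumes contig_length = 2 * read_length - overlap, clamped so the
    overlap is never negative and never longer than a read or the contig.
    """
    overlap = 2 * read_length - contig_length
    return max(0, min(overlap, read_length, contig_length))


read_length = 300    # 2 x 300 bp reads, as stated in the thread
contig_length = 500  # approximate contig length, as stated in the thread

overlap = expected_overlap(read_length, contig_length)
print(f"overlap: {overlap} bp")
print(f"fraction of contig covered by both reads: {overlap / contig_length:.0%}")
```

Under these assumptions only about a fifth of each contig is read twice, so most positions rely on a single read with no second read to correct errors.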
