dist.seqs output too big

phu5ioin · January 15, 2014, 2:19am

Hello!

I am new to mothur and am currently following the MiSeq SOP. Right now I am at the dist.seqs step of the OTU analysis, and the output file is already around 200 GB and still growing! I have 12 samples, 1.6 million total reads, and 354k unique reads. Is it advisable to continue? I estimate the final .dist output will be around 500-600 GB, and I am worried that a file this size will affect the clustering step later. Thank you!!

DL

pschloss · January 15, 2014, 8:50pm

Yeah, you may as well quit it now. What are you sequencing? If your reads do not fully overlap, you’re likely forced to just use the phylotype-based approach, since your error rate is so high. The result of a high error rate is an inflated number of unique sequences.
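For a rough sense of why the number of unique sequences matters so much here: the number of pairwise distances grows with the square of the number of unique sequences. The Python sketch below counts the pairs for the 354k unique reads mentioned above and turns that into a worst-case file size. The function names are hypothetical, and the figure of 40 bytes per output line (two sequence names plus a distance) is purely an assumption, not something mothur guarantees. It is also an upper bound: when a cutoff is given to dist.seqs, only distances below the cutoff are written, so the real file can be considerably smaller.

```python
def pairwise_count(n_unique: int) -> int:
    """Number of distinct sequence pairs among n_unique sequences: n(n-1)/2."""
    return n_unique * (n_unique - 1) // 2


def worst_case_size_gb(n_unique: int, bytes_per_line: int = 40) -> float:
    """Upper-bound size in GB if every pair were written as one text line.

    bytes_per_line is an ASSUMED average line length (two names plus a
    distance); the true value depends on the sequence names in the data.
    """
    return pairwise_count(n_unique) * bytes_per_line / 1e9


if __name__ == "__main__":
    n = 354_000  # unique reads reported in the original post
    print(f"pairs: {pairwise_count(n):,}")
    print(f"worst-case size: {worst_case_size_gb(n):,.0f} GB (assumed 40 bytes/line)")
```

Halving the number of unique sequences cuts the number of pairs to roughly a quarter, which is why an error-inflated set of uniques is so costly at this step.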

phu5ioin · January 17, 2014, 1:24am

Thank you, Pat. I didn’t realise I had a problem with high error rates until you mentioned it!

I’m sequencing a 16S rRNA fragment spanning the V5 to V8 region; it is around 500-600 bp long. Yes, the reads do not fully overlap. I am trying out the phylotype-based approach now. Would it also be feasible for me to use the split.abund command to divide my sequences into abundant and rare groups, and then continue with the OTU-based approach using only the abundant sequences? The rationale is that most of the unique sequences with few representatives are likely due to sequencing error.

Thank you so much!

DL

pschloss · January 20, 2014, 2:10pm

Sorry, but I don’t think the OTU-based approaches will work for you.
