A concern about a run I’m doing now: I have 23 samples in all (sputum from people with cystic fibrosis, sputum from healthy individuals, and reagent controls), with all the R1 and R2 fastq files from a MiSeq run. I’m running Mothur 1.34.3 but see the same issue in 1.36. I’m following the MiSeq SOP with a batch file that I know works, and I’m using the SILVA v119 reference.
The issue: with these 23 samples, I get a 1.03 TB dist file (stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.dist) that took 275,445 seconds (about 76.5 hours) to generate. I haven’t encountered this in previous Mothur runs, even with considerably more samples. For example, for a paper now in review that had 58 samples, run with Mothur 1.34.1 earlier this year, the dist file was 27.7 GB.
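As a rough sanity check on the size: a column-format distance file holds about one line per pair of sequences, so it grows with the square of the number of unique sequences reaching dist.seqs, not with the number of samples. The Python sketch below assumes each line is two read names, a distance string, two separators and a newline, and that no cutoff drops any pairs (a cutoff would make the real file smaller for the same number of sequences). The function names are hypothetical and not part of mothur.

```python
import math


def estimated_column_dist_bytes(n_seqs: int, avg_name_len: int, dist_chars: int = 8) -> int:
    """Upper-bound size of a column-format distance file if every pair is written.

    Assumption: one line per unordered pair = name1 + sep + name2 + sep + distance + newline.
    """
    pairs = n_seqs * (n_seqs - 1) // 2
    bytes_per_line = 2 * avg_name_len + dist_chars + 3  # two separators + newline
    return pairs * bytes_per_line


def seqs_for_size(total_bytes: float, avg_name_len: int, dist_chars: int = 8) -> int:
    """Invert the estimate: roughly how many sequences would produce total_bytes."""
    bytes_per_line = 2 * avg_name_len + dist_chars + 3
    pairs = total_bytes / bytes_per_line
    # Solve n(n - 1)/2 = pairs for the positive root n.
    return int((1 + math.sqrt(1 + 8 * pairs)) / 2)
```

With 48-character read names like the one in the cluster.split error and an 8-character distance field, seqs_for_size(1.03e12, 48) gives roughly 1.4 × 10^5 sequences, while seqs_for_size(27.7e9, 48) gives roughly 2.3 × 10^4. Since any cutoff only removes lines, these are loose lower bounds, but taken at face value they suggest several times more unique sequences reached dist.seqs in this run than in the 58-sample run.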
The problem is reproducible: re-running my code on the 23 samples gives a dist file of the same size.
I tried cluster.split instead and got an error: “‘HWI-M20149_246_000000000-AHBRF_1_1101_15057_5312’ is not in your name or count file, please correct.” If I add that name to the count.table by hand (copying another line, substituting the name above and giving it a total count of 1), I get the same error for a different sequence name.
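Rather than patching the count table one name at a time, it may help to list every missing name in one pass. This is a minimal Python sketch, assuming the fasta and count table given to cluster.split are plain text, and that the count table is tab-separated with a header row and sequence names in the first column. The helper names are hypothetical, and it only compares the fasta against the count table, not against the distance file.

```python
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names from FASTA header lines ('>name optional description')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip malformed headers that have no name
                names.add(fields[0])
    return names


def count_table_names(lines: Iterable[str]) -> Set[str]:
    """Collect names from the first column of a tab-separated count table.

    Assumption: the first non-comment line is a column header
    (e.g. 'Representative_Sequence<TAB>total<TAB>...'); lines starting with '#' are skipped.
    """
    names = set()
    header_seen = False
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        if not header_seen:
            header_seen = True  # skip the column header row
            continue
        names.add(line.split("\t", 1)[0].strip())
    return names


def missing_names(fasta: Set[str], counts: Set[str]) -> List[str]:
    """Names present in the FASTA but absent from the count table, sorted."""
    return sorted(fasta - counts)
```

Used as `missing_names(fasta_names(open(fasta_path)), count_table_names(open(count_path)))`, with `fasta_path` and `count_path` standing in for whichever files were passed to cluster.split. A long list would point to the fasta and count table coming from different steps of the pipeline rather than to a single stray read.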
I’m now running cluster (the next step in the SOP) on the dist file generated above. As one can imagine, it’s taking a while.
So I think there’s something wrong but don’t know what it is. Any thoughts?
Thanks
Steve White