I am using a Mac Pro (3.5 GHz, 6 cores, 64 GB RAM) to analyze MiSeq data (17 GB). The chimera.uchime command took more than 52 hours, and the cluster.split command has now been running for 108 hours without completing. Is this a normal amount of time, even with this much computing power? Is there a way to speed it up? I have four times this much data still to analyze.
What type of MiSeq data? V4?
Thank you for the reply. The data are from the V34 region. The command is still running and has not completed yet.
You cannot get a decent error rate sequencing the V34 region because the reads will not fully overlap. Without full overlap, sequencing errors cannot be corrected against the opposite read, so you get a high error rate and, as a result, a large number of unique reads. You’re likely to be stuck doing the phylotype-based approach or going back and sequencing the V4 region.
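To make the overlap point concrete, here is a rough Python sketch of the arithmetic. The amplicon and read lengths are illustrative assumptions, not values from this thread: roughly 250 bp for V4, roughly 460 bp for V34, and paired reads of 250 or 300 bp. The helper name `overlap_fraction` is hypothetical.

```python
def overlap_fraction(amplicon_len: int, read_len: int) -> float:
    """Fraction of amplicon bases covered by BOTH the forward and reverse read."""
    # Each read covers at most the whole amplicon.
    covered_by_one = min(read_len, amplicon_len)
    # Bases seen by both reads; zero if the reads do not meet in the middle.
    overlap = max(0, 2 * covered_by_one - amplicon_len)
    return overlap / amplicon_len


# Assumed, approximate lengths (bp); adjust to your own primers and run chemistry.
cases = {
    "V4, 2x250": (250, 250),
    "V34, 2x250": (460, 250),
    "V34, 2x300": (460, 300),
}

for name, (amplicon_len, read_len) in cases.items():
    frac = overlap_fraction(amplicon_len, read_len)
    print(f"{name}: {frac:.0%} of bases covered by both reads")
```

Under these assumptions, only the V4 case has every base read twice; in the V34 cases most bases are covered by a single read, so errors there cannot be checked against the other read.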
Thanks, Pat. If I understand correctly, you are suggesting that I run the phylotype command instead of cluster.split and use the shared file generated from it for further analyses? Also, is there a way to speed up the processing and use all the computing power I have available?
The best way to speed it up would be to sequence the V4 region.