I am following the MiSeq SOP, with some variations in the pipeline to better suit my needs, and I keep running into the same problem: my distance matrix is too big to fit into my computer's RAM, so cluster() and even cluster.split() fail.
Following the suggestions I received here in response to my initial question:
I tried splitting the sequences by taxonomy instead of by distance so that I could run cluster.split with the ‘file=’ option. However, that is also taking far too long.
Therefore, I decided to analyze my data without OTUs, using only the taxonomic information from ‘classify.seqs’. The only problem would arise if I need to calculate richness indices, but I think I could work around that.
And here is my question: if you follow the MiSeq SOP, you reach the ‘pre.cluster’ step, where you merge sequences that differ by up to 1 nt per 100 bp. If, after doing this, you repeat the command but allow a difference of 3 nt per 100 bp, would that be equivalent to making OTUs at cutoff=0.03 from a distance matrix?
Would it be correct to treat these output sequences as OTUs for subsequent analyses?
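To make the numbers concrete, here is a minimal sketch of how "N nt per 100 bp" translates into an absolute number of allowed mismatches, which is what the pre.cluster diffs setting expects. The 250 bp read length, the function name, and rounding down are all assumptions for illustration, not values from my data or from the SOP text.

```python
def diffs_for_length(length_bp: int, mismatches_per_100bp: int) -> int:
    """Convert a per-100-bp mismatch allowance into an absolute count.

    Rounding down is an assumption made here; it keeps the allowance
    from exceeding the stated rate.
    """
    return (length_bp * mismatches_per_100bp) // 100


if __name__ == "__main__":
    read_length = 250  # hypothetical aligned read length in bp
    for per_100 in (1, 3):
        diffs = diffs_for_length(read_length, per_100)
        print(f"{per_100} nt per 100 bp over {read_length} bp -> diffs={diffs}")
```

Under these assumptions, the first pass would allow 2 mismatches and the second pass 7.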