I am following the MiSeq SOP, with some variations in the pipeline to better suit my needs, and I keep running into the same problem: my distance matrix is too big to fit into my computer's RAM, and cluster() or even cluster.split() fails.
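For a rough sense of scale, here is a minimal back-of-the-envelope sketch of how the number of pairwise distances grows with the number of unique sequences. The function names are hypothetical, and the 4 bytes per stored distance is only an assumption; when a cutoff is used, only distances below it are kept, so real files can be much smaller than this upper bound.

```python
def pairwise_distance_count(n_unique: int) -> int:
    """Number of distinct sequence pairs among n_unique sequences."""
    return n_unique * (n_unique - 1) // 2


def matrix_size_gib(n_unique: int, bytes_per_distance: int = 4) -> float:
    """Upper-bound size in GiB if every pairwise distance were stored.

    bytes_per_distance is an assumption, not a value taken from mothur.
    """
    return pairwise_distance_count(n_unique) * bytes_per_distance / 2**30


if __name__ == "__main__":
    # Example sequence counts chosen for illustration only.
    for n in (10_000, 100_000, 500_000):
        print(f"{n:>7} unique seqs -> {matrix_size_gib(n):10.1f} GiB (upper bound)")
```

The growth is quadratic, so doubling the number of unique sequences roughly quadruples the number of distances.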
Following the suggestions given here in response to my initial question:
I tried splitting by taxonomy instead of by distance, running cluster.split with the ‘file=’ option. However, that is also taking very long…
Therefore, I decided to analyze my data without OTUs, using only the taxonomic information from ‘classify.seqs’. The only problem would arise if I need to calculate richness indices, but I think I could solve that.
And here is my question: if you follow the MiSeq SOP, you reach the ‘pre.cluster’ step, where you ‘merge’ sequences that differ by up to 1 nt per 100 bp. If, after doing this, you repeat the command but allow a difference of 3 nt per 100 bp, would that be equivalent to making OTUs at cutoff=0.03 using a distance matrix?
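To make the two settings concrete, here is a minimal sketch that converts "N nt per 100 bp" into the integer that would be passed as the diffs parameter of pre.cluster. The function name is hypothetical and the 250 bp read length is only an assumed example. It says nothing about whether the result is equivalent to distance-based OTUs, which is the open question.

```python
def diffs_for_rate(aligned_length_bp: int, nt_per_100bp: int) -> int:
    """Allowed mismatches for a read of the given length, rounded down."""
    return (aligned_length_bp * nt_per_100bp) // 100


if __name__ == "__main__":
    read_length = 250  # assumed example length, replace with your own
    for rate in (1, 3):
        diffs = diffs_for_rate(read_length, rate)
        print(f"{rate} nt per 100 bp over {read_length} bp -> diffs={diffs}")
```

Rounding down is a design choice here; with it, 1 nt per 100 bp over 250 bp gives diffs=2, the value the MiSeq SOP uses.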
Would it be correct to treat these output sequences as OTUs for subsequent analyses?
Thanks, I will try that also.
However, my latest analyses use a merge of 2 datasets (one with 26 samples and the other with 9) that each worked OK separately. The problem appears when I put them together. Both datasets are bacterial 16S (same primers, same lab) from the same sampling points and from similar environments (soil and rocks). Could the number of samples be the problem?
About the ‘pre.cluster’ question: could this step be treated as OTU clustering at different similarity levels, as I proposed before?