Hello,
I found one other post on this subject that attributed this error to a formatting issue, but the thread ended there. I am getting the same error and I can’t figure out how to resolve it.
The data are 16S amplicon sequences from two combined MiSeq runs (starting with ~40 million reads). I ended up with 1.4 million OTUs, most of which are in the tail, so I wanted to filter most of them out to get down to a manageable number of sequences.
Here is my command:
mothur > filter.shared(shared=April_allfiles_deindex_cat.good.unique.good.filter.unique.precluster.pick.pick.an.unique_list.shared, minnumsamples=10, label=0.01)
0.01
Segmentation fault (core dumped)
I have tried remaking the shared file and changing several other parameters on the filter.shared command, but the same error is produced. Could it have to do with the size of the file?
Here is a dropbox link to the shared file.
Thanks!!
I was able to run the command successfully and will email you the filtered shared file. The segfault was likely caused by running out of memory. The command used about 18G of RAM and removed 1427678 OTUs. I think the larger issue is the number of OTUs you have after clustering.
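If memory does turn out to be the limit on a given machine, the same kind of filter can be approximated outside mothur by streaming the file twice, so that only one sample row is held in memory at a time. The following is only a rough sketch, not mothur's implementation. It assumes the shared file is tab-delimited with a header row of label, Group, numOtus followed by one column per OTU, and it simply drops OTUs found in fewer than a given number of samples; it does not pool the removed counts into a "rare" OTU. The function name filter_shared and its arguments are made up for this illustration.

```python
def _split(line):
    # Drop the newline and any trailing tab before splitting into fields.
    return line.rstrip("\r\n").rstrip("\t").split("\t")


def filter_shared(shared_path, out_path, label, min_num_samples):
    """Keep OTUs present (count > 0) in at least min_num_samples samples.

    Hypothetical helper: assumes columns are label, Group, numOtus, then OTUs.
    Returns the number of OTUs removed.
    """
    # Pass 1: count, per OTU column, how many samples have a nonzero abundance.
    with open(shared_path) as src:
        header = _split(next(src))
        present_in = [0] * (len(header) - 3)
        for line in src:
            row = _split(line)
            if row[0] != label:
                continue
            for i, value in enumerate(row[3:]):
                if int(value) > 0:
                    present_in[i] += 1

    keep = [i for i, n in enumerate(present_in) if n >= min_num_samples]

    # Pass 2: rewrite each sample row with only the kept OTU columns.
    with open(shared_path) as src, open(out_path, "w") as dst:
        next(src)  # skip the header, it is rebuilt below
        kept_names = [header[3 + i] for i in keep]
        dst.write("\t".join(header[:3] + kept_names) + "\n")
        for line in src:
            row = _split(line)
            if row[0] != label:
                continue
            kept_counts = [row[3 + i] for i in keep]
            dst.write("\t".join([row[0], row[1], str(len(keep))] + kept_counts) + "\n")

    return len(present_in) - len(keep)
```

Calling filter_shared("in.shared", "out.shared", "0.01", 10) with your own file names would mirror minnumsamples=10 at label 0.01 under the assumptions above. Pure Python will be slow on a file with over a million OTU columns, so treat this as a way to check the logic rather than a replacement for filter.shared.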
Hi Sarah
Thank you very much. I don’t know if running out of RAM is the reason because the server I use has 512 GB of RAM so there should be plenty. You were obviously able to get it to run on your machine though.
I agree that the number of OTUs is high. I thought having 1.6 million after preclustering was also high. Clustering that many reads took 10 days with cluster.split at taxlevel=5 and processors=60. Most of those 10 days were due to just a couple of processors working on some huge bins. I basically followed the MiSeq SOP with some slight variations, so I was surprised to see this high number of OTUs. Any thoughts on why I am ending up with so many OTUs? My guess is that the quality of my sequencing run was not great; as usual, the ends of my reads were pretty low quality. I am only doing partial overlap, not complete.
My amplicon is about 370 bases, so I use diffs=3 for pre.cluster. This seems to be on the border of what Pat recommends, so I went with 3, but based on the results I am getting, perhaps 4 would be more appropriate.
Thanks!