Hi Pat et al.,
I’m running mgcluster in v1.8 and have run into a few puzzlers. First, is it still based on bsr? It runs sooooo much faster than mg-dotur (153 sec to cluster ~113,000 ORFs), we were wondering if it’s somehow now using a blast % identity? If it is now using a % identity, I’d appreciate a few comments on how to relate these new numbers to the mg-dotur numbers. The second is that I’m not getting any of the statistics files and I can’t see any option that allows me to request those. I’m only getting the .list, .sabund, and .rabund files where I’d really like the chao and shared chaos.
I’m glad you found mgcluster. For some reason, I feel like this paper has been ignored. Every now and then I get a note like yours and I get a new glimmer of hope about the human race (ok, not quite but…). The new version is exactly the same, except that it was coded to be much, much faster. The output of the clustering should be the same as with mg-dotur. However, you can now interface with the rest of mothur with these data much like you would if you ran the normal cluster command. The only part that uses the % identity is when merging OPFs, as we described in the BMC Bioinformatics paper.
Hope this helps and good luck! We definitely need to get people to realize that throwing away the 70% of genes with no known function is not the way to do microbial ecology.
I’ve got mgcluster running, but the list file it spits out just has the library names for each cluster, not the sequence names. My name file is set up just like your amazon.groups: sequence [tab] library name. Is there an option to tell mgcluster to list the sequences, not the library names?
The name file is different from the group file. It is a two column, tab separated file, but the first column is a sequence name and the second column is a list of identical sequence names. Here is an example:
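As a rough sketch of the difference between the two layouts, the Python below parses a tiny name file and a tiny group file. The sequence names (`orf_001`, ...) and library names (`libA`, `libB`) are made up for illustration and are not taken from amazon.groups; the comma separator inside the second column of the name file is my understanding of the usual mothur convention, so treat it as an assumption.

```python
# Hypothetical data, for illustration only (not from amazon.groups).
# Name file: representative sequence <TAB> comma-separated identical sequences
# (comma separator assumed, per the usual mothur convention).
NAME_EXAMPLE = "orf_001\torf_001,orf_007\norf_002\torf_002\n"

# Group file: sequence name <TAB> library (group) name
GROUP_EXAMPLE = "orf_001\tlibA\norf_002\tlibA\norf_007\tlibB\n"


def parse_names(text):
    """Map each representative sequence to the list of names it stands for."""
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        representative, members = line.split("\t")
        names[representative] = members.split(",")
    return names


def parse_groups(text):
    """Map each sequence name to the library it came from."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        sequence, library = line.split("\t")
        groups[sequence] = library
    return groups


if __name__ == "__main__":
    print(parse_names(NAME_EXAMPLE))
    print(parse_groups(GROUP_EXAMPLE))
```

If a group file is passed where a name file is expected, the second column (the library name) is read as the list of sequence names, which would explain a list file full of library names.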
Hope this helps