I have been having issues with the name file and group file not matching after a series of merge, screen, filter, and unique commands in mothur-1.23. I’ve tried a couple of different tests to figure out where the missing sequences went, but I’m at a loss right now. Here are the commands that I’ve used and the error that comes up:
[ERROR]: Your name file contains 763839 valid sequences, and your groupfile contains 1162922, please correct.
Somewhere around the filter step, a large number of sequences go missing (the group file lists 399,083 more sequences than the name file). This is a large dataset that combines multiple plates. Please let me know if there is any additional information I can send you. Thanks very much for your help!
Does mothur output any missing names? Could any of your sequences have the same name? If you send your files to mothur.bugs@gmail.com, I can try to track down the problem for you.
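Both questions can also be checked directly by comparing the two files. The Python sketch below assumes the usual mothur layouts: each name file line holds a representative name, a tab, and a comma-separated list of every name it stands for; each group file line holds a sequence name, a tab, and a group. The script name `compare_names.py` and all function names are hypothetical and not part of mothur. It reports names that appear in only one of the files and any name that occurs more than once.

```python
import sys
from collections import Counter


def parse_name_lines(lines):
    """Expand a mothur name file into every sequence name it lists.

    Assumed format per line: representative<TAB>name1,name2,...
    """
    names = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _rep, members = line.split()
        names.extend(members.split(","))
    return names


def parse_group_lines(lines):
    """Return the sequence names from a mothur group file.

    Assumed format per line: name<TAB>group
    """
    names = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _group = line.split()
        names.append(name)
    return names


def duplicates(names):
    """Names that occur more than once, sorted."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


def compare(name_file_names, group_file_names):
    """Summarize how the name file and group file disagree."""
    in_names = set(name_file_names)
    in_groups = set(group_file_names)
    return {
        "name_file_total": len(name_file_names),
        "group_file_total": len(group_file_names),
        "only_in_group_file": sorted(in_groups - in_names),
        "only_in_name_file": sorted(in_names - in_groups),
        "duplicated_in_name_file": duplicates(name_file_names),
        "duplicated_in_group_file": duplicates(group_file_names),
    }


def main(name_path, group_path):
    with open(name_path) as handle:
        name_file_names = parse_name_lines(handle)
    with open(group_path) as handle:
        group_file_names = parse_group_lines(handle)
    report = compare(name_file_names, group_file_names)
    for key, value in report.items():
        if isinstance(value, list):
            # Print the count plus a short preview of the offending names.
            print(f"{key}: {len(value)} {value[:10]}")
        else:
            print(f"{key}: {value}")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python compare_names.py FILE.names FILE.groups", file=sys.stderr)
```

Run it as `python compare_names.py FILE.names FILE.groups` on the pair of files that triggers the error. A non-empty `only_in_group_file` list gives the sequences that were dropped from the name file, which can then be traced back through the earlier steps.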
mothur only outputs a list of the missing sequences after the make.shared command. (It would be a nice feature for pre.cluster to report missing sequences the same way make.shared does.)
I don’t think there are duplicate sequence names, or at least none that I’ve been able to track down.
The files are really big. Is there another way for me to send them to you, and which files specifically do you need? Thanks so much for your help, I really appreciate it!
I suspect that at some point before these files were created, you forgot to include the name and group files on a command that removed sequences. If you post the commands you used to get to this point, I may be able to spot the mistake.
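One way to locate that command is to check, after each step that can remove sequences, whether the FASTA file and the name file still describe the same set of representatives. This Python sketch assumes the first column of the name file holds the representative names that appear in the FASTA file, and that each sequence name is the first whitespace-separated token of its FASTA header. The script name `check_step.py` and its function names are hypothetical. The first step at which either list becomes non-empty is a likely candidate for where the name and group files were not passed along.

```python
import sys


def name_file_representatives(lines):
    """First column of a mothur name file: the representative names."""
    reps = set()
    for line in lines:
        line = line.strip()
        if line:
            reps.add(line.split()[0])
    return reps


def fasta_sequence_names(lines):
    """Sequence names from FASTA headers (first token after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                names.add(header[0])
    return names


def diff_representatives(reps, fasta_names):
    """Return (in name file but not FASTA, in FASTA but not name file), sorted."""
    return sorted(reps - fasta_names), sorted(fasta_names - reps)


def main(fasta_path, name_path):
    with open(fasta_path) as handle:
        fasta_names = fasta_sequence_names(handle)
    with open(name_path) as handle:
        reps = name_file_representatives(handle)
    only_in_names, only_in_fasta = diff_representatives(reps, fasta_names)
    # Show counts plus a short preview of the mismatched names.
    print(f"in name file only: {len(only_in_names)} {only_in_names[:10]}")
    print(f"in FASTA only: {len(only_in_fasta)} {only_in_fasta[:10]}")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python check_step.py FILE.fasta FILE.names", file=sys.stderr)
```

Running `python check_step.py FILE.fasta FILE.names` on the output pair of each step, in order, narrows the problem to a single command without having to send the full dataset.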