I hope you’re doing well. I’m reaching out for help with persistent errors I’ve encountered while using Mothur for a metabarcoding analysis. I’ve been working through the pipeline in Aylagas et al. (2016) and have run into problems, particularly when trying to generate a count table with the count.seqs command and my group file.
Context:
I’m currently analyzing 8 CO1 zooplankton samples from southwestern Puerto Rico using the NOAA Atlas database (North Atlantic CO1 reference). Everything works smoothly up to the point where I run count.seqs with my .groups file, which was generated using the make.groups command (following Aylagas et al. 2016, section 3.4, step 1). However, I keep getting warnings and errors about non-unique sequence names or illegal characters, even though I have verified that the format looks correct.
Error Messages:
When I run count.seqs, I get repeated warnings such as:
[WARNING]: group M03719_761_000000000-J9GYJ_1_1117_19184_5626 contains illegal characters in the name. Group names should not include :, -, or / characters.
[NOTE]: Updating M03719_761_000000000-J9GYJ_1_1117_19184_5626 to M03719_761_000000000_J9GYJ_1_1117_19184_5626 to avoid downstream issues.
[ERROR]: M03719_761_000000000-J9GYJ_1_1117_19184_5626 is not in your groupfile, please correct.
The key issue seems to be that Mothur does not recognize the sequence names, even though they are present in both the name and group files. I’ve checked manually by searching for the names, and they match in both files. Still, Mothur treats them as different.
Additionally, I frequently get warnings like this during screen.seqs:
[WARNING]: Your groupfile contains more than 1 sequence named 4,1. Sequence names must be unique. Please correct.
I’m not sure why these errors persist, as I’ve generated the group file using make.groups, and it should contain unique entries.
What I Have Tried:
- Manually Reviewing Files: I have manually reviewed the group and name files to ensure that there are no visible formatting issues, such as unwanted characters or duplicated names.
- Cleaning Group Names: I attempted to replace characters that Mothur finds problematic (such as “-”, “:”, or “/”) using both Mothur’s own handling and custom Python scripts. However, the issue persists.
- Using get.seqs: I tried using get.seqs to match names and groups to ensure consistency, but the problem continued.
- Count Table: I’ve attempted to generate a count table manually but ran into the same sequence recognition issues. Mothur continues to show sequence names as non-unique or not present, even though they are clearly in the files.
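To make the cleaning step concrete, here is a minimal Python sketch of one way the same substitution could be applied identically to both the group file and the name file. It is an illustration, not the exact script used. It assumes both files have two whitespace-separated columns (sequence name and group in the group file; representative name and a comma-separated list of names in the name file). The function names and the file names in the usage comment are hypothetical.

```python
import re

# Characters listed in the Mothur warning: ':', '-', '/'
ILLEGAL = re.compile(r"[:\-/]")

def clean(name: str) -> str:
    """Replace each flagged character with an underscore."""
    return ILLEGAL.sub("_", name.strip())

def clean_group_line(line: str) -> str:
    """Group file line (assumed layout): <sequence name> <group name>."""
    seq, group = line.split()[:2]
    return f"{clean(seq)}\t{clean(group)}"

def clean_names_line(line: str) -> str:
    """Name file line (assumed layout): <representative> <name1,name2,...>."""
    rep, members = line.split()[:2]
    cleaned = ",".join(clean(m) for m in members.split(","))
    return f"{clean(rep)}\t{cleaned}"

def rewrite(src: str, dst: str, fix) -> None:
    """Apply a per-line fixer to src and write tab-separated, LF-terminated output."""
    with open(src, newline="") as fin, open(dst, "w", newline="\n") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                fout.write(fix(line) + "\n")

# Usage (hypothetical file names):
# rewrite("samples.groups", "samples.clean.groups", clean_group_line)
# rewrite("samples.names", "samples.clean.names", clean_names_line)
```

If the names are changed this way, the FASTA headers would presumably need the same substitution so that all three files stay in agreement.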
Could you provide any guidance on what might be causing this problem and how to resolve it? Specifically:
- Are there any hidden formatting issues that could be triggering these errors?
- Is there an alternative way to generate the count table that might avoid these issues?
- Could there be invisible differences between the sequence names that Mothur is detecting?
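To frame these questions, the following is a minimal Python sketch of checks that could expose hidden characters, duplicated names, or names that differ between two files. It assumes tab-separated files with the sequence name in the first column. All function and file names are hypothetical, and it is meant as a diagnostic aid rather than a fix.

```python
from collections import Counter

def read_lines(path: str) -> list[str]:
    """Read a file without newline translation so stray '\\r' stays visible."""
    with open(path, newline="", encoding="utf-8", errors="replace") as fh:
        return fh.read().split("\n")

def suspicious(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line number, repr) for lines containing anything other than
    tabs and printable, non-space ASCII (e.g. '\\r', spaces, BOM, NBSP)."""
    hits = []
    for n, line in enumerate(lines, 1):
        if any(c != "\t" and not (c.isascii() and c.isprintable() and c != " ")
               for c in line):
            hits.append((n, repr(line)))
    return hits

def first_fields(lines: list[str]) -> list[str]:
    """First tab-separated field of every non-blank line."""
    return [ln.split("\t")[0] for ln in lines if ln.strip()]

def duplicates(names: list[str]) -> list[str]:
    """Names that occur more than once."""
    return sorted(n for n, k in Counter(names).items() if k > 1)

def unmatched(names_a: list[str], names_b: list[str]) -> list[str]:
    """repr() of names found in only one of the two lists."""
    return sorted(repr(n) for n in set(names_a) ^ set(names_b))

# Usage (hypothetical file names):
# g = read_lines("samples.groups"); n = read_lines("samples.names")
# print(suspicious(g)[:10], duplicates(first_fields(g))[:10])
# print(unmatched(first_fields(g), first_fields(n))[:10])
```

Printing names with repr() makes trailing spaces, carriage returns, and non-ASCII look-alikes visible, which a plain text search would not show.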
Thank you so much for your time and assistance. If you need any additional information, please let me know. I appreciate any help you can provide!