Chimera.uchime settings for different sample types

I have 16S data from 4 species of hosts that I am planning to compare, so I want to process them simultaneously through mothur. Unfortunately, I find that chimera.uchime is removing a large real OTU from all of my samples because it gets flagged in one of the four species where it shows up rarely. It may be a real chimera in that species or not. The default setting to remove chimeras flagged in one sample from all samples makes sense in most circumstances and I hate to turn that off, but I don’t see another obvious answer. It would be great if chimera.uchime would accept a design file to bin groups for chimera checking for just this circumstance, but I don’t have time to wait for a new feature.

Aside from splitting my files by species, running uchime on each species group, then merging everything back together, do I have any other options currently?

Thanks for the help!

The dereplicate option in chimera.uchime is probably what you’re looking for. You have to run the remove.seqs() command slightly differently after using this parameter, but the wiki explains it with examples of how the uchime/dereplicate and remove.seqs/dups parameters interact.

As you say, it’s not clear whether or not this is a good idea, but it’s possible.
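
As a rough sketch of what that looks like with a count file (the file names are placeholders, and the output names in the second command are what I would expect mothur to write, so check your own log file for the real ones):

```
# Check for chimeras group by group; with dereplicate=t a flagged sequence
# is only removed from the group(s) where it was flagged
chimera.uchime(fasta=final.fasta, count=final.count_table, dereplicate=t)

# Remove the flagged sequences from the fasta file
# (accnos name below is assumed from the fasta root; confirm it in the log)
remove.seqs(fasta=final.fasta, accnos=final.denovo.uchime.accnos)
```

If you are using name and group files instead of a count file, this is where the dups parameter of remove.seqs comes in, so follow the wiki examples for that case rather than this sketch.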

I considered the dereplicate option, but I was afraid it would introduce more problems than it solved.

I split the files by species before uchime and ran it as normal on each set of files. I still had a problem with uchime going wild and removing several legitimate OTUs from this one species. The other three species all had the typical (for me) few hundred chimeras. The fourth had thousands. There are no indications that the fourth has any innate quality issues. I’m not sure what the problem is, unless there is just a particular mix of closely related OTUs that look like chimeras to uchime.

Oddly enough, I realized that this was a problem because uchime didn’t exhibit this behavior the first time I ran the pipeline with all of the barcodes kept separate all the way through. Once I determined which samples were too low coverage and which samples were sequenced twice and needed to be combined, I went back and reran the pipeline with the low samples removed and the ones that needed it combined after sff.multiples. That’s when uchime went wonky. This suggests that it’s one of the combined samples that is causing the problem, but I’m not sure what to do with that information (obviously I didn’t want to throw those out).

I switched over to using the silva reference for uchime, which returned the expected few hundred chimeras. I guess I will have to change over.

I would second the dereplicate approach. We have the same problem with antibiotic-treated mice, where abundant bugs become rare and rare ones become abundant, and we frequently get good sequences trashed if we don’t use this approach.


So in your experience dereplication provides better results than just using the silva reference? Thanks!

I wouldn’t suggest using silva.bacteria.fasta as the reference, since there might actually be chimeras in there that we don’t know about. I’d use the de novo approach, where you provide a count_table or names file. Alternatively, use the silva-aligned gold file.
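
Roughly, the two options look like this (file names are placeholders; silva.gold.align is assumed to be the name of the silva-aligned gold file in your working directory):

```
# De novo: no reference file, the more abundant sequences in each group
# serve as the candidate parents
chimera.uchime(fasta=final.fasta, count=final.count_table)

# Reference-based, using the silva-aligned gold set instead of silva.bacteria.fasta
chimera.uchime(fasta=final.fasta, reference=silva.gold.align)
```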