Splitting rare and abundant OTUs by percentages


I’m pretty sure there is a way to do this, but I can’t seem to find it. Ultimately, I would like to look at the rare and abundant datasets separately.

I don’t think there is an option for percentages with split.abund (unless I missed it).

As an alternative to split.abund, I think that filter.shared might do a better/faster job (as it can work directly on a shared file) using makerare=F, but I’m not sure if there is a way to output the rare/removed OTUs (e.g. as an accnos-type file or a “rare.shared” file). Same problem with remove.rare: I’m not sure if there is a way to access the removed OTUs.

Any suggestions or workarounds?

The worst case I was thinking of is subsampling a list and fasta file so that all groups have the same number of sequences, and then using split.abund with a cutoff that matches my desired percentage cutoff, but that would not be ideal.
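For the subsampling workaround described above, the arithmetic for turning a percentage into a count cutoff can be sketched as follows. This is only an illustration: the function name and arguments are hypothetical, it assumes every group has been subsampled to the same depth, and whether split.abund treats an OTU exactly at the cutoff as rare or abundant should be checked against the mothur documentation.

```python
import math


def percent_to_count_cutoff(percent, depth):
    """Convert a relative-abundance cutoff (in percent) to a sequence count.

    Assumes every group was subsampled to `depth` sequences, so the same
    count corresponds to the same percentage in every group.
    """
    if depth <= 0:
        raise ValueError("depth must be a positive number of sequences")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Round down so the count never exceeds the requested percentage.
    return math.floor(percent * depth / 100)


# Hypothetical example: 1% of groups subsampled to 5000 sequences each.
cutoff = percent_to_count_cutoff(1, 5000)
```

The resulting integer is what would then be passed as the cutoff value; the numbers in the example are made up.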

Thanks a lot (you guys are great!)

I’m not a big fan of this type of analysis. Regardless, you could script this in R. We’ll add it as a possible future feature.
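A rough sketch of what such a script might look like is given here, written in Python rather than R. It is not a mothur feature. It assumes the usual tab-delimited shared layout (label, Group, numOtus, then one column per OTU), and it assumes one particular definition of "abundant": an OTU is kept as abundant if it reaches the percentage cutoff in at least one group. The function name split_shared_by_percent and its parameters are hypothetical.

```python
def split_shared_by_percent(shared_text, cutoff_percent):
    """Split shared-file text into (rare_text, abundant_text).

    Assumption: an OTU is "abundant" if its relative abundance is at
    least `cutoff_percent` in one or more groups; otherwise it is "rare".
    """
    lines = [ln.rstrip() for ln in shared_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    rows = [ln.split("\t") for ln in lines[1:]]
    otu_names = header[3:]
    counts = [[int(x) for x in row[3:]] for row in rows]

    # Find OTUs that reach the cutoff in at least one group.
    abundant = set()
    for row in counts:
        total = sum(row)
        if total == 0:
            continue
        for j, count in enumerate(row):
            if 100.0 * count / total >= cutoff_percent:
                abundant.add(j)

    def build(keep):
        # Rebuild a shared-style table with only the selected OTU columns,
        # updating numOtus to the number of columns kept.
        out = ["\t".join(header[:3] + [otu_names[j] for j in keep])]
        for row, row_counts in zip(rows, counts):
            fields = [row[0], row[1], str(len(keep))]
            fields += [str(row_counts[j]) for j in keep]
            out.append("\t".join(fields))
        return "\n".join(out) + "\n"

    rare_idx = [j for j in range(len(otu_names)) if j not in abundant]
    abund_idx = [j for j in range(len(otu_names)) if j in abundant]
    return build(rare_idx), build(abund_idx)


if __name__ == "__main__":
    # Made-up example data, not from a real dataset.
    example = (
        "label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3\n"
        "0.03\tA\t3\t90\t9\t1\n"
        "0.03\tB\t3\t50\t49\t1\n"
    )
    rare, abund = split_shared_by_percent(example, 5.0)
    print(rare)
    print(abund)
```

Because the percentages are computed within each group, groups with different numbers of sequences are compared on the same relative scale without subsampling first. A different rule (for example, requiring the cutoff to be met in every group, or using the pooled total across groups) would need only a change to the loop that fills the abundant set.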


Thanks, Pat, for the quick reply.

I’m aware that you’re strongly against getting rid of rare sequences (thanks, by the way, for the different posts on why that is), and this was not my goal here. I was just wondering whether there was a quick way of using mothur to zoom in on the rare sequences, possibly to look at more subtle variations between highly similar groups. Because I have a fair amount of variation in the number of sequences between groups, I thought that subsampling using a percentage/relative abundance cutoff might be a bit more appropriate than simply using nseqs.

I’ll keep an eye on the new releases to see whether you add such options or not, and will use a workaround in the meantime.
Thanks again, Guillaume