I am trying to remove rare sequences from my data, but I am finding that neither remove.rare nor split.abund quite does what I want.
Basically, I’d like to cluster sequences into OTUs, then remove sequences that occur fewer than 10 times at each level of the hierarchy, by group. Finally, I’d like to make a shared file to use for further analyses.
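To make the goal concrete, here is a minimal sketch of the per-group filtering I have in mind. It is not mothur code and does not read mothur file formats: the nested dict, the function name `remove_rare_by_group`, and the group and OTU names are all made up for illustration, and it assumes "rare" means an OTU's count within a single group is below the cutoff.

```python
def remove_rare_by_group(shared, min_count=10):
    """Zero out OTU counts below min_count within each group,
    then drop OTUs that are left with no counts in any group.

    `shared` is a hypothetical stand-in for a shared file:
    {group_name: {otu_name: count}}.
    """
    # Keep only the counts that reach the cutoff, judged group by group.
    filtered = {
        group: {otu: n for otu, n in counts.items() if n >= min_count}
        for group, counts in shared.items()
    }
    # An OTU survives if it passed the cutoff in at least one group.
    kept = sorted({otu for counts in filtered.values() for otu in counts})
    # Rebuild a rectangular table, filling removed counts with 0.
    return {
        group: {otu: counts.get(otu, 0) for otu in kept}
        for group, counts in filtered.items()
    }


# Toy data (invented): two groups, three OTUs.
shared = {
    "groupA": {"Otu1": 25, "Otu2": 3, "Otu3": 0},
    "groupB": {"Otu1": 8, "Otu2": 12, "Otu3": 1},
}
result = remove_rare_by_group(shared, min_count=10)
```

With this toy input, Otu1 is kept only in groupA, Otu2 only in groupB, and Otu3 is dropped entirely. The point is that the cutoff is applied within each group rather than to the total across groups.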
Note: This is similar to what another mothur user tried to do in the thread "split.abund - No data in abund.groups", but I am looking for a solution where the rare sequences are removed on a per-group basis.
- Remove.rare: It seems like a great option because I can use the bygroups=T flag. However, that only changes the shared file, since bygroups=T can only be used with a shared file. I found that this led to strange abundances in downstream analyses because the original .names, .list and .groups files remained unchanged.
- Split.abund: I tried split.abund, but the problem with this command is that there is no bygroups option. I can use the groups flag, but then I get a mountain of .list, .groups, .fasta and .accnos files corresponding to each group in the dataset (e.g. for six groups that is 48 files per OTU level). To generate .list and .groups files to use with the make.shared command, I would need to concatenate the per-group .list files and the per-group .groups files, which I am not even sure is legitimate.
Is this concatenation strategy a good workaround for this problem?