Question about sub.sample

Dear Pat,

I know that you do not advise removing singletons, but I would like to test the pipeline with and without removing them, and I am not sure about the sub.sample command. I am more or less following the MiSeq SOP, so the commands I am running are: … pre.cluster -> chimera.uchime -> remove.seqs -> summary.seqs -> pcr.seqs ->
classify.seqs -> remove.lineage -> count.groups -> sub.sample -> rename.file -> cluster.split -> cluster.split -> make.shared -> filter.shared -> sub.sample -> summary.single…
Should I keep the first “sub.sample” (the one after “count.groups”) in the pipeline?
Thanks in advance!

Hi Cris,

sub.sample doesn’t remove singletons - it subsamples all of the samples to have the same number of reads. You likely want remove.rare. If you want to remove singletons the way other programs do, you would insert remove.rare after the pre.cluster step and before chimera checking.
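
To make the distinction concrete, here is a rough Python sketch of the two operations. It is not mothur code and not how mothur implements either command: the function names (subsample, drop_rare), the max_rare threshold and the toy counts are all made up for illustration, and treating "rare" as "at or below the threshold" is an assumption of this sketch.

```python
import random
from collections import Counter

def subsample(counts, size, seed=0):
    """Randomly draw `size` reads (without replacement) from one sample."""
    # Expand the per-OTU counts into one entry per read.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    if size > len(reads):
        raise ValueError("sample has fewer reads than the requested size")
    rng = random.Random(seed)
    return Counter(rng.sample(reads, size))

def drop_rare(counts, max_rare=1):
    """Drop OTUs whose count is at or below max_rare (hypothetical threshold)."""
    return {otu: n for otu, n in counts.items() if n > max_rare}

# Toy data: two samples with different sequencing depths.
sample_a = {"Otu1": 50, "Otu2": 30, "Otu3": 1}
sample_b = {"Otu1": 10, "Otu2": 9, "Otu4": 1}

# Subsampling equalizes depth across samples; rare OTUs may or may not survive.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
sub_a = subsample(sample_a, depth)
sub_b = subsample(sample_b, depth)

# Dropping rare OTUs removes low-count entries but leaves depths unequal.
filtered_a = drop_rare(sample_a)
filtered_b = drop_rare(sample_b)
```

After subsampling, both samples have the same number of reads, but singletons can still be present. After dropping rare OTUs, the singletons are gone, but the two samples still differ in depth.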


Hi Pat,

Very true!
If I understood your reply correctly, I should keep the first sub.sample as it is, and then use remove.rare(list=current, nseqs=2, label=0.10) after pre.cluster and before chimera.uchime. Should I leave label=0.10, or change it to 0.03, since I am using 0.03 for OTUs?

I think the last step (i.e. after OTUs) would be to rarefy/subsample your data.