After I run the SOP on a particular batch of samples, the OTU table has more than 200k OTUs (contamination? some other cause?). Currently, I filter out the rare OTUs towards the end:
cluster.split(fasta=current, count=current, taxonomy=current, splitmethod=classify, taxlevel=4, cutoff=0.03)
classify.otu(list=current, count=current, taxonomy=current, label=0.03)
make.shared(list=current, count=current, label=0.03)
filter.shared(shared=current, minpercentsamples=5, minpercent=0.01)
Is there a way to remove these much earlier in the pipeline? It would save a lot of computation time!
Thanks!