Well, pschloss suggested removing negative control sequences after preclustering, so here goes. I’m doing this:
pre.cluster(fasta=blah.fasta, count=blah.count_table, diffs=2)
summary.seqs(fasta=blah.precluster.fasta, count=blah.precluster.count_table) #look at what you got
count.groups(count=blah.precluster.count_table) #look at what you got in each group
get.groups(fasta=blah.precluster.fasta, count=blah.precluster.count_table, groups=BN1-BN2) #single out the negative control groups here; my "BN"s
summary.seqs(fasta=blah.precluster.pick.fasta, count=blah.precluster.pick.count_table) #look at what's in the negative control, just in case
system(rename blah.precluster.pick.fasta neg_control.fasta) #rename neg control fasta file to something nicer (rename is the Windows command; use mv on Linux/macOS)
system(rename blah.precluster.pick.count_table neg_control.count_table) #rename neg control count file to something nicer, so remove.seqs below does not overwrite it
list.seqs(count=neg_control.count_table) #generate accnos file for neg control
remove.seqs(accnos=neg_control.accnos, fasta=blah.precluster.fasta, count=blah.precluster.count_table) #remove sequences (just like for chimeras)
summary.seqs(fasta=blah.precluster.pick.fasta, count=blah.precluster.pick.count_table) #make sure there are fewer sequences than before
count.groups(count=blah.precluster.pick.count_table) #make sure negative control groups disappear
This appears to work for me. Am I horribly wrong? Could someone else try this and give feedback, please?
I’m thinking analyses should be performed both with and without negative control removal, to see the potential effect it has, discuss the taxa that are “removed”, etc. Any thoughts?
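To put a number on that effect, here is a rough Python sketch (not a mothur feature) that tallies how many reads each group would lose if every sequence named in neg_control.accnos were removed. It assumes the count table is in the plain tab-separated layout (Representative_Sequence, total, then one column per group); the function name reads_lost_per_group and the toy data are made up for illustration, and a compressed/sparse count table would need converting first.

```python
import csv
import io


def reads_lost_per_group(handle, neg_names):
    """Return {group: (reads_removed, reads_total)} for a full-format count table.

    handle    -- open text file (or any iterable of lines) holding the count table
    neg_names -- set of sequence names found in the negative controls
    """
    # Skip any leading comment lines (assumption: they start with '#').
    lines = (ln for ln in handle if not ln.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    groups = header[2:]  # columns after Representative_Sequence and total
    removed = dict.fromkeys(groups, 0)
    total = dict.fromkeys(groups, 0)
    for row in reader:
        if not row:
            continue
        name = row[0]
        for group, count in zip(groups, row[2:]):
            total[group] += int(count)
            if name in neg_names:
                removed[group] += int(count)
    return {g: (removed[g], total[g]) for g in groups}


# Toy data standing in for blah.precluster.count_table and neg_control.accnos.
toy_table = (
    "Representative_Sequence\ttotal\tS1\tS2\tBN1\n"
    "seqA\t12\t10\t2\t0\n"
    "seqB\t9\t3\t1\t5\n"
    "seqC\t4\t0\t4\t0\n"
)
toy_neg = {"seqB"}

result = reads_lost_per_group(io.StringIO(toy_table), toy_neg)
for group, (lost, total) in result.items():
    print(f"{group}: {lost} of {total} reads would be removed")
```

On real files you would pass open("blah.precluster.count_table") and build the name set from neg_control.accnos, which has one sequence name per line. A group that loses a large share of its reads is a hint that the removal is hitting abundant real taxa and not just contaminants.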
What would be the advantage of removing negative control sequences before doing chimera searching and removal? Would it be better to remove chimeras first, and then remove negative control sequences?