Hi,
I can't figure out what is going on after I run the remove.lineage command to remove Archaea, chloroplast, and mitochondrial sequences. I recover less than half of the sequences, which seems like a very large loss; in the SOP just 350 sequences are removed.
After removing chimeras I obtained:
# of unique seqs: 83768
total # of seqs: 976210
Then I ran “mothur > classify.seqs(fasta=Undetermined_S0_L001_R1_001.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=Undetermined.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)”, and then “remove.lineage(fasta=Undetermined_S0_L001_R1_001.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=Undetermined.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=Undetermined_S0_L001_R1_001.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)”.
When I run summary.seqs again I obtain:
# of unique seqs: 34744
total # of seqs: 305921
So both the unique and the total read counts have been reduced by more than half: about 41% of the unique sequences and about 31% of the total reads remain.
Why could that be?
My samples are a mix of mostly cyanobacteria plus heterotrophic bacteria. Could it be that most of the cyanobacteria were not classified correctly and were removed?
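One way to check this might be to tally how many reads fall under each lineage in the taxonomy file before anything is removed. Below is a minimal Python sketch of that idea; it is not part of mothur. It assumes the taxonomy file is tab-separated (sequence name, then a semicolon-separated lineage with confidence scores in parentheses) and that the count table has a header row starting with Representative_Sequence, followed by name and total columns. The function names read_counts and tally_by_rank are made up for illustration.

```python
import re
from collections import Counter


def read_counts(count_lines):
    """Map sequence name -> total reads from count_table lines (assumed layout)."""
    counts = {}
    for line in count_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, comment lines, and the header row.
        if len(fields) < 2 or line.startswith("#"):
            continue
        if fields[0] == "Representative_Sequence":
            continue
        counts[fields[0]] = int(fields[1])
    return counts


def tally_by_rank(tax_lines, counts, depth=3):
    """Sum reads per lineage, truncated to the first `depth` ranks."""
    totals = Counter()
    for line in tax_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        name, lineage = fields[0], fields[1]
        # Drop the trailing "(confidence)" and any quotes around each rank.
        ranks = [
            re.sub(r"\(\d+\)$", "", rank).strip('"')
            for rank in lineage.split(";")
            if rank
        ]
        # Sequences missing from the count table are counted once.
        totals[";".join(ranks[:depth])] += counts.get(name, 1)
    return totals
```

Passing the opened count table to read_counts and the opened .wang.taxonomy file to tally_by_rank, then listing the result with most_common(), should show which lineages hold most of the reads, and so whether the cyanobacteria sit under a name that one of the taxon= entries would match.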
Thanks