remove.lineage removes half of my sequences

mrm · October 9, 2014, 6:55pm

Hi,

I canÂ´t figure out what is going on after I run the remove.lineage command to remove Archaea, chloroplasts, and mitochondira sequences. I recover only half of the sequences and it seems very high, In the SOP just 350 sequences are removed.

After removing quimeras I obtained

of unique seqs: 83768

total # of seqs: 976210

Then, I run “mothur > classify.seqs(fasta=Undetermined_S0_L001_R1_001.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=Undetermined.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80)”, and then “remove.lineage(fasta=Undetermined_S0_L001_R1_001.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=Undetermined.trim.contigs.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=Undetermined_S0_L001_R1_001.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)”.

When I run summary.seqs again I obtain

of unique seqs: 34744

total # of seqs: 305921

so the number of both unique and total reads seem to have been reduced by half.

Why can that be?

My samples are a mix of mostly cyanobacteria and other heterotrophic bacteria, I wonder if most of the cyanobacteria were not classified and were removed?

Thanks

mrm · October 9, 2014, 7:34pm

I actually forgot to mention that my research question is to investigate the microbial diversity (heterotrophic community) associated to a cyanobacteria (trichodesmium). realized that when I run the classify command I included “chloroplast” in it. When I look into the trainset9 dataset I see that cyanobacteria are considered as “chloroplasts”, since they are phytoplankton of course. I guess that means that all cyanobacteria are removed and only heterotrophic bacteria remain in my dataset?
Also, I see that “Trichodesmium” is not classified as such in the trainset9 dataset. When I look for Trichodesmium in the silva database online, I find different entries. Some of them I can find them in the trainset9 database but some of them I canÂ´t. For example, I find X70767, which in the silva dataset is classified as Trichodesmium sp. gene for 16S rRNA, but in the trainset9 database is classified as Root;Bacteria;Cyanobacteria/Chloroplast;Cyanobacteria;Family XII;GpXII.

If I wanted to remove only Trichodesmium reads but keep the rest of cyanobacteria, would I have to create my own trainset9 database, or remove only the trichodesmium reads from trainset9?, how could I do it?

Thank you!

pschloss · October 20, 2014, 9:22pm

Correct - you would have to manually alter the tax file and then use remove.lineage.

Pat

Topic		Replies	Views
remove.lineage -not removing lineages Commands in mothur	2	2201	November 20, 2015
remove.lineage removing all seaquences from FASTA file mothur bugs	2	1131	August 5, 2017
Remove.lineages error Commands in mothur	7	1067	November 6, 2018
Remove.lineages; Removing Cyanobacteria->chloroplast sequences Commands in mothur	3	2451	March 22, 2019
remove.lineage Commands in mothur	4	4041	August 1, 2014

remove.lineage removes half of my sequences

of unique seqs: 83768

of unique seqs: 34744

Related topics