Remove.lineage: accnos file missing

Hi,
I ran classify.seqs with the SILVA seed database and then remove.lineage.
I have two questions:

  1. Did I choose the correct names for the groups to be removed? The SOP mentions that the names may differ from those used with the RDP database.
  2. The listed output files include the accnos file, and mothur reported "Removed 113 sequences from your fasta file. Removed 180 sequences from your count file.", but I cannot find the accnos file among the saved files. Any idea why?

The logfile is below. Thanks!!
Susi
mothur > 
classify.seqs(fasta=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.seed_v132.pcr.align, taxonomy=silva.seed_v132.tax, cutoff=80, probs=F)

Using 4 processors.
Generating search database...    DONE.
It took 10 seconds generate search database.

Reading in the silva.seed_v132.tax taxonomy...	DONE.
Calculating template taxonomy tree...     DONE.
Calculating template probabilities...     DONE.
It took 26 seconds get probabilities.
Classifying sequences from site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta ...
[WARNING]: M01426_142_000000000-ADATB_1_1118_18228_21944 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1109_2115_18523 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M01426_142_000000000-ADATB_1_1104_10876_18856 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1102_19321_15093 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M01426_142_000000000-ADATB_1_1101_13427_10763 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_2110_25447_19404 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1112_8463_23365 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M01426_142_000000000-ADATB_1_1110_20705_10861 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1101_22053_14186 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1106_23790_11496 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
It took 5134 secs to classify 251806 sequences.

It took 5134 secs to classify 251806 sequences.

It took 12 secs to create the summary file for 251806 sequences.

Output File Names: 
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.taxonomy
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.tax.summary

mothur > remove.lineage(fasta=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.taxonomy, taxon=Chloroplast-Mitochondria-Unclassified-Archaea-Eukaryota)

[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.

/******************************************/
Running command: remove.seqs(accnos=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.accnos, count=site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, fasta=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta)

[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.

Removed 113 sequences from your fasta file.
Removed 180 sequences from your count file.

Output File Names: 
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta
site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table

/******************************************/

Output File Names:
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.pick.taxonomy
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.accnos
site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta

1. Did I choose the correct names for the groups to be removed? The SOP mentions that the names may differ from those used with the RDP database.

mothur intentionally does not flag specific taxonomic groups for removal. Instead, it lets you select the contaminants you want to remove based on the research questions you are asking. In general we remove Chloroplast-Mitochondria-unknown-Archaea-Eukaryota, but it’s important to note that these names may vary slightly depending on the reference you use to classify. For example, Greengenes prefixes each taxonomic assignment with a leading character and "__" to indicate the level:

 129142	k__Archaea;p__Euryarchaeota;c__Methanomicrobia;o__Methanosarcinales;f__Methanosarcinaceae;g__Methanosarcina;s__;

k - kingdom, p - phylum, c - class, o - order, f - family, g - genus, s - species

To remove the Archaea from a dataset classified using Greengenes, you would set taxon=k__Archaea. Since you used SILVA to classify your sequences, the contaminants list looks fine.
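
To see which level names a taxonomy file actually contains, a lineage line can be split into its levels. The following is a minimal Python sketch, not part of mothur: the helper names `parse_taxonomy_line` and `split_rank_prefix` are made up for illustration, and it assumes the sequence name and the lineage are separated by whitespace, as in the Greengenes line above. It does not reproduce how remove.lineage matches taxa internally.

```python
def parse_taxonomy_line(line):
    """Split one taxonomy line into (sequence_name, [levels])."""
    # Assumption: the name and the lineage are separated by whitespace (a tab).
    name, lineage = line.strip().split(None, 1)
    # Lineages end with ';', so drop the empty trailing field.
    levels = [level for level in lineage.split(";") if level]
    return name, levels


def split_rank_prefix(level):
    """Return (rank, taxon) for 'k__Archaea'; rank is None without a prefix."""
    rank, sep, taxon = level.partition("__")
    if sep and len(rank) == 1:
        return rank, taxon
    return None, level


# The Greengenes example line quoted above.
line = (" 129142\tk__Archaea;p__Euryarchaeota;c__Methanomicrobia;"
        "o__Methanosarcinales;f__Methanosarcinaceae;g__Methanosarcina;s__;")
name, levels = parse_taxonomy_line(line)
for level in levels:
    print(level, split_rank_prefix(level))
```

With a Greengenes-style file the kingdom level is stored as `k__Archaea`, which is why the taxon value above carries the prefix, whereas a level without a prefix is returned unchanged.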

2. The listed output files include the accnos file, and mothur reported "Removed 113 sequences from your fasta file. Removed 180 sequences from your count file.", but I cannot find the accnos file among the saved files. Any idea why?

Hmm… So you can find the *.taxonomy, *.fasta and *.count_table files but no accnos file? Could you post the contents of the folder containing the site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.pick.taxonomy file?

Hi Sarah
Thank you for your reply. Here is a screenshot of the folder. Below is the full content of the logfile, starting from the classify.seqs step. I’d like to have the accnos file because I am working with soil samples and would like to classify those sequences separately, to get a rough idea of what else was detected with the primers used, since chloroplast or mitochondrial sequences are sometimes more or less specific to certain high-level taxonomic groups.

Logfile:
Windows version

mothur v.1.42.3
Last updated: 6/24/19
by
Patrick D. Schloss

Department of Microbiology & Immunology

University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type ‘help()’ for information on the commands that are available

For questions and analysis support, please visit our forum at https://forum.mothur.org

Type ‘quit()’ to exit program

[NOTE]: Setting random seed to 19760620.

Interactive Mode

mothur >
classify.seqs(fasta=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.seed_v132.pcr.align, taxonomy=silva.seed_v132.tax, cutoff=80, probs=F)

Using 4 processors.
Generating search database… DONE.
It took 10 seconds generate search database.

Reading in the silva.seed_v132.tax taxonomy… DONE.
Calculating template taxonomy tree… DONE.
Calculating template probabilities… DONE.
It took 26 seconds get probabilities.
Classifying sequences from site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta …
[WARNING]: M01426_142_000000000-ADATB_1_1118_18228_21944 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1109_2115_18523 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M01426_142_000000000-ADATB_1_1104_10876_18856 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1102_19321_15093 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M01426_142_000000000-ADATB_1_1101_13427_10763 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_2110_25447_19404 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1112_8463_23365 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M01426_142_000000000-ADATB_1_1110_20705_10861 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1101_22053_14186 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
[WARNING]: M02133_32_000000000-AL5EB_1_1106_23790_11496 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.
It took 5134 secs to classify 251806 sequences.

It took 5134 secs to classify 251806 sequences.

It took 12 secs to create the summary file for 251806 sequences.

Output File Names:
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.taxonomy
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.tax.summary

mothur >
remove.lineage(fasta=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.taxonomy, taxon=Chloroplast-Mitochondria-Unclassified-Archaea-Eukaryota)

[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.

/******************************************/
Running command: remove.seqs(accnos=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.accnos, count=site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, fasta=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta)

[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.

Removed 113 sequences from your fasta file.
Removed 180 sequences from your count file.

Output File Names:
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta
site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table

/******************************************/

Output File Names:
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.pick.taxonomy
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.seed_v132.wang.accnos
site7.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.pick.count_table
site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta

mothur >
quit

Thank you for reporting this issue. I found the source of the error and have fixed it. The change will be part of our official 1.43.0 release coming next week. In the meantime you can create the accnos file with this workaround:

mothur > list.seqs(fasta=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta) - list the names of the “good” sequences

mothur > remove.seqs(accnos=current, fasta=site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta) - remove the “good” sequences from the fasta file that was the input to remove.lineage. This will leave you with only the “bad” reads.

mothur > list.seqs(fasta=current) - create an accnos file containing the names of the “bad” sequences
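
The same list can also be rebuilt outside mothur by comparing sequence names in the two fasta files. This is a minimal Python sketch under stated assumptions, not a mothur feature: the function names are hypothetical, the sequence name is taken to be the first whitespace-delimited token of each `>` header, and the accnos file is written as one name per line.

```python
def fasta_names(lines):
    """Return sequence names (first token of each '>' header) in file order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def removed_names(original_lines, picked_lines):
    """Names present in the original fasta but absent from the picked fasta."""
    kept = set(fasta_names(picked_lines))
    return [name for name in fasta_names(original_lines) if name not in kept]


def write_accnos(original_fasta, picked_fasta, accnos_path):
    """Write the removed names to accnos_path, one per line; return the count."""
    with open(original_fasta) as original, open(picked_fasta) as picked:
        names = removed_names(original, picked)
    with open(accnos_path, "w") as out:
        for name in names:
            out.write(name + "\n")
    return len(names)


# Example call (the output file name here is only an illustration):
# write_accnos(
#     "site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta",
#     "site7.trim.contigs.good.unique.good.filter.unique.precluster.pick.pick.fasta",
#     "removed.accnos",
# )
```

The sketch only reads the two fasta files and writes a new accnos file, so the remove.lineage outputs are left untouched.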

Thanks again for helping us find this bug,
Sarah

Hi Sarah
Thanks to you for your help!! I’m happy to help improve mothur!
I will try following your instructions, but I feel a bit confused…
After running remove.seqs as you indicated, I will end up with a fasta file containing only the bad sequences, which I then use with list.seqs to create the accnos file containing the names of those bad sequences.
Then, which fasta file should I use for remove.lineage? Will there also be an output fasta file with only the good sequences to go forward with after remove.lineage?
Susana

Sorry for the confusion. The remove.lineage command creates the fasta, count and taxonomy files containing the “good” sequences which you can use to continue your analysis. The command was accidentally deleting the accnos file containing the list of sequences that were removed by remove.lineage. The accnos file is not needed, but lists the sequences that were removed for your reference. The commands above allow you to recreate the accnos file using the output files from remove.lineage and classify.seqs.

Dear Sarah,

You said that “Since you used silva to classify your sequences, the contaminants list looks fine.” Does this mean that for both SILVA and RDP we can use taxon=Chloroplast-Mitochondria-Unclassified-Archaea-Eukaryota to remove the undesirables? Do these two reference databases use exactly the same lineage names?

Thanks.

You need to double check that Silva uses the same names for these groups as RDP does.

Pat

Thanks, Pat. I have just opened the SILVA reference database and checked it. I found that SILVA uses the same names (i.e., Chloroplast, Mitochondria, Archaea, Eukaryota).
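
A check like this can also be scripted instead of opening the file by hand. The following is a minimal Python sketch, assuming a taxonomy file with one sequence name and one semicolon-separated lineage per line. The function names are hypothetical, the match is exact and case-sensitive on whole level names, and it does not reproduce how remove.lineage matches taxa internally.

```python
import re


def level_names(lineage):
    """Yield taxon names from a lineage, dropping confidence scores like '(100)'."""
    for level in lineage.strip().split(";"):
        level = re.sub(r"\(\d+\)$", "", level.strip())
        if level:
            yield level


def taxa_present(lines, wanted):
    """Map each wanted taxon to True if it appears as a level name in any line."""
    found = {taxon: False for taxon in wanted}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        for level in level_names(parts[1]):
            if level in found:
                found[level] = True
    return found


# Example call against the reference taxonomy used in this thread:
# wanted = ["Chloroplast", "Mitochondria", "Archaea", "Eukaryota"]
# with open("silva.seed_v132.tax") as handle:
#     print(taxa_present(handle, wanted))
```

Any name reported as False is not a level name in that reference, so it is worth checking its spelling and capitalization before passing it to taxon=.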
