Remove.lineages error

luker · October 25, 2018, 2:31pm

Upon reaching the stage in the mothur pipeline where I need to classify my sequences and remove undesirable lineages, mothur seems to remove all my sequences! I am not entirely sure what I am doing wrong, but I suspect it has something to do with my reference and taxonomy files.

Can anybody help me here?

This is my workflow:

classify.seqs(fasta=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.bacteria.fasta, taxonomy=silva.bacteria.silva.tax, cutoff=80)

It took 72 secs to classify 25927 sequences.
It took 3 secs to create the summary file for 25927 sequences.

Output File Names:
pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.taxonomy
pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.tax.summary

remove.lineage(fasta=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

[NOTE]: The count file should contain only unique names, so mothur assumes your
fasta, list and taxonomy files also contain only uniques.

Your taxonomy file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.
Your fasta file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.

Removing group: 5_PAW2 because all sequences have been removed.
etc. etc. for each group

Your group file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.

pschloss · October 25, 2018, 5:39pm

In classify.seqs it looks like you are using the SILVA reference but in remove.lineage it looks like you’re using my RDP reference. Can you try running classify.seqs with the trainset files rather than the SILVA files and see what happens? Also, just to be clear, you did sequence 16S rRNA genes, right?
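The mismatch described here can be checked mechanically before running remove.lineage. Below is a minimal sketch in Python, assuming commands are written as `name(key=value, ...)` with no commas inside the values. The helpers `parse_params` and `taxonomy_is_from_classify` and the shortened file names are hypothetical illustrations, not part of mothur.

```python
import re

def parse_params(command: str) -> dict:
    """Split a mothur-style command 'name(k=v, k=v)' into a dict of parameters."""
    match = re.fullmatch(r"\s*([\w.]+)\((.*)\)\s*", command, flags=re.S)
    if match is None:
        raise ValueError(f"not a mothur-style command: {command!r}")
    params = {}
    for item in match.group(2).split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            params[key.strip()] = value.strip()
    return params

def taxonomy_is_from_classify(remove_cmd: str, classify_outputs: list) -> bool:
    """True if the taxonomy= file given to remove.lineage is one classify.seqs wrote."""
    return parse_params(remove_cmd).get("taxonomy") in classify_outputs

# Shortened, hypothetical file names standing in for the ones in this thread.
outputs = ["seqs.silva.wang.taxonomy", "seqs.silva.wang.tax.summary"]
cmd = ("remove.lineage(fasta=seqs.fasta, count=seqs.count_table, "
       "taxonomy=seqs.pds.wang.taxonomy, "
       "taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)")
print(taxonomy_is_from_classify(cmd, outputs))  # False
```

A `False` result means remove.lineage is being pointed at a taxonomy file from a different classify.seqs run than the one just completed.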

Pat

luker · October 26, 2018, 12:35pm

Hi Pat,

Thank you for your response!

I’ve just re-run classify.seqs and remove.lineage as above, but with the trainset9_032012.pds.fasta and .tax files. Sadly, the same error occurred.

Yes, it is for 16S rRNA genes.

Luke

pschloss · October 26, 2018, 7:09pm

What are you sequencing? Also, can you merge your two threads so we don’t have to ping pong back and forth? The issues are likely the same.

luker · October 28, 2018, 9:59pm

I’m sequencing eDNA samples of the cultured fish microbiota, run through the MiSeq platform. I should clarify: I have two datasets here, one from the ponds the fish live in, and another from water samples taken from a bucket in which fish were held (before adding the fish and then afterwards). The sequences on the “remove.lineages error” thread are from the latter fish dataset. The “warning notice from classify.otu” thread refers to the pond water samples.

With the pond water samples, apart from the aforementioned warning notices, I was able to get through the entire pipeline with virtually no issues. You can imagine that I am a little perplexed as to why the same procedure is not working with the fish samples.

Thank you for your help here, I am fairly new to using mothur, and with my project supervisor currently out of action, any help is appreciated.

Luke

pschloss · October 29, 2018, 5:50pm

Can you post the first few lines of pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy?

luker · October 30, 2018, 3:42pm

M01625_131_000000000-C4WTY_1_1101_27864_15559 unknown(100);unknown_unclassified(100);unknown_unclassified(100);unknown_unclassified(100);unknown_unclassified(100);unknown_unclassified(100);

At this step I had 25,927 sequences, and all of them were classified as unknown.
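One quick way to confirm what the classifier assigned is to tally the first taxonomic level across the whole taxonomy file. Below is a minimal sketch in Python, assuming the usual two-column layout (sequence name, then a semicolon-separated taxonomy string with confidence scores in parentheses). The helper `top_level_counts` and the file name in the comment are hypothetical, not part of mothur.

```python
import re
from collections import Counter

def top_level_counts(lines):
    """Count the first-level taxon on each line of a mothur taxonomy file."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(None, 1)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        first = fields[1].split(";")[0]
        # Drop the trailing confidence score, e.g. "unknown(100)" -> "unknown".
        counts[re.sub(r"\(\d+\)$", "", first)] += 1
    return counts

# To run on a real file (the file name here is a placeholder):
# with open("example.wang.taxonomy") as handle:
#     print(top_level_counts(handle))
```

If the only key that comes back is `unknown`, then a remove.lineage call whose taxon list includes `unknown` will remove every sequence, which matches the output shown earlier in this thread.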

The input fasta file for the classify.seqs step (pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta) had sequences in the following format:

M01625_131_000000000-C4WTY_1_1111_18207_5125
TAC--GG-AG-GGT---GCG-A-G-C-G-T--T--AT-C-CGG-AA---TC-A-C-T--GG-GT--TT--A--AA-GG-GT-AC-G-TA-G-G-C-G--G--T-TA-A-T-T-----AA etc. etc.

Could mothur be misclassifying the sequences because the sequence format differs between the input fasta file and the reference file?

pschloss · November 6, 2018, 5:41pm

I’m not sure what you mean by different formats. The gap characters are fine - they’re automatically removed from the input sequences before doing anything. When I run the sequence you posted with silva.bacteria.fasta, I get (I named the sequence “test”)…

test Bacteria(100);Bacteroidetes(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);

Can you email your input fasta file and the mothur logfile to mothur.bugs@gmail.com so we can take a look? Please include a link to this thread so we can report back.
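The gap handling mentioned above can be pictured as follows. This is a minimal sketch in Python, assuming `-` and `.` are the gap characters in the aligned fasta file. The helper `strip_gaps` is hypothetical and only illustrates the idea; it is not mothur's actual implementation.

```python
GAP_CHARS = "-."  # assumed alignment gap characters

def strip_gaps(aligned: str) -> str:
    """Return the sequence with alignment gap characters removed."""
    return aligned.translate(str.maketrans("", "", GAP_CHARS))

print(strip_gaps("TAC--GG-AG-GGT"))  # TACGGAGGGT
```

Because the gaps are dropped before classification, an aligned input and an unaligned input with the same bases should be treated identically.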
