mothur

Remove.lineages error


#1

Upon reaching the stage in the mothur pipeline where I need to classify my sequences and remove undesirable lineages, mothur seems to remove all my sequences! I am not entirely sure what I am doing wrong, but I suspect it has something to do with my reference and taxonomy files.

Can anybody help me here?

This is my workflow:

classify.seqs(fasta=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.bacteria.fasta, taxonomy=silva.bacteria.silva.tax, cutoff=80)

It took 72 secs to classify 25927 sequences.
It took 3 secs to create the summary file for 25927 sequences.

Output File Names:
pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.taxonomy
pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.silva.wang.tax.summary

remove.lineage(fasta=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, taxonomy=pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota)

[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.

Your taxonomy file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.
Your fasta file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.

Removing group: 5_PAW2 because all sequences have been removed.
(and the same message for every other group)

Your group file contains only sequences from Chloroplast-Mitochondria-unknown-Archaea-Eukaryota.
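To see why every group disappears, here is a minimal Python sketch of the filtering idea. It is not mothur's code: it assumes a sequence is dropped when any level of its taxonomy exactly matches one of the dash-separated taxa, and it models the count table as a hypothetical `name -> {group: abundance}` dictionary. All function names are made up for illustration.

```python
# Hypothetical sketch of the idea behind remove.lineage; NOT mothur's implementation.
# Assumption: a sequence is dropped if any level of its taxonomy exactly matches
# one of the dash-separated taxa given in the taxon argument.


def parse_taxon_list(taxon_arg):
    """Split 'Chloroplast-Mitochondria-unknown-Archaea-Eukaryota' into a list."""
    return [t for t in taxon_arg.split("-") if t]


def strip_confidence(level):
    """'Bacteria(100)' -> 'Bacteria'."""
    return level.split("(", 1)[0]


def is_removed(taxonomy, taxa):
    """True if any level of the ';'-separated taxonomy string is in taxa."""
    levels = [strip_confidence(x) for x in taxonomy.strip().strip(";").split(";") if x]
    return any(level in taxa for level in levels)


def remove_lineages(taxonomy_map, count_table, taxa):
    """taxonomy_map: sequence name -> taxonomy string.
    count_table: sequence name -> {group name: abundance}.
    Returns (names kept, sorted groups whose total abundance is zero after filtering)."""
    kept = {name for name, tax in taxonomy_map.items() if not is_removed(tax, taxa)}
    totals = {g: 0 for counts in count_table.values() for g in counts}
    for name in kept:
        for group, abundance in count_table.get(name, {}).items():
            totals[group] += abundance
    emptied = sorted(g for g, total in totals.items() if total == 0)
    return kept, emptied
```

If every taxonomy entry reads `unknown(100);...`, `kept` comes back empty and every group is reported as emptied, which is the same pattern as in the log above.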


#2

In classify.seqs it looks like you are using the SILVA reference but in remove.lineage it looks like you’re using my RDP reference. Can you try running classify.seqs with the trainset files rather than the SILVA files and see what happens? Also, just to be clear, you did sequence 16S rRNA genes, right?

Pat
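The reference mismatch described here can be checked mechanically. This is a minimal sketch. It assumes that classify.seqs puts the tag found just before `.tax` in the reference taxonomy name into its output name. That assumption is inferred from post #1, where `silva.bacteria.silva.tax` produced `...pick.silva.wang.taxonomy`. The helper names are hypothetical.

```python
import os

# Hypothetical helpers. Assumption (inferred from the file names in post #1):
# the tag just before ".tax" in the reference taxonomy name shows up in the
# classify.seqs output name, e.g. silva.bacteria.silva.tax -> "...pick.silva.wang.taxonomy".


def reference_tag(reference_taxonomy):
    """'silva.bacteria.silva.tax' -> 'silva'; 'trainset9_032012.pds.tax' -> 'pds'."""
    name = os.path.basename(reference_taxonomy)
    if name.endswith(".tax"):
        name = name[: -len(".tax")]
    return name.rsplit(".", 1)[-1]


def classified_tag(wang_taxonomy):
    """'...precluster.pick.pds.wang.taxonomy' -> 'pds'."""
    name = os.path.basename(wang_taxonomy)
    suffix = ".wang.taxonomy"
    if not name.endswith(suffix):
        raise ValueError(f"expected a *{suffix} file, got {name}")
    return name[: -len(suffix)].rsplit(".", 1)[-1]


def same_reference(reference_taxonomy, wang_taxonomy):
    """True if the taxonomy passed to remove.lineage carries the reference's tag."""
    return reference_tag(reference_taxonomy) == classified_tag(wang_taxonomy)
```

For example, `same_reference("silva.bacteria.silva.tax", "pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy")` returns `False`. That flags a `pds` taxonomy being used after a SILVA classification.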


#3

Hi Pat,

Thank you for your response!

I’ve just re-run classify.seqs and remove.lineage as above, but with the trainset9_032012.pds.fasta and .tax files. Sadly, the same error occurred.

Yes, it is for 16S rRNA genes.

Luke


#4

What are you sequencing? Also, can you merge your two threads so we don’t have to ping-pong back and forth? The issues are likely the same.


#5

I’m sequencing eDNA samples of cultured fish microbiota on the MiSeq platform. I should clarify that I have two datasets here: one from the ponds the fish live in, and another from water samples taken from a bucket in which fish were held (before the fish were added and afterwards). The sequences in the “remove.lineages error” thread are from the latter (fish) dataset. The “warning notice from classify.otu” thread refers to the pond water samples.

With the pond water samples, apart from the warning notices mentioned in that thread, I was able to get through the entire pipeline with virtually no issues. As you can imagine, I am a little perplexed as to why the same procedure is not working with the fish samples.

Thank you for your help here. I am fairly new to mothur, and with my project supervisor currently out of action, any help is appreciated.

Luke


#6

Can you post the first few lines of pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy?


#7

M01625_131_000000000-C4WTY_1_1101_27864_15559	unknown(100);unknown_unclassified(100);unknown_unclassified(100);unknown_unclassified(100);unknown_unclassified(100);unknown_unclassified(100);

At this step I had 25,927 sequences, and all of them have been classified as unknown.
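A quick way to confirm a result like this is to tally the first-level taxon across the taxonomy file. This is a minimal Python sketch. It assumes each line holds a sequence name and a taxonomy string separated by whitespace. The function name is hypothetical. The sketch counts unique sequences only and ignores the abundances in the count table.

```python
from collections import Counter


def top_level_counts(lines):
    """Tally the first (kingdom-level) taxon in lines of a .taxonomy file,
    assumed to look like '<name><whitespace><taxonomy string>'.
    Each line counts once (unique sequences, not weighted by abundance)."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        first_level = parts[1].split(";", 1)[0]  # e.g. 'unknown(100)'
        counts[first_level.split("(", 1)[0]] += 1  # drop the confidence value
    return counts
```

Usage might look like `with open(path) as fh: print(top_level_counts(fh))`. Here `path` points at the `.wang.taxonomy` file. If every line looks like the one posted above, the tally would contain a single `unknown` entry.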

The input fasta file for the classify.seqs step (pvttest.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta) had sequences in the following format:

>M01625_131_000000000-C4WTY_1_1111_18207_5125
TAC--GG-AG-GGT---GCG-A-G-C-G-T--T--AT-C-CGG-AA---TC-A-C-T--GG-GT--TT--A--AA-GG-GT-AC-G-TA-G-G-C-G--G--T-TA-A-T-T-----AA etc. etc.

Could mothur be misclassifying my sequences because they are formatted differently in the input fasta file and the reference file?
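In an aligned FASTA file, the `-` and `.` characters are gap placeholders. Removing them recovers the raw nucleotide sequence. This is a minimal sketch. It assumes those two characters are the only gap symbols present. `degap` is a hypothetical helper, not a mothur command.

```python
# Assumption: '-' and '.' are the only gap characters in the aligned sequence.
GAP_CHARS = "-."
_DEGAP_TABLE = str.maketrans("", "", GAP_CHARS)  # maps each gap char to deletion


def degap(aligned_sequence):
    """Return the sequence with all alignment gap characters removed."""
    return aligned_sequence.translate(_DEGAP_TABLE)
```

For example, `degap("TAC--GG-AG-GGT")` returns `"TACGGAGGGT"`.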


#8

I’m not sure what you mean by different formats. The gap characters are fine; they’re automatically removed from the input sequences before anything else is done. When I classify the sequence you posted against silva.bacteria.fasta (I named the sequence “test”), I get:

test Bacteria(100);Bacteroidetes(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);Bacteroidetes_unclassified(86);
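The numbers in parentheses are the confidence (bootstrap) values for each level. The original classify.seqs call used `cutoff=80`. The sketch below only reads an existing taxonomy string. It reports the deepest named level whose value meets a chosen threshold. It does not reproduce how classify.seqs applies its cutoff internally. The function names are hypothetical.

```python
import re

# One taxonomy level such as "Bacteroidetes(86)": a name, then a confidence in parentheses.
LEVEL = re.compile(r"^(?P<taxon>.+)\((?P<conf>\d+)\)$")


def parse_levels(taxonomy):
    """Split 'Bacteria(100);Bacteroidetes(86);' into
    [('Bacteria', 100), ('Bacteroidetes', 86)].
    Levels without a '(number)' suffix get a confidence of None."""
    levels = []
    for field in taxonomy.strip().strip(";").split(";"):
        match = LEVEL.match(field)
        if match is None:
            levels.append((field, None))
        else:
            levels.append((match.group("taxon"), int(match.group("conf"))))
    return levels


def deepest_confident(taxonomy, cutoff=80):
    """Walk from the root and return the last level whose confidence is at or
    above `cutoff` and that is not an '*_unclassified' placeholder (None if none)."""
    deepest = None
    for taxon, conf in parse_levels(taxonomy):
        if conf is None or conf < cutoff or taxon.endswith("_unclassified"):
            break
        deepest = taxon
    return deepest
```

For the result shown above, `deepest_confident` returns `"Bacteroidetes"` with the default threshold of 80 and `"Bacteria"` with `cutoff=90`.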

Can you email your input fasta file and the mothur logfile to mothur.bugs@gmail.com so we can take a look? Please include a link to this thread so we can report back.