Unclassified Cyanobacteria

Hi, I am Alan and I am new here, so sorry if I am asking stupid questions.

I have used Mothur to analyse a 454 sequencing metagenome.

Everything went smoothly until the final step. When I looked at the an.unique_list.0.03.cons file, all the bacteria were classified well, except that all the cyanobacteria are listed as unclassified below the phylum level.
I used silva bacteria as the alignment reference and RDP as the classification template.

Below is my analysis pipeline, in case it is needed.
mothur > sffinfo(sff=pustular.sff, fasta=T)
mothur > make.group(fasta=pustular.fasta, groups=pustular)
mothur > screen.seqs(fasta=pustular.fasta, group=pustular.groups, maxambig=0, optimize=start-end, criteria=90, minlength=200, processors=4)
mothur > unique.seqs(fasta=pustular.good.fasta)
mothur > count.seqs(name=pustular.good.names, group=pustular.good.groups)
mothur > align.seqs(fasta=pustular.good.unique.fasta, reference=silva.bacteria.fasta)
mothur > screen.seqs(fasta=pustular.good.unique.align, count=pustular.good.count_table, summary=pustular.good.unique.summary, start=1044, end=8362)
mothur > filter.seqs(fasta=pustular.good.unique.good.align, vertical=T, trump=.)
mothur > unique.seqs(fasta=pustular.good.unique.good.filter.fasta, count=pustular.good.good.count_table)
mothur > pre.cluster(fasta=pustular.good.unique.good.filter.unique.fasta, count=pustular.good.unique.good.filter.count_table, diffs=3)
mothur > chimera.uchime(fasta=pustular.good.unique.good.filter.unique.precluster.fasta, count=pustular.good.unique.good.filter.unique.precluster.count_table, dereplicate=t)
mothur > remove.seqs(fasta=pustular.good.unique.good.filter.unique.precluster.fasta, accnos=pustular.good.unique.good.filter.unique.precluster.uchime.accnos)
mothur > classify.seqs(fasta=pustular.good.unique.good.filter.unique.precluster.pick.fasta, count=pustular.good.unique.good.filter.unique.precluster.uchime.pick.count_table, reference=trainset9_032012.rdp.fasta, taxonomy=trainset9_032012.rdp.tax)
mothur > dist.seqs(fasta=pustular.good.unique.good.filter.unique.precluster.pick.fasta, cutoff=0.30)
mothur > cluster(column=pustular.good.unique.good.filter.unique.precluster.pick.dist, count=pustular.good.unique.good.filter.unique.precluster.uchime.pick.count_table, cutoff=0.07)
mothur > make.shared(list=pustular.good.unique.good.filter.unique.precluster.pick.an.unique_list.list, count=pustular.good.unique.good.filter.unique.precluster.uchime.pick.count_table, label=0.03)
mothur > classify.otu(list=pustular.good.unique.good.filter.unique.precluster.pick.an.unique_list.list, count=pustular.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=pustular.good.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy, label=0.03)

Thanks in advance for helping!

Sorry, I’m not sure why they aren’t classifying well for you. A couple possibilities:

  1. The region you sequenced does a poor job of resolving the cyanobacteria due to a lack of signal or just being too short
  2. The database poorly represents the Cyanobacteria. The RDP training set you are using has 175 cyanobacterial sequences and 96 of those are chloroplasts. You might try using the greengenes taxonomy as it may have a better representation of the Cyanobacteria (http://www.mothur.org/wiki/Greengenes-formatted_databases).
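
A quick way to check how a taxonomy file represents a group is to count the matching lineages. The sketch below is only an illustration in Python: it assumes the usual mothur taxonomy layout (sequence name, a tab, then a semicolon-separated lineage), the helper `count_cyano` is hypothetical, and the three sample entries are made up rather than taken from trainset9_032012.rdp.tax.

```python
from collections import Counter

def count_cyano(tax_lines):
    """Count lineages that mention Cyanobacteria, and how many of those are chloroplasts."""
    counts = Counter()
    for line in tax_lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: <sequence name> TAB <rank;rank;rank;>
        _name, lineage = line.split("\t", 1)
        ranks = [r for r in lineage.split(";") if r]
        if any("Cyanobacteria" in r for r in ranks):
            counts["cyanobacteria"] += 1
            if "Chloroplast" in ranks:
                counts["chloroplast"] += 1
    return counts

# Made-up entries for illustration only
sample = [
    "seqA\tBacteria;Cyanobacteria_Chloroplast;Cyanobacteria;Family_I;GpI;",
    "seqB\tBacteria;Cyanobacteria_Chloroplast;Chloroplast;Chloroplast;Streptophyta;",
    "seqC\tBacteria;Proteobacteria;Gammaproteobacteria;",
]
print(count_cyano(sample))
```

To run it on a real file, pass an open file handle instead of the list, for example `count_cyano(open("trainset9_032012.rdp.tax"))`.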

Hope this helps some…
Pat

Hi Pat,

I have tried the greengenes, RDP and SilvaBacteria (degapped) databases.

But the cyanobacteria are still not fully classified, as shown below:

Bacteria(100);Cyanobacteria_Chloroplast(40);Cyanobacteria(37);Family_X(22);GpX(22);unclassified;
Bacteria(100);Cyanobacteria_Chloroplast(99);Cyanobacteria(99);Family_IV(54);GpIV(54);unclassified;

So it still doesn’t work. I reckon there may be a problem with the classification database, because when I ran the remove.lineage command

mothur > remove.lineage(fasta=pustular.good.unique.good.filter.unique.precluster.pick.fasta, count=pustular.good.unique.good.filter.unique.precluster.uchime.pick.count_table, taxonomy=pustular.good.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy, taxon=Chloroplast-Mitochondria-unknown-Archaea-Eukaryota),

all cyanobacteria were deleted. Other bacteria worked well, so maybe the problem is just in how the cyanobacteria are classified?
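
One possible explanation, stated here as an assumption and not as confirmed mothur behaviour: if the taxon names given to remove.lineage are matched as substrings of the lineage, then `Chloroplast` would also match the label `Cyanobacteria_Chloroplast`, and every cyanobacterial sequence would be removed along with the chloroplasts. The Python sketch below only illustrates that idea; `matches_substring` and `matches_exact_rank` are hypothetical helpers, not mothur functions.

```python
import re

def strip_confidence(lineage):
    """Split a lineage into rank names, dropping scores such as '(99)'."""
    return [re.sub(r"\(\d+\)$", "", r) for r in lineage.split(";") if r]

def matches_substring(lineage, taxon):
    # Assumed behaviour: the taxon only has to occur somewhere in the lineage text.
    return taxon in ";".join(strip_confidence(lineage))

def matches_exact_rank(lineage, taxon):
    # Stricter alternative: the taxon has to equal a whole rank name.
    return taxon in strip_confidence(lineage)

lineage = "Bacteria(100);Cyanobacteria_Chloroplast(99);Cyanobacteria(99);Family_IV(54);GpIV(54);unclassified;"
print(matches_substring(lineage, "Chloroplast"))   # True
print(matches_exact_rank(lineage, "Chloroplast"))  # False
```

If this is the cause, the classification itself may be fine and only the taxon filter is broader than intended.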

Sorry if my question is unclear.

Cheers,

Alan

I just tried modifying the align.seqs command.

This is what I put in:

align.seqs(fasta=smooth.good.unique.fasta, reference=silva.bacteria.fasta, flip=t)

I added flip=T so that Mothur also tries the reverse complement of any sequence that falls below the threshold.
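
For anyone unfamiliar with the term, the reverse complement is the read as it would appear on the opposite strand. A minimal Python sketch follows; it is not mothur's code, the helper name `reverse_complement` is made up for illustration, and it assumes only A, C, G, T and N appear in the read.

```python
# Complement table for unambiguous DNA bases plus N (upper and lower case)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement("ATTGAACGCTGG"))  # CCAGCGTTCAAT
```

Applying the function twice returns the original read, which is a handy sanity check.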

After that I used GreenGenes database for classifying and I got most of the cyanobacteria classified! :smiley:

However, when I used the RDP database for classifying, the same thing happened, even when I set the alignment threshold to 99%. Below is the alignment command:

align.seqs(fasta=pustular.good.unique.fasta, reference=silva.bacteria.fasta, flip=T, threshold=0.99)

The cyanobacteria are still all unclassified using RDP. So I would like to ask: is this a problem with the RDP database?

Can you post a sequence that you think should be classified as a cyanobacterium and why you think it should be a cyanobacterium?

Here is the sequence:

HPIFY0O04I15GG Otu0007|119|pustular
ATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGAAACGATCCTAGCTTGCTAGGAGGCGTCGAGCGGCGGACGGGTGAGTAACGCGTAGAAATCTGCCCGGTAGTTGGGGATAGCCCGGAGAAATCCGGATTAATACCGAATAATCTCTACGGAGGAAAGGGGGCTTTGGCTCTCGCTACTGGATGAGTCTGCGTCGGATTAGCTTGTTGGTGAGGTAATGGCTTACCAAGGCGACGATCCGTA

With align.seqs(reference=silva, flip=T), using GreenGenes in classify.seqs
Otu0007 115 Root(100);k__Bacteria(100);p__Cyanobacteria(100);c__Oscillatoriophycideae(98);o__Chroococcales(98);f__Xenococcaceae(98);g__Gloeocapsopsis(98);s__crepidinum(98);

With align.seqs(reference=silva, flip=T, threshold=0.99), using RDP in classify.seqs
Otu0007 115 Bacteria(100);Cyanobacteria_Chloroplast(96);Cyanobacteria(96);Family_I(92);GpI(92);unclassified(92);


However, when I BLAST the sequence, I get either Saccharospirillum sp. HCh1 or uncultured bacterium.

Which makes it really weird

Sorry, but when I run that sequence against the RDP website and the trainset9 via classify.seqs in mothur I get this:

HPIFY0O04I15GG Bacteria(100);“Proteobacteria”(100);Gammaproteobacteria(100);Oceanospirillales(100);“Saccharospirillaceae”(95);Saccharospirillum(95);


Did you maybe post the wrong sequence?

Pat

Hi Pat,

Did you use the trainset9_032012 fasta and tax files for classify.seqs?

Yep.

But the problem is that when I use the trainset9_032012 fasta and tax files as the reference and taxonomy for classify.seqs, I get unclassified cyanobacteria after the classify.otu command.

Do you know what the reason could be?

Can you try again but use cutoff=80 in classify.seqs?
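
As a rough picture of what a confidence cutoff does to a lineage, the Python sketch below keeps ranks until the first one whose score falls under the cutoff and labels the rest as unclassified. It is an illustration only, not mothur's implementation: `apply_cutoff` is a hypothetical helper, the example lineage and its scores are made up, and mothur's real output format may differ.

```python
import re

# Matches "Name(score)" fields such as "Cyanobacteria(96)"
_RANK = re.compile(r"^(?P<name>.+)\((?P<conf>\d+)\)$")

def apply_cutoff(taxonomy, cutoff):
    """Truncate a lineage at the first rank scoring below the cutoff."""
    fields = [f for f in taxonomy.strip().split(";") if f]
    kept = []
    for field in fields:
        m = _RANK.match(field)
        # Stop at fields without a score or already marked unclassified
        if m is None or m.group("name") == "unclassified":
            break
        if int(m.group("conf")) < cutoff:
            break
        kept.append(field)
    if len(kept) < len(fields):
        kept.append("unclassified")
    return ";".join(kept) + ";"

# Hypothetical lineage in the same format as the outputs in this thread
example = "Bacteria(100);Cyanobacteria_Chloroplast(96);Cyanobacteria(96);Family_I(72);GpI(72);"
print(apply_cutoff(example, 80))
print(apply_cutoff(example, 60))
```

With the made-up scores above, a cutoff of 80 drops the family and genus, while a cutoff of 60 keeps the whole lineage.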