How to Lessen Unclassified Genera

Hello Mothur Community,

I am using the batch commands below with version 1.46.1, and I get a lot of unclassified genera. What could I change in my commands to get fewer unclassified genera, if that is possible?

set.logfile(name=Testing123LogFile)

fastq.info(fastq=Testing123.fastq)

get.current()

trim.seqs(fasta=current, oligos=Oligos.txt, maxambig=0, maxhomop=6, bdiffs=0, pdiffs=0, minlength=265, keepfirst=285, flip=F, processors=32)

summary.seqs(fasta=current, processors=32)

unique.seqs(fasta=current)

get.current()

summary.seqs(fasta=current, processors=32)

count.seqs(name=current, group=current)

get.current()

align.seqs(fasta=current, reference=silva.v4.fasta, processors=30)

summary.seqs(fasta=current, count=current, processors=32)

get.current()

screen.seqs(fasta=current, count=current, summary=current, optimize=start-end-minlength-maxlength, criteria=95, processors=32)

summary.seqs(fasta=current, count=current, processors=32)

get.current()

filter.seqs(fasta=current, vertical=T, trump=.)

get.current()

summary.seqs(fasta=current, count=current, processors=32)

unique.seqs(fasta=current, count=current)

summary.seqs(fasta=current, count=current, processors=32)

get.current()

pre.cluster(fasta=current, count=current, diffs=2)

summary.seqs(fasta=current, count=current, processors=32)

get.current()

chimera.vsearch(fasta=current, count=current, dereplicate=t)

remove.seqs(accnos=current, fasta=current)

get.current()

summary.seqs(fasta=current, count=current, processors=32)

classify.seqs(fasta=current, count=current, template=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80, processors=32)

get.current()

remove.lineage(fasta=current, count=current, taxonomy=current, taxon=Chloroplast-Mitochondria-unknown-Eukaryota)

summary.seqs(fasta=current, count=current, processors=32)

get.current()

summary.tax(taxonomy=current, count=current)

get.current()

rename.file(fasta=current, count=current, taxonomy=current, prefix=FinalCurationComplete)

split.abund(fasta=current, count=current, cutoff=1)

set.current(fasta=FinalCurationComplete.abund.fasta, count=FinalCurationComplete.abund.count_table)

get.current()

set.current(fasta=FinalCurationComplete.abund.fasta, count=FinalCurationComplete.abund.count_table)

get.current()

classify.seqs(fasta=current, count=current, reference=silva.nr_v138_1.align, taxonomy=silva.nr_v138_1.tax, cutoff=80)

Hi there,

This is a complicated question. There are a number of factors that affect classification. In no particular order…

  • Region of the 16S rRNA gene
  • Length of the sequence to be classified
  • Number of sequences in a genus from the database
  • Quality of the annotation of the sequences in the database
  • The genetic diversity for that genus

There are probably other factors as well. The punch line is that even if you had full-length sequences, you might not see much difference in the percentage of sequences that you can classify to a genus. Some genera are not well resolved by the 16S rRNA gene, some genera have very little representation in the database (e.g. 1 sequence), etc. Also, not every genus classifies equally well in the same region. Like I said, it’s complicated.

The one thing I would strongly advise against would be dropping the confidence score. This will give you more genus names, but your confidence in them will be less.
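To make that trade-off concrete, here is a minimal Python sketch (not part of mothur) that counts how many sequences keep a genus name at a given confidence cutoff. It assumes taxonomy lines of the form "name<TAB>rank(score);rank(score);...", that the genus is the sixth rank, and that every rank still carries its score, as I would expect from a run with a very low cutoff. The function names and the four example lines are made up for illustration.

```python
import re

# Splits "Bacteroides(97)" into a label and an optional confidence score.
RANK_RE = re.compile(r"^(?P<label>.*?)(?:\((?P<conf>\d+(?:\.\d+)?)\))?$")

# Assumption: ranks are kingdom, phylum, class, order, family, genus.
GENUS_INDEX = 5


def genus_call(taxonomy, cutoff):
    """Return the genus label if its score is >= cutoff, otherwise None."""
    ranks = [r for r in taxonomy.strip().split(";") if r]
    if len(ranks) <= GENUS_INDEX:
        return None
    match = RANK_RE.match(ranks[GENUS_INDEX])
    label = match.group("label")
    # A rank without a score is treated as score 0 (an assumption of this sketch).
    conf = float(match.group("conf")) if match.group("conf") else 0.0
    if "unclassified" in label.lower() or conf < cutoff:
        return None
    return label


def fraction_with_genus(lines, cutoff):
    """Fraction of sequences that keep a genus name at the given cutoff."""
    total = kept = 0
    for line in lines:
        if not line.strip():
            continue
        _name, taxonomy = line.rstrip("\n").split("\t", 1)
        total += 1
        if genus_call(taxonomy, cutoff) is not None:
            kept += 1
    return kept / total if total else 0.0


# Made-up example lines, only to show the input format.
example_lines = [
    "seq1\tBacteria(100);Firmicutes(100);Clostridia(100);Lachnospirales(100);Lachnospiraceae(100);Blautia(92);",
    "seq2\tBacteria(100);Firmicutes(100);Clostridia(100);Lachnospirales(99);Lachnospiraceae(98);Roseburia(61);",
    "seq3\tBacteria(100);Firmicutes(100);Clostridia(100);Lachnospirales(97);Lachnospiraceae(95);Lachnospiraceae_unclassified(95);",
    "seq4\tBacteria(100);Bacteroidota(100);Bacteroidia(100);Bacteroidales(100);Bacteroidaceae(100);Bacteroides(85);",
]

print(fraction_with_genus(example_lines, 80))  # 0.5
print(fraction_with_genus(example_lines, 50))  # 0.75
```

Lowering the cutoff from 80 to 50 raises the fraction only because seq2 now gets a genus name with a score of 61, which is exactly the kind of name you should trust less.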

Pat

Hello!

I so understand. For example, I am working in the chicken caecum and oh boy, classification for Lachnospiraceae and Ruminococcaceae is so frustrating!

For me, the only way to see which database to use, or which region, or whatever else you want to adjust, is to use a positive control with known DNA in it. And even then, I find these controls quite limiting. Anyway, always use controls! Both negative (no DNA, to make sure you or the sequencing center did not screw up) and positive (a known community).
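As a rough sketch of how I would check a positive control, the Python below compares the genera you expect in the mock community with the genera you actually got back. The function name and both genus lists are made up for illustration; you would fill them in from your control's composition and from your own taxonomy summary.

```python
def compare_to_mock(expected, observed):
    """Sort genus names into recovered, missed, and unexpected groups."""
    expected_set = set(expected)
    observed_set = set(observed)
    return {
        "recovered": sorted(expected_set & observed_set),
        "missed": sorted(expected_set - observed_set),
        "unexpected": sorted(observed_set - expected_set),
    }


# Illustrative lists only, not real results.
expected_genera = ["Bacillus", "Escherichia", "Lactobacillus", "Staphylococcus"]
observed_genera = [
    "Bacillus",
    "Escherichia-Shigella",
    "Staphylococcus",
    "Lachnospiraceae_unclassified",
]

report = compare_to_mock(expected_genera, observed_genera)
for status, genera in report.items():
    print(status, genera)
```

Note that an exact name match is strict: a database label such as "Escherichia-Shigella" lands in "unexpected" while "Escherichia" lands in "missed", so check the naming of your database before blaming the classification.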

Best of success!