Shortest way to classify sequences

allan_santos · March 31, 2020, 4:09am

Hey there,
I’ve completed all the steps in the SOP, but I’d like to take a closer look at the Cyanobacteria phylum. Is there a shorter way to classify my sequences with another database such as GG without repeating the steps I’ve already done?

thanks a lot

amwalkero0o · March 31, 2020, 5:29pm

What reference database did you use for the classify.seqs step in the SOP?

allan_santos · March 31, 2020, 5:35pm

At first I used the RDP reference, and cluster.split, run in two steps, worked normally.
Now I need a classification against the GG database, but after that the cluster.split command no longer works.

amwalkero0o · March 31, 2020, 6:06pm

You should use the same reference database files for all steps, i.e. alignment, classifying, and cluster.split. The cluster.split command is probably failing because you didn’t use the GG reference database files during your alignment and classify.seqs steps: cluster.split uses the taxonomy to split sequences into bins and then clusters them. You should have had cyanobacteria identified when you used RDP to classify. Given that, why do you think that cyanobacteria are not classified in your dataset?
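One way to check whether cyanobacteria really are absent is to tally the classify.seqs taxonomy output by phylum. Below is a minimal Python sketch, not anything from mothur itself. It assumes the taxonomy file has two tab-separated columns (sequence name, then a semicolon-separated lineage with optional confidence scores in parentheses); the function name `count_taxa` and the sample lines are hypothetical.

```python
import re
from collections import Counter


def count_taxa(lines, level=2):
    """Tally sequences by the taxon at `level` (1 = domain, 2 = phylum)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: name <TAB> lineage
        name, lineage = line.split("\t", 1)
        # Drop confidence scores such as "(100)" and the trailing semicolon.
        taxa = [re.sub(r"\(\d+\)$", "", t) for t in lineage.strip(";").split(";")]
        taxon = taxa[level - 1] if len(taxa) >= level else "unclassified"
        counts[taxon] += 1
    return counts


# Hypothetical sample lines; for real data pass an open file handle instead.
sample = [
    "seq1\tBacteria(100);Cyanobacteria(98);Cyanobacteriia(95);",
    "seq2\tBacteria(100);Proteobacteria(100);Gammaproteobacteria(99);",
    "seq3\tBacteria(100);Cyanobacteria(91);",
    "seq4\tBacteria(100);",
]
phyla = count_taxa(sample)
print(phyla.most_common())
```

Note that this counts unique sequence names, not reads; the abundances live in the count_table. The phylum label may also differ between reference taxonomies, so it is worth scanning the whole tally rather than searching for one exact name.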

My understanding is that GG is not well curated and probably shouldn’t be used. Personally, I like to use the Silva reference files throughout the mothur SOP for alignment and classification with my marine sediment samples. Whether you switch to Silva or still want to use GG, you’d need to go back to the alignment step, re-run it, and repeat the steps after it. Here is how a few of those commands would look with the Silva reference files.

align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.nr_v138.pcr.align, flip=T, processors=8)

classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.nr_v138.pcr.align, taxonomy=silva.nr_v138.tax, cutoff=80, processors=8)

allan_santos · March 31, 2020, 7:14pm

Previously, I used Silva for the alignment and RDP to classify, then ran cluster.split in 2 separate steps. It worked very well and I got all the files.
However, the results didn’t show any match for the Cyanobacteria phylum, which is quite strange because my samples were taken from a lagoon with a cyanobacterial bloom.
So, as an alternative, I thought I could run classify.seqs against another database such as GG using the same files from the Silva alignment, but cluster.split didn’t work this time and I don’t know why.

amwalkero0o · March 31, 2020, 7:21pm

I strongly suggest you use Silva files only, not RDP with Silva. You should use the same reference files from your alignment step onward, as indicated in the example I gave above with Silva. I have had issues with RDP classification using marine samples, so that could be your issue as well. Silva taxonomy definitely has cyanobacteria.

allan_santos · March 31, 2020, 9:32pm

How can I get the silva.fasta file to run classify.seqs? I only have the .align and .tax files from version 138.
I’ve checked another thread about this, but I didn’t really understand how to get it.

thanks

amwalkero0o · March 31, 2020, 9:34pm

You use the align file for the reference (reference=silva.nr_v138.pcr.align) and the tax file for the taxonomy (taxonomy=silva.nr_v138.tax):

classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.nr_v138.pcr.align, taxonomy=silva.nr_v138.tax, cutoff=80, processors=8)

allan_santos · March 31, 2020, 9:55pm

I’ve run that, but it produces many error messages (like the lines below) and didn’t generate any files …

Using 48 processors.
Generating search database… DONE.
It took 267 seconds generate search database.

Reading in the silva.nr_v138.tax taxonomy… DONE.
AAAA02038450.OrySati3 is in your taxonomy file and is not in your template file. Please correct.
AAAA02039541.OrySativ is in your taxonomy file and is not in your template file. Please correct.
AAAA02041579.OrySat38 is in your taxonomy file and is not in your template file. Please correct.
AAAA02046270.OrySati2 is in your taxonomy file and is not in your template file. Please correct.
AAAK03000004.GBKFa194 is in your taxonomy file and is not in your template file. Please correct.
AABL01000525.PlsYoel4 is in your taxonomy file and is not in your template file. Please correct.

…

amwalkero0o · March 31, 2020, 10:26pm

Can you share your commands for your align.seqs and classify.seqs?

allan_santos · March 31, 2020, 10:52pm

align.seqs(fasta=samples.good.unique.fasta, reference=silva.nr_v138.pcr.align)
…
classify.seqs(fasta=samples.precluster.pick.fasta, count=samples.precluster.denovo.vsearch.pick.count_table, reference=silva.nr_v138.pcr.align, taxonomy=silva.nr_v138.tax, cutoff=80, processors=48)

Using 48 processors.
Generating search database… DONE.
It took 267 seconds generate search database.

Reading in the silva.nr_v138.tax taxonomy… DONE.
AAAA02038450.OrySati3 is in your taxonomy file and is not in your template file. Please correct.
AAAA02039541.OrySativ is in your taxonomy file and is not in your template file. Please correct.
AAAA02041579.OrySat38 is in your taxonomy file and is not in your template file. Please correct.
…

amwalkero0o · March 31, 2020, 11:39pm

Thanks! Can I also see your pcr.seqs code as well as what your oligos file looks like?

allan_santos · April 1, 2020, 3:36am

pcr.seqs(fasta=silva.n_v138.pcr.align, start=13862, end=23444)

oligos=primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT v4

amwalkero0o · April 1, 2020, 10:16pm

Those all look right to me. I am thinking there must be something that went wrong with the other commands in between. I use the same primers and the silva database and haven’t had the same issue. The only thing I can recommend is to start from the beginning and go slow, checking each step and summary closely. Or maybe you can get Pat Schloss or Sarah Westcott’s attention and they can take a further look.

I found that this question has been addressed before here but it looks like the person who posted it found their mistake and got it to work.
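If it helps while re-checking each step, the error is saying that some names in the .tax file have no matching sequence in the reference passed to classify.seqs. Here is a rough Python sketch for listing those names outside of mothur. It assumes the reference is FASTA-formatted (name is the first token after ">") and the taxonomy file has the sequence name in its first tab-separated column; the function names and the sample data are hypothetical.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines (first token after '>')."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def taxonomy_names(lines):
    """Collect sequence names from the first tab-separated column."""
    names = set()
    for line in lines:
        line = line.strip()
        if line:
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_template(template_lines, taxonomy_lines):
    """Names present in the taxonomy file but absent from the template."""
    return sorted(taxonomy_names(taxonomy_lines) - fasta_names(template_lines))


# Hypothetical sample data; for real files pass open file handles instead.
template = [">seqA", "ACGT--", ">seqB", "AC-GT-"]
taxonomy = [
    "seqA\tBacteria;Cyanobacteria;",
    "seqB\tBacteria;Proteobacteria;",
    "seqC\tEukaryota;",
]
missing = missing_from_template(template, taxonomy)
print(missing)
```

A non-empty list only tells you that the two files are out of sync; it does not say which earlier step dropped the sequences.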

allan_santos · April 1, 2020, 10:59pm

Right, thanks.
Did you run classify.seqs with the Silva .align file as the reference?

amwalkero0o · April 2, 2020, 1:00am

Yes, I used only Silva files throughout the workflow.
