Shortest way to classify sequences

Hey there,
I’ve completed all the steps following the tips in the SOP, but I’d like to look for the Cyanobacteria phylum. Is there a shorter way to classify my sequences with another database, such as GG, without repeating the steps I’ve already done?

thanks a lot

What reference database did you use for the classify.seqs step in the SOP?

At first I used the RDP reference, and cluster.split (run in two steps) worked normally.
However, I now need a classification against the GG database, and after that the cluster.split command no longer works.

You should use the same reference database files for all steps, i.e. alignment, classification, and cluster.split. The cluster.split command uses taxonomy to split sequences into bins before clustering, so it is likely failing because you didn’t use the GG reference database files during your alignment and classify.seqs steps. You should also have had cyanobacteria identified when you used RDP to classify. Given that, why do you think that cyanobacteria are not classified in your dataset?
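One way to check whether cyanobacteria really are absent is to tally the phyla in the taxonomy file that classify.seqs wrote. The Python sketch below is only an illustration, not part of mothur: it assumes the usual two-column layout (sequence name, a tab, then semicolon-separated ranks with bootstrap values in parentheses), and the function name `phylum_counts` and the example lines are made up for the demonstration.

```python
import re
from collections import Counter


def phylum_counts(taxonomy_lines):
    """Tally sequences per phylum (second rank) from mothur-style taxonomy lines."""
    counts = Counter()
    for line in taxonomy_lines:
        line = line.strip()
        if not line:
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip lines without a taxonomy column
        # Drop bootstrap values such as "(100)" and ignore the empty field
        # left by the trailing semicolon.
        ranks = [re.sub(r"\(\d+\)$", "", r) for r in parts[1].split(";") if r]
        phylum = ranks[1] if len(ranks) > 1 else "unclassified"
        counts[phylum] += 1
    return counts


# Made-up example lines; with real data, pass an open file handle instead.
example = [
    "seq1\tBacteria(100);Cyanobacteria(98);Cyanobacteriia(98);",
    "seq2\tBacteria(100);Proteobacteria(100);Gammaproteobacteria(95);",
    "seq3\tBacteria(100);Cyanobacteria(90);Cyanobacteriia(88);",
]
counts = phylum_counts(example)
# Case-insensitive substring match, so differently spelled labels still show up.
cyano = {p: n for p, n in counts.items() if "cyanobacteria" in p.lower()}
print(cyano)
```

Note that this counts unique sequences only; it does not weight them by the abundances in the count_table.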

My understanding is that GG is not well curated and probably shouldn’t be used. Personally, I like to use the Silva reference files throughout the mothur SOP for alignment and classification with my marine sediment samples. Whether you switch to Silva or still want to use GG, you’d need to go back to the alignment step, re-run it, and repeat the steps after it. Here is how a few of those commands would look with the Silva reference files.

align.seqs(fasta=stability.trim.contigs.good.unique.fasta, reference=silva.nr_v138.pcr.align, flip=T, processors=8)

classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.nr_v138.pcr.align, taxonomy=silva.nr_v138.tax, cutoff=80, processors=8)

Previously, I used Silva for alignment and RDP for classification, with cluster.split run in 2 separate steps, and it worked very well: I got all the files.
However, the results didn’t show any match for the Cyanobacteria phylum, which is quite strange because my samples were taken from a lagoon with a cyanobacterial bloom.
So, as an alternative, I thought I could run classify.seqs against another database such as GG using the same files from the Silva alignment, but cluster.split didn’t work this time and I don’t know why.

I strongly suggest you use Silva files only, not RDP with Silva. You should use the same reference files from your alignment step onward, as indicated in the example I gave above with Silva. I have had issues with RDP classification using marine samples, so that could be your issue as well. Silva taxonomy definitely has cyanobacteria.

How can I get a silva.fasta file to run classify.seqs?
I only have the .align and .tax files from version 138.
I’ve double-checked another thread about this, but I didn’t really understand how to get it.

thanks

You use the align file for the reference (reference=silva.nr_v138.pcr.align) and the tax file for taxonomy (taxonomy=silva.nr_v138.tax)

classify.seqs(fasta=stability.trim.contigs.good.unique.good.filter.unique.precluster.pick.fasta, count=stability.trim.contigs.good.unique.good.filter.unique.precluster.denovo.vsearch.pick.count_table, reference=silva.nr_v138.pcr.align, taxonomy=silva.nr_v138.tax, cutoff=80, processors=8)

I’ve run that, but I’m getting many error messages (like the lines below) and it didn’t generate any files …

"Using 48 processors.
Generating search database… DONE.
It took 267 seconds generate search database.

Reading in the silva.nr_v138.tax taxonomy… DONE.
AAAA02038450.OrySati3 is in your taxonomy file and is not in your template file. Please correct.
AAAA02039541.OrySativ is in your taxonomy file and is not in your template file. Please correct.
AAAA02041579.OrySat38 is in your taxonomy file and is not in your template file. Please correct.
AAAA02046270.OrySati2 is in your taxonomy file and is not in your template file. Please correct.
AAAK03000004.GBKFa194 is in your taxonomy file and is not in your template file. Please correct.
AABL01000525.PlsYoel4 is in your taxonomy file and is not in your template file. Please correct.

…"

Can you share your commands for your align.seqs and classify.seqs?

align.seqs(fasta=samples.good.unique.fasta, reference=silva.nr_v138.pcr.align)

classify.seqs(fasta=samples.precluster.pick.fasta, count=samples.precluster.denovo.vsearch.pick.count_table, reference=silva.nr_v138.pcr.align, taxonomy=silva.nr_v138.tax, cutoff=80, processors=48)

Using 48 processors.
Generating search database… DONE.
It took 267 seconds generate search database.

Reading in the silva.nr_v138.tax taxonomy… DONE.
AAAA02038450.OrySati3 is in your taxonomy file and is not in your template file. Please correct.
AAAA02039541.OrySativ is in your taxonomy file and is not in your template file. Please correct.
AAAA02041579.OrySat38 is in your taxonomy file and is not in your template file. Please correct.

Thanks! Can I also see your pcr.seqs code as well as what your oligos file looks like?

pcr.seqs(fasta=silva.n_v138.pcr.align, start=13862, end=23444)

oligos=primer GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT v4

Those all look right to me. I am thinking there must be something that went wrong with the other commands in between. I use the same primers and the silva database and haven’t had the same issue. The only thing I can recommend is to start from the beginning and go slow, checking each step and summary closely. Or maybe you can get Pat Schloss or Sarah Westcott’s attention and they can take a further look.
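As part of checking each step, it may help to compare the sequence names in the taxonomy file with those in the template (reference) file, since that mismatch is what the error message reports. The Python sketch below is only an illustration and not part of mothur: the helper `read_ids` and the example names are hypothetical, and it assumes the reference is FASTA-formatted with the sequence name as the first token after `>`. It shows which names differ, not why they differ.

```python
def read_ids(lines, fasta=False):
    """Collect sequence names from FASTA headers or from the first tab-separated column."""
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if fasta:
            # Only header lines carry names; keep the first token after ">".
            if line.startswith(">") and len(line) > 1:
                ids.add(line[1:].split()[0])
        else:
            ids.add(line.split("\t")[0])
    return ids


# Made-up stand-ins for the template (.align) and taxonomy (.tax) files.
template_lines = [
    ">refA",
    "..ACGT--ACGT..",
    ">refB\t100\tBacteria;Cyanobacteria;",
    "..ACGT--ACGA..",
]
taxonomy_lines = [
    "refA\tBacteria;Proteobacteria;",
    "refB\tBacteria;Cyanobacteria;",
    "refC\tEukaryota;Archaeplastida;",
]

template_ids = read_ids(template_lines, fasta=True)
taxonomy_ids = read_ids(taxonomy_lines)

# Names that would trigger "is in your taxonomy file and is not in your template file".
missing_from_template = sorted(taxonomy_ids - template_ids)
print(len(missing_from_template), "taxonomy names missing from the template")
print(missing_from_template)
```

With real files, the same function could be fed open file handles, e.g. `read_ids(open("silva.nr_v138.tax"))` and `read_ids(open("silva.nr_v138.pcr.align"), fasta=True)`.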

I found that this question has been addressed before here but it looks like the person who posted it found their mistake and got it to work.

Right, thanks.
Did you run classify.seqs with the Silva .align file?

Yes, I used only Silva files throughout the workflow.