classify.seqs: degap the V4 alignment file?

pdcountway · August 20, 2017, 3:01pm

Hi,

I’ve recreated the collection of 190,661 SILVA v128 reference sequences following the README here: http://blog.mothur.org/2017/03/22/SILVA-v128-reference-files/

At the very end of the README, it is suggested that sequences should be classified using silva.nr_v128.align and silva.nr_v128.tax after running pcr.seqs for the V4 region on the *.align file.

In the MiSeq SOP tutorial, however, classify.seqs is run with the following (smaller) reference and taxonomy files: reference=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax

My question: it seems like the ‘trainset’ reference file in the MiSeq SOP contains full-length, degapped sequences, rather than sequences that have been trimmed with pcr.seqs and aligned per the README protocol. I understand that running pcr.seqs on the 190,661 sequences will greatly reduce the computational time, but is it also necessary to degap these 190,661 V4 reference sequences prior to running classify.seqs? Or maybe it doesn’t matter?

Thanks,
Pete

pschloss · August 21, 2017, 12:16pm

You would need to align the RDP training set to the SILVA alignment to extract the V4 region, like we do for the alignment in the MiSeq SOP. It doesn’t really matter whether you degap the sequences prior to classify.seqs, since mothur will do that for you if you don’t.
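
For anyone unsure what degapping amounts to, here is a minimal Python sketch. It is not mothur's own code, only an illustration that assumes gaps are written as '-' or '.' and that each sequence sits on a single line. The function names and the two toy reference sequences are made up for the example.

```python
def degap(sequence: str) -> str:
    """Strip alignment gap characters ('-' and '.') from one aligned sequence."""
    return sequence.replace("-", "").replace(".", "")


def degap_fasta(lines):
    """Yield FASTA lines with gaps removed; header lines pass through unchanged.

    Assumes one sequence per line (no wrapped sequence lines).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        elif line:
            yield degap(line)


# Toy aligned references (hypothetical, not real SILVA records).
aligned = [
    ">ref1",
    "..--ACG-T--A..",
    ">ref2",
    "....AC--TTGA..",
]

degapped = list(degap_fasta(aligned))
print("\n".join(degapped))
```

The bases and their order are untouched; only the padding characters that hold the alignment columns in place are dropped, which is why doing it yourself beforehand changes nothing for the classification.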

Pat
