Mothur training set size vs rdp database

bob_loblaw · July 30, 2015, 8:58am

Hi,

I was curious how the mothur RDP training set is generated. It contains about 10k sequences versus over 3 million in the RDP database, and I was wondering how one is derived from the other. I’ve done some googling and looked through the mothur wiki and paper, but I can’t seem to find the answer; maybe I missed it.

Thanks

pschloss · August 3, 2015, 2:42pm

Hi there,

Here’s how the trainset is formatted for mothur:

http://blog.mothur.org/2015/05/27/RDP-v14-reference_files/

Keep in mind that the training set is supposed to be manually curated so that its classifications are correct. In my understanding, they then run the rest of the sequences in the RDP database through the classifier to get their classifications.

Pat
