Any advantage in larger ksize and iters in classify.seqs?

yangfa1 · March 28, 2016, 2:01pm

Hi, I only want to do phylotype analysis; no OTU analysis is needed. My questions:

If I increase the ksize option from 8 to 10, does this make the results of classify.seqs better than with ksize=8?
I have many reads (> 8 million unique reads per sample) and would rather wait longer for better results. If I increase iters to 300 and cutoff to 90, does that also make the classification more accurate? (I assume more reads will become ‘unknown’, but I have plenty of reads.)
Thanks a lot.

pschloss · March 29, 2016, 11:53am

When I run the following command with ksize=10, two files whose names are missing the ‘10’ are created in the same folder as the Silva files:
classify.seqs(fasta=s_ecoli.trim.contigs.pick.fasta, reference=/db/bio/bmd/ngs/fan/ref_data/mothur_silva/silva.nr_v123.filter.fasta, taxonomy=/db/bio/bmd/ngs/fan/ref_data/mothur_silva/silva.nr_v123.tax, cutoff=90, ksize=10, processors=10)
silva.nr_v123.silva.nr_v123.filter.:mer.numNonZero, silva.nr_v123.silva.nr_v123.filter.:mer.prob.
They should be like “xxxx.10mer.xxxx”.

Why are you using ksize=10? I suspect we never foresaw anyone wanting to use such a kmer size. We normally find that 7 or 8 provide the best classification and that larger kmer sizes suck up more RAM and are considerably slower.
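One way to see why larger kmer sizes cost more memory: the number of distinct DNA kmers grows as 4^k. The sketch below only counts possible kmers over A, C, G and T. It assumes nothing about how mothur actually lays out its probability tables, and `kmer_space` is a name made up for this illustration.

```python
def kmer_space(ksize: int) -> int:
    """Number of distinct DNA kmers of length ksize over the alphabet A, C, G, T."""
    return 4 ** ksize


# Compare the kmer sizes discussed in this thread against the ksize=8 baseline.
for k in (7, 8, 10):
    ratio = kmer_space(k) / kmer_space(8)
    print(f"ksize={k}: {kmer_space(k):,} possible kmers ({ratio:g}x ksize=8)")
```

Going from ksize=8 to ksize=10 multiplies the number of possible kmers by 16, so any per-kmer table grows by at least that factor.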

If I run several classify.seqs jobs simultaneously and they all use the same Silva files, will the .prob and .numNonZero files be overwritten, since each classify.seqs job tries to write its own files and the file names are the same?

That’s correct. You would have to run the commands in series or in separate folders.
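A minimal sketch of the separate-folders option, assuming (as described above) that the derived .prob and .numNonZero files are written next to the reference files. The helper `stage_references` is hypothetical, not part of mothur. It only copies the reference files into a per-job folder and does not run mothur itself.

```python
import shutil
from pathlib import Path


def stage_references(job_dir, reference_files):
    """Copy each reference file into job_dir and return the paths of the copies.

    Each job then points classify.seqs at its own copies, so derived files
    written beside the references cannot collide with another job's files.
    """
    job_dir = Path(job_dir)
    job_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for ref in reference_files:
        dest = job_dir / Path(ref).name
        shutil.copy2(ref, dest)  # real copy, so every job has its own files
        staged.append(dest)
    return staged
```

Each job would call this once with its own folder and then pass the returned paths as the reference and taxonomy arguments. Copying large reference files costs disk space, so running the commands in series may be the simpler choice.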

Also, you seem to frequently post the same questions in multiple posts. Please stop doing this and only put up a single post with related questions. What you are doing makes it very hard for us to manage the forum.

yangfa1 · March 29, 2016, 12:42pm

Dear Pat:
Please accept my sincere apologies if my posts seemed improper. I actually asked four different questions about classify.seqs across two posts (one about usage and one about a possible bug, in two forum categories):

Does ksize=10 improve performance?
Do iters=300 and cutoff=90 improve performance?
Is a file name such as silva.nr_v123.silva.nr_v123.filter.:mer.prob wrong?
Do files get overwritten when multiple processes run?
I think mothur is a great program and I want to use it in the best way. I will ask all my questions in one post in the future.
