Any advantage in larger ksize and iters in classify.seqs?

Hi, I only want to do phylotype analysis, no OTU analysis is needed. Then:

  1. If I increase the ksize option from 8 to 10, will classify.seqs give better results than with ksize=8?
  2. Since I have many reads (> 8 million unique reads per sample) and I prefer to wait longer for better results, if I increase iters to 300 and cutoff to 90, do these also make the classification more accurate? (I assume more reads will become ‘unknown’ but I have many reads.)
    Thanks a lot.
  1. When I run the following command with ksize=10, two files whose names are missing the ‘10’ are created in the same folder as the Silva files:
    classify.seqs(fasta=s_ecoli.trim.contigs.pick.fasta, reference=/db/bio/bmd/ngs/fan/ref_data/mothur_silva/silva.nr_v123.filter.fasta, taxonomy=/db/bio/bmd/ngs/fan/ref_data/mothur_silva/silva.nr_v123.tax, cutoff=90, ksize=10, processors=10)
    silva.nr_v123.silva.nr_v123.filter.:mer.numNonZero, silva.nr_v123.silva.nr_v123.filter.:mer.prob.
    The names should look like “xxxx.10mer.xxxx”.
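
A side note on where the ‘:’ may come from. This is only a guess and is not confirmed from mothur's source: if the k-mer size were turned into a single character by adding it to the character '0', sizes 0 to 9 would print as digits, but 10 would land on the next ASCII character, which is ':'. The helper name below is hypothetical and only illustrates the arithmetic.

```python
def single_char_label(ksize: int) -> str:
    """Hypothetical helper: turn a k-mer size into ONE character by offsetting from '0'.

    This is an assumed mechanism for illustration, not mothur's actual code.
    """
    return chr(ord("0") + ksize)


for ksize in (8, 9, 10):
    # 8 and 9 map to the digits '8' and '9'; 10 maps to ':' (ASCII code 58)
    print(f"ksize={ksize}: xxxx.{single_char_label(ksize)}mer.prob")
```

If that is what happens, any ksize above 9 would produce a non-digit character in the file name.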

Why are you using ksize=10? I suspect we never foresaw anyone wanting to use such a kmer size. We normally find that 7 or 8 provide the best classification and that larger kmer sizes suck up more RAM and are considerably slower.
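
To give a rough sense of why larger k-mer sizes cost more: there are 4^k possible DNA k-mers, so going from 8 to 10 multiplies the number of possible k-mers by 16. The sketch below only does that arithmetic. The table layout is an assumption (one 4-byte value per k-mer per taxon, with a made-up taxon count), not a measurement of what mothur actually allocates.

```python
def possible_kmers(ksize: int) -> int:
    """Number of distinct DNA k-mers over the alphabet A, C, G, T."""
    return 4 ** ksize


def dense_table_bytes(ksize: int, n_taxa: int, bytes_per_value: int = 4) -> int:
    """Size of a dense k-mer-by-taxon table (assumed layout, for illustration only)."""
    return possible_kmers(ksize) * n_taxa * bytes_per_value


N_TAXA = 1000  # hypothetical number of taxa in the reference, not taken from Silva

for ksize in (7, 8, 10):
    gib = dense_table_bytes(ksize, N_TAXA) / 2**30
    print(f"ksize={ksize}: {possible_kmers(ksize):>9,} k-mers, ~{gib:.2f} GiB dense table")
```

Whatever the real storage scheme is, the number of possible k-mers grows by a factor of 4 for each step in ksize.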

  1. If I run several classify.seqs jobs simultaneously, and they all use the same Silva files, will the .prob and .numNonZero files be overwritten, since each classify.seqs job tries to write its own files and the file names are the same?

That’s correct. You would have to run the commands in series or in separate folders.
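
One way to set up the separate folders is to give each job its own copy of the reference and taxonomy files, so the .prob and .numNonZero files each job writes next to its reference cannot collide. This is a minimal sketch, assuming those files are written beside the reference file as described above; the function name, job names, and directory layout are hypothetical.

```python
import shutil
from pathlib import Path


def stage_job_dirs(reference_files, job_names, root):
    """Copy the shared reference files into one private folder per job.

    Returns {job_name: [paths of that job's private copies]}.
    """
    staged = {}
    for job in job_names:
        job_dir = Path(root) / job
        job_dir.mkdir(parents=True, exist_ok=True)
        # Copy rather than symlink, so files written "next to" the reference stay in job_dir
        staged[job] = [Path(shutil.copy2(ref, job_dir)) for ref in reference_files]
    return staged
```

Each job would then point reference= and taxonomy= at the copies in its own folder. The cost is extra disk space and each job rebuilding its own .prob files.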

Also, you seem to frequently post the same questions in multiple posts. Please stop doing this and only put up a single post with related questions. What you are doing makes it very hard for us to manage the forum.

Dear Pat:
Please accept my sincere apology if my posts seemed improper to you. But I actually asked four different questions about classify.seqs in two posts (one about usage and one about a possible bug, in two different post groups):

  1. Does ksize=10 improve performance?
  2. Do iters=300 and cutoff=90 improve performance?
  3. A possibly wrong file name, such as silva.nr_v123.silva.nr_v123.filter.:mer.prob.
  4. Do files get overwritten by multiple processes?
    I think Mothur is a great program and I want to use it in the best way. I will ask all my questions in one post in the future.