What do the increasing numbers in classify.seqs mean?

Hi, I am running classify.seqs using EzBioCloud reference sequences edited to the V3-V4 region, but classification is taking a very long time (it has been running for more than 24 hours) and the numbers shown on screen are still increasing. What does this mean? Is there a way to predict how long it will take? And if my laptop goes to sleep, will the process be interrupted? Thank you.

Hi,

How many unique sequences do you have? If you’re sequencing V3-V4 or aren’t following our MiSeq SOP, you might want to check out this post…

Pat

classify.seqs reports 194,520 sequences classified. I'm using a MacBook Air with an M1 chip, but I couldn't find any guidance on RAM or other hardware requirements for mothur analyses. I followed the SOP as closely as I could understand it :grin: :grin: and added large=T as well as setting taxlevel=3. What are the different dist files that were produced used for?

That’s a lot of uniques. What region are you sequencing? I suspect you have so many because of low sequence quality from sequencing V3-V4 or V4-V5 (i.e., anything longer than ~250 nt). I certainly wouldn’t try to process this on a laptop. The large=T option should not be used. You probably want to use taxlevel=5 or 6, and possibly increase the diffs value in pre.cluster to the integer just less than your sequence length divided by 100.

Pat
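As a rough illustration of the diffs rule of thumb above, here is a minimal Python sketch. It assumes "the integer just less than your sequence length divided by 100" is read strictly, so an exact multiple of 100 (e.g. 500 nt) gives 4 rather than 5. The helper name `suggested_diffs` is hypothetical and is not part of mothur; the value it returns is what you would pass as diffs= to pre.cluster.

```python
def suggested_diffs(seq_length: int) -> int:
    """Largest integer strictly less than seq_length / 100.

    Hypothetical helper illustrating the rule of thumb for the
    pre.cluster diffs parameter; it is not part of mothur.
    """
    if seq_length <= 0:
        raise ValueError("sequence length must be a positive integer")
    # For a positive integer L, (L - 1) // 100 is the largest integer
    # strictly below L / 100, computed without floating-point rounding.
    return (seq_length - 1) // 100


if __name__ == "__main__":
    # Hypothetical read lengths, chosen only to show the rounding behaviour.
    for length in (250, 460, 500, 501):
        print(f"{length} nt -> diffs={suggested_diffs(length)}")
```

If you prefer to read the rule as "one difference per 100 nt" (so 500 nt gives 5), replace the return expression with `seq_length // 100`; the two readings differ only at exact multiples of 100.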

Yes, it's the V3-V4 region. A thousand thanks for all the great suggestions; everything worked fine after I increased diffs to 5 and changed taxlevel to 6. I also switched the reference database to the SILVA seed instead of the larger SILVA reference sequences. Does the size of the reference database affect the results?

The seed version should be good enough for most purposes.

Pat
