Query about extended run time of the cluster.seqs command for 16S rRNA sequence reads

Rukaiya · May 1, 2023, 6:13am

I am having trouble clustering my sequences. I am using about 203,000 forward-read sequences in FASTA files for the analysis. After aligning them against the greengenes reference alignment and running the distance command, the resulting sample.dist file came to 51 GB. The clustering process has now been running for 36 hours and has still not produced results.

pschloss · May 2, 2023, 6:05pm

Hi,

First - I strongly recommend against using greengenes as a reference alignment because it does a horrible job in the variable regions. Use the silva reference alignment as described in the MiSeq SOP.

Second - I’m not sure what steps you are running or how you’re running them. Could you post more of your pipeline? What region are you sequencing and with what chemistry? I suspect that you might need to consult this blog post…

Thanks,
Pat
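
For a sense of scale in the question above: a pairwise distance calculation over n sequences involves up to n(n-1)/2 comparisons, which for 203,000 sequences is roughly 20.6 billion pairs. The Python sketch below only does that arithmetic. It assumes every read is unique and that no distance cutoff is applied; the `bytes_per_record` parameter is a hypothetical illustration value, not a mothur setting, and the function names are made up for this example.

```python
def pair_count(n_seqs: int) -> int:
    """Number of unordered sequence pairs among n_seqs sequences."""
    return n_seqs * (n_seqs - 1) // 2


def estimated_size_gb(n_seqs: int, bytes_per_record: int) -> float:
    """Rough size in GB if every pair were written out as one record.

    bytes_per_record is an assumed average (two names plus a distance);
    the real value depends on sequence name lengths and output format.
    """
    return pair_count(n_seqs) * bytes_per_record / 1e9


if __name__ == "__main__":
    n = 203_000  # read count quoted in the question
    print(f"{pair_count(n):,} pairwise distances")
    # 30 bytes per record is an assumption chosen only for illustration.
    print(f"{estimated_size_gb(n, 30):,.0f} GB at 30 bytes per record (assumed)")
```

Because the pair count grows with the square of the number of sequences, reducing the number of sequences that reach the distance step shrinks the distance file much faster than linearly.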
