When to normalize by number of sequences?

Hi dear Mothur community,

I have a question regarding the QC for paired-end MiSeq 18S rRNA V4 Illumina sequencing.

I followed the MiSeq SOP very successfully. Thanks! It is a great help.
Now I have a question about normalization of my data set.
Why do we cluster the sequences into OTUs before normalizing to (in my case) an equal number of sequences per sample? I have 15 samples with between 70 K and 120 K sequences each.
Could somebody please explain this to me? I would have thought the sequences need to be subsampled before clustering; otherwise, aren't we clustering sequences into OTUs that might end up outside the subsample?

I was looking for a way to normalize the number of sequences before clustering, but that requires a list file, which is only produced by clustering.

Please help me with my reasoning.

Have a nice day,


I follow the SOP and cluster all sequences, then subsample/rarefy/normalize the number of sequences when calculating alpha and beta diversity. I also create one subsampled OTU table, but I don't calculate diversity from that table.
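
To make the idea concrete, here is a minimal Python sketch of subsampling after clustering. It is not mothur code and not how mothur implements it; the OTU table, sample names, and function names are made up for illustration. It draws the same number of sequences from each sample without replacement, so every sequence keeps the OTU it was assigned during clustering, and it averages a diversity measure (here just the observed number of OTUs) over repeated random draws rather than relying on one subsampled table.

```python
import random
from collections import Counter


def subsample_counts(otu_counts, depth, rng):
    """Draw `depth` sequences without replacement from one sample's OTU counts."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"cannot draw {depth} sequences from a sample of {total}")
    # Expand the counts into one OTU label per sequence, then sample the labels.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, depth)))


def mean_richness(otu_counts, depth, iters, rng):
    """Average the observed number of OTUs over `iters` random subsamples."""
    draws = (len(subsample_counts(otu_counts, depth, rng)) for _ in range(iters))
    return sum(draws) / iters


# Hypothetical OTU table built AFTER clustering all sequences: sample -> {OTU: count}.
otu_table = {
    "sampleA": {"Otu1": 60, "Otu2": 30, "Otu3": 10},
    "sampleB": {"Otu1": 20, "Otu2": 25, "Otu4": 5},
}

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Normalize every sample to the size of the smallest sample.
depth = min(sum(counts.values()) for counts in otu_table.values())

# One subsampled OTU table (e.g. for plotting or export).
rarefied = {s: subsample_counts(c, depth, rng) for s, c in otu_table.items()}

# Diversity is averaged over many subsamples, not taken from the single table.
richness = {s: mean_richness(c, depth, 100, rng) for s, c in otu_table.items()}

for sample in otu_table:
    print(sample, sum(rarefied[sample].values()), richness[sample])
```

Because the draw happens on the finished OTU table, an OTU can lose sequences or drop out of a sample, but no sequence changes OTU, which is why clustering first and subsampling afterwards is consistent.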