Normalization

Hi

I have been using mothur.
I have realized that normalization is important, whether the data is RNA-Seq or 16S rRNA.
But I have some questions about how normalization is done with mothur:

  1. Why do we set the sample size to 1000?
  2. Doesn't that lose a lot of information?
  3. Is it worth losing the samples that have fewer than 1000 sequences?

  1. Why do we set the sample size to 1000?

There’s no predefined number of sequences to normalize or subsample to. It all depends on how many samples you have and what samples you’ll lose if they have fewer sequences than your threshold.
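To make that trade-off concrete, here is a minimal sketch of how one might tabulate which samples would be dropped at a few candidate thresholds. The sample names and counts are made up for illustration, and `samples_lost` is a hypothetical helper, not a mothur command.

```python
def samples_lost(read_counts, thresholds):
    """For each candidate threshold, list the samples with fewer reads than it."""
    return {
        t: sorted(name for name, n in read_counts.items() if n < t)
        for t in thresholds
    }

# Hypothetical per-sample sequence counts (not from any real dataset).
read_counts = {"ctrl1": 850, "gutA": 4200, "gutB": 1500, "gutC": 180}

for threshold, lost in samples_lost(read_counts, [200, 1000, 2000]).items():
    print(threshold, "drops", lost)
```

Looking at the table of dropped samples for each threshold makes it easier to pick a value that keeps the samples you actually care about.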

  2. Doesn't that lose a lot of information?

Depends on how you look at it. You either lose data or divine it out of thin air. I prefer not to make up data :)
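For anyone unsure what "losing data" means here, this is a rough sketch of subsampling one sample down to a fixed depth by drawing reads at random without replacement. It illustrates the general idea only and is not mothur's implementation; the function name, the OTU labels, and the counts are all assumptions made for the example.

```python
import random

def subsample(otu_counts, size, seed=None):
    """Draw `size` reads without replacement from one sample's OTU counts."""
    total = sum(otu_counts.values())
    if total < size:
        # A sample below the threshold cannot be subsampled; it has to be dropped.
        raise ValueError(f"sample has {total} reads, fewer than {size}")
    # Expand the counts into one entry per read, then pick `size` of them.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, size)
    result = {otu: 0 for otu in otu_counts}
    for otu in picked:
        result[otu] += 1
    return result

# Hypothetical sample with 1600 reads across three OTUs.
sample = {"Otu001": 900, "Otu002": 600, "Otu003": 100}
print(subsample(sample, 1000, seed=1))
```

Every read that survives is a real observation; the cost is that the reads above the threshold are discarded, and rare OTUs may drop to zero.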

  3. Is it worth losing the samples that have fewer than 1000 sequences?

Again, this will depend on your question and how important those samples are. For example, if your controls have fewer sequences than your threshold, you either need to relax the threshold or do some more sequencing.
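As a small sketch of the "relax the threshold" option: if certain samples must be kept, the largest usable threshold is the smallest read count among them. The helper name and the counts below are hypothetical.

```python
def max_threshold_keeping(read_counts, required):
    """Largest threshold at which every sample in `required` is still retained."""
    return min(read_counts[name] for name in required)

# Hypothetical counts; ctrl1 and ctrl2 are controls that must not be dropped.
counts = {"ctrl1": 850, "ctrl2": 1200, "gutA": 4200}
print(max_threshold_keeping(counts, ["ctrl1", "ctrl2"]))
```

If that number is too low to be useful, more sequencing of the shallow samples is the other way out.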

Thanks for the reply.
I get your point, but if some samples have fewer than 200 sequences, is it worth keeping them, even if those samples are important? Will they provide sufficient information?


Cheers,

Hello,
Can the FLX data be used directly without doing standard normalization? What would be the consequences?
I ask because I'm still confused about this normalization. Is it justified to subsample the samples that have good sequencing depth down to the size of the smallest one?

Please help me clear up my doubts about this normalization issue.


Cheers,

Can the FLX data be used directly without doing standard normalization? What would be the consequences?

You can do anything, but if I were the reviewer, I would ding you for not normalizing or subsampling.