Is pseudo-replication underestimating the number of distinct OTUs?

So, I have species-location samples from various places. For each species, I sub-sampled the fasta and group file per sample (i.e. down to the lowest number of seqs). I repeated the sub-sampling 100 times, yielding 100 pseudo-replicates per species. I created a shared file for each pseudo-replicate and then wrote an R script that aggregates OTUs per location for each species… But now it hit me that Otu001 is not necessarily the same OTU in every pseudo-replicate, so I might be underestimating the number of distinct OTUs per species-location, right? However, I’m dealing with clone-library seqs, so the numbers aren’t huge and I would imagine the effect is minor.

Maybe I’ve provided too little information, but I would appreciate some thoughts on this. Thanks!
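To make the concern concrete, here is a toy sketch in Python. It assumes, purely for illustration, that every subsampled sequence has a fixed underlying OTU identity and that labels (Otu001, Otu002, …) are handed out by abundance rank within each replicate on its own; `label_by_abundance` is a hypothetical helper, not how mothur actually numbers OTUs. The point is only that any per-replicate labeling gives no guarantee that the same label means the same OTU across replicates, so pooling by label can undercount.

```python
from collections import Counter
from typing import Dict, List


def label_by_abundance(true_otus: List[str]) -> Dict[str, str]:
    """Map per-replicate labels (Otu001, Otu002, ...) to the underlying OTU.

    Hypothetical labeling rule: rank OTUs by abundance within this replicate
    only, breaking ties alphabetically. Real pipelines may label differently.
    """
    counts = Counter(true_otus)
    ranked = sorted(counts, key=lambda otu: (-counts[otu], otu))
    return {f"Otu{i:03d}": otu for i, otu in enumerate(ranked, start=1)}


# Two pseudo-replicates; each entry is the (toy) true OTU of one subsampled sequence.
rep_a = ["A", "A", "A", "B", "C"]
rep_b = ["B", "B", "B", "A", "D"]

labels_a = label_by_abundance(rep_a)
labels_b = label_by_abundance(rep_b)

# What an R script keyed on OTU labels would count vs. the real number of OTUs.
pooled_labels = set(labels_a) | set(labels_b)
pooled_true = set(labels_a.values()) | set(labels_b.values())

print(labels_a)  # {'Otu001': 'A', 'Otu002': 'B', 'Otu003': 'C'}
print(labels_b)  # {'Otu001': 'B', 'Otu002': 'A', 'Otu003': 'D'}
print(len(pooled_labels), len(pooled_true))  # 3 4
```

Here Otu001 is OTU A in one replicate and OTU B in the other, and OTU D hides behind a label already used for C, so the label-based count (3) falls short of the true count (4).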

If you’re doing your subsampling from a shared file, OTU1 is always OTU1. Since you don’t have a lot of reads, I’d take the whole thing through make.shared and then sub.sample.


Thanks for commenting,

But that’s the thing: I want to normalize at the sequence level, since each species-location contains a different number of reads. If I do the subsampling at the OTU level, won’t some locations end up with more OTUs just for that reason?

No, when you do it on the shared file, we randomly pick sequences from the OTUs, not OTUs themselves. It should effectively be the same as subsampling at the sequence level.
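A minimal sketch of what “randomly pick sequences from the OTUs” means for one group’s row of a shared file. This is not mothur’s `sub.sample` code, just the sampling idea, assuming a row can be represented as a mapping from OTU label to read count; `subsample_row` is a hypothetical name. Each read is equally likely to be drawn, so abundant OTUs contribute more reads, and because the columns are fixed before sampling, Otu001 stays Otu001 in every replicate.

```python
import random
from typing import Dict


def subsample_row(counts: Dict[str, int], depth: int, seed: int = 0) -> Dict[str, int]:
    """Rarefy one group's OTU counts to `depth` reads, drawing reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the {total} reads in this group")
    # Expand counts into one entry per read, so each read (not each OTU) is equally likely.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    picked = rng.sample(pool, depth)
    # Keep every original column, including OTUs that drop to zero: labels never change.
    result = {otu: 0 for otu in counts}
    for otu in picked:
        result[otu] += 1
    return result


row = {"Otu001": 12, "Otu002": 5, "Otu003": 1, "Otu004": 2}
sub = subsample_row(row, depth=10, seed=1)
print(sub)
print(sum(sub.values()))  # 10
```

Calling it with different seeds gives a set of pseudo-replicates whose columns line up one-to-one, which is what makes aggregating by OTU label safe in this setup.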

Aah, ok! Makes sense. However, since I want to keep the group file in the loop, would subsampling the list file be equivalent (so that I can include the group file)? Cheers,

Mmm, not quite: we don’t actually assign the OTU “names” until the shared file is generated, so labels from separately subsampled list files aren’t guaranteed to match across replicates.