Random iterative subsampling

cgimpel · May 16, 2017, 1:56am

Hello dear Mothur community

I am wondering if there is a way to generate a random subsample of my dataset, ideally with a process that runs multiple iterations.

I have been following the MiSeq SOP guidelines on 18S/V4 iTags with the Silva 123 database.
I would like to subsample before running cluster.split() (with OptiClust).

The sub.sample command lets me set the desired 'size' of the new randomly generated group, but I would also like to repeat the process over several iterations.

Does anybody have experience, suggestions, or comments on how to get this done?

Thanks in advance,

Carla

pschloss · May 18, 2017, 12:15pm

If you have to subsample before clustering, you’re going to have to script it yourself. This is something that we don’t recommend.
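As a rough illustration of what "scripting it yourself" could look like, here is a minimal Python sketch. It assumes the read-to-sample mapping is available as a two-column, tab-separated groups file (sequence name, group); the helpers `read_groups`, `iterative_subsample` and `write_accnos` are hypothetical, not part of mothur. Each iteration draws `size` reads per group without replacement using its own seeded random generator, so the runs are reproducible. Groups with fewer than `size` reads are simply skipped in this sketch.

```python
import random
from collections import defaultdict


def read_groups(path):
    """Hypothetical helper: parse a 'name<TAB>group' file into {group: [names]}."""
    groups = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            if line.strip():
                name, group = line.split()[:2]
                groups[group].append(name)
    return groups


def iterative_subsample(groups, size, iterations, seed=0):
    """Return one {group: [names]} dict per iteration.

    Each iteration samples `size` names per group without replacement,
    seeding a separate RNG with seed + iteration index for reproducibility.
    Groups smaller than `size` are left out of every draw.
    """
    draws = []
    for i in range(iterations):
        rng = random.Random(seed + i)
        draw = {}
        for group, names in sorted(groups.items()):
            if len(names) >= size:
                draw[group] = rng.sample(names, size)
        draws.append(draw)
    return draws


def write_accnos(draw, path):
    """Write the sampled sequence names from one iteration, one per line."""
    with open(path, "w") as handle:
        for names in draw.values():
            for name in names:
                handle.write(name + "\n")


if __name__ == "__main__":
    demo = {"A": [f"a{i}" for i in range(10)], "B": [f"b{i}" for i in range(3)]}
    for it, draw in enumerate(iterative_subsample(demo, size=5, iterations=3)):
        print(it, draw)
```

Each iteration's name list could then be written out with `write_accnos` and passed to mothur's get.seqs (via its `accnos` option) to pull out that subsample before clustering. Keep in mind Pat's caveat that subsampling before clustering is not recommended.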

Pat

cgimpel · June 7, 2017, 11:17pm

Hi again, thanks for the answer Pat.
So basically, the way to get around a computationally demanding clustering step (~12 million unique reads from 318 samples) would be to run pre.cluster with a larger number of allowed differences (diffs)?

Thanks for your comments,

Carla

pschloss · June 8, 2017, 11:46am

Yes, or use phylotype. I think the problem is that you have used V3 chemistry and have a pretty high error rate.

Pat

cgimpel · June 15, 2017, 10:48pm

Hi Pat, and community
I am running my data again and just noticed that after make.contigs() my scrap files (scrap.contigs.fasta and scrap.contigs.qual) are empty. Can anybody explain to me what the implications of this are?

Thanks!!

Carla

pschloss · June 19, 2017, 11:32am

That sounds right. Are you running screen.seqs after running make.contigs as described in the MiSeq SOP?

Pat

cgimpel · June 21, 2017, 2:54am

Oh! So it's OK if they are empty… good =)

My workflow is based on the MiSeq SOP, with a few commands added to adjust to my data. Anyway, the start looks like this:

make.contigs(file=16S.files, processors=24)
summary.seqs(fasta=16S.trim.contigs.fasta)
pcr.seqs(fasta=16S.trim.contigs.fasta, oligos=bactV4.oligos, pdiffs=2, rdiffs=2, group=16S.contigs.groups)
summary.seqs(fasta=16S.trim.contigs.pcr.fasta)
screen.seqs(fasta=16S.trim.contigs.pcr.fasta, group=16S.contigs.pcr.groups, minlength=252, maxlength=254, maxambig=0, maxhomop=6)
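To make the screen.seqs criteria in the last command concrete, here is a small Python sketch that applies the same limits (length 252 to 254, no ambiguous bases, homopolymers of at most 6) to a single sequence. It is only an illustration of the criteria, not mothur's implementation: the helpers `longest_homopolymer` and `passes_screen` are hypothetical, and treating any character other than A, C, G or T as ambiguous is an assumption (mothur's exact handling may differ).

```python
import re

# Assumption: anything other than A/C/G/T counts as an ambiguous base.
AMBIGUOUS = re.compile(r"[^ACGT]")


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated character."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def passes_screen(seq, minlength=252, maxlength=254, maxambig=0, maxhomop=6):
    """Apply length, ambiguity and homopolymer limits to one sequence."""
    seq = seq.upper()
    if not minlength <= len(seq) <= maxlength:
        return False
    if len(AMBIGUOUS.findall(seq)) > maxambig:
        return False
    return longest_homopolymer(seq) <= maxhomop


if __name__ == "__main__":
    read = "ACGT" * 63  # 252 bases, no homopolymers longer than 1
    print(passes_screen(read), passes_screen(read[:-1] + "N"))
```

Reads failing any of these limits are dropped at this step, which is consistent with Pat's point that the filtering happens in screen.seqs rather than in make.contigs.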

Thanks!

Carla
