sub.sample behavior between versions

Kendra · August 16, 2016, 3:03pm

We’re subsampling a practice dataset after make.contigs (just to reduce computation time while people are learning). Two people have 1.38.1 and one has 1.38.0. After screen.seqs (maxambig=0, maxlength=275) and unique.seqs, the two with 1.38.1 have exactly the same number of unique and total seqs. The person with 1.38.0 has a different but close number. Does this imply that the versions are subsampling differently?

westcott · August 16, 2016, 4:44pm

Could you explain further? The sub.sample command randomly selects sequences. Given this randomness, one would expect to see differences in the results.

Kendra · August 16, 2016, 5:06pm

Yeah, I thought sub.sample was random. I can’t figure out how two people got what appears to be exactly the same subsample. Here’s what we did:

make.contigs(file=stability.txt)
sub.sample(fasta=current, groups=current, size=20000, persample=T)

Total # of seqs: 240000
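With persample=T, each group is drawn down to `size` reads, so the subsampled total is `size` times the number of groups kept. If every group had at least 20000 reads, 240000 / 20000 = 12 would suggest 12 groups, though that count is an inference and not stated in the thread. The Python sketch below mimics that per-group behavior under stated assumptions: `subsample_per_group`, the group names and the read counts are all hypothetical, and dropping groups smaller than `size` is an assumption for the sketch, not a description of mothur's code.

```python
import random
from collections import defaultdict

def subsample_per_group(groups, size, rng):
    """Keep `size` randomly chosen reads (without replacement) from each group.

    `groups` maps read ID -> group name. Groups with fewer than `size`
    reads are dropped (an assumption made for this sketch).
    """
    by_group = defaultdict(list)
    for read_id, group in groups.items():
        by_group[group].append(read_id)

    kept = {}
    for group, read_ids in sorted(by_group.items()):
        if len(read_ids) < size:
            continue  # too small to subsample down to `size`
        for read_id in rng.sample(read_ids, size):
            kept[read_id] = group
    return kept

# Hypothetical data: 12 samples with 25,000 reads each, plus one tiny sample
groups = {}
for s in range(12):
    for i in range(25000):
        groups[f"S{s}_r{i}"] = f"S{s}"
for i in range(500):
    groups[f"tiny_r{i}"] = "tiny"

kept = subsample_per_group(groups, 20000, random.Random(0))
print(len(kept))  # 240000 = 12 groups x 20000 reads
```

Each group contributes exactly `size` reads, so the total is fixed by design; only *which* reads are chosen depends on the random generator.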

screen.seqs(fasta=current, group=current, maxambig=0, maxlength=275)
unique.seqs(fasta=current)
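The two steps after sub.sample involve no randomness: screening keeps or drops each read by fixed rules, and unique.seqs collapses identical sequences. The simplified Python sketch below applies only the two criteria used here (no ambiguous bases, length at most 275). The helpers `screen` and `unique` are hypothetical, treating any non-ACGT character as ambiguous is an assumption, and the real commands do more than this. Because the same input always produces the same counts, identical unique and total counts strongly suggest identical subsamples going in.

```python
def screen(seqs, maxambig=0, maxlength=275):
    """Keep reads with at most `maxambig` non-ACGT bases and length <= maxlength.

    Assumption for this sketch: any character other than A/C/G/T counts as ambiguous.
    """
    kept = {}
    for read_id, seq in seqs.items():
        ambig = sum(1 for base in seq.upper() if base not in "ACGT")
        if ambig <= maxambig and len(seq) <= maxlength:
            kept[read_id] = seq
    return kept

def unique(seqs):
    """Collapse identical sequences: map each distinct sequence to its read IDs."""
    names = {}
    for read_id, seq in seqs.items():
        names.setdefault(seq, []).append(read_id)
    return names

# Hypothetical toy reads
reads = {
    "r1": "ACGTACGT",
    "r2": "ACGTACGT",   # duplicate of r1
    "r3": "ACGNACGT",   # one ambiguous base, removed with maxambig=0
    "r4": "A" * 300,    # longer than 275, removed
    "r5": "TTGCA",
}

good = screen(reads)
uniq = unique(good)
print(len(uniq), len(good))  # 2 3  (2 unique sequences, 3 total reads)
```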

One person (running 1.38.0) had
Unique seqs: 77289
Total # of seqs: 187529

But both people running 1.38.1 got
Unique seqs: 77317
Total # of seqs: 187795
Doesn’t that look like they subsampled exactly the same sequences and so ended up with the same number of unique and total seqs?
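One explanation consistent with these numbers, though nothing in the thread confirms it, is that both 1.38.1 installs seeded their random number generator with the same value, while 1.38.0 used a different seed or a different sampling routine. The Python sketch below shows only the general principle: two separate generators given the same seed pick exactly the same subsample, while a different seed picks a different subsample of the same size. The pool size, the `draw` helper and the seed values are hypothetical, and nothing here reflects mothur's actual defaults.

```python
import random

# Hypothetical pool of reads for one sample
reads = [f"read_{i}" for i in range(100000)]

def draw(seed, size=20000):
    """Subsample `size` reads using a fresh generator created from `seed`."""
    return random.Random(seed).sample(reads, size)

run_1 = draw(seed=12345)  # stand-in for one install
run_2 = draw(seed=12345)  # stand-in for a second install with the same seed
run_3 = draw(seed=54321)  # stand-in for an install using some other seed

print(run_1 == run_2)  # True: same seed, identical reads in the same order
print(len(run_3), run_1 == run_3)
```

If a fixed default seed is the cause, the downstream deterministic steps would then reproduce the same unique and total counts on every machine sharing that seed.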
