sub.sample with fasta, name & group or fasta and count file

AstridN · October 22, 2014, 10:47am

Hi all,

I have analyzed an Illumina MiSeq dataset of 16S V4 following the MiSeq SOP.

To prepare a fasta file as required for oligotyping (http://oligotyping.org/2012/05/11/oligotyping-pipeline-explained/#Preparing_the_FASTA_File), I restarted the whole SOP using a name and group file instead of the count file. That solved the problem of generating the fasta file for oligotyping, but I came across another issue:

When running the MiSeq SOP with a count file, I get 261,137 uniques and 4,059,903 sequences. After cluster.split at taxlevel=5, I end up with 33,036 OTUs. After sub.sample, these are reduced to 26,255 OTUs, 185,224 uniques and 2,371,440 sequences.

When running the MiSeq SOP with a name and group file, I get 257,435 uniques and 3,917,871 sequences. After cluster.split at taxlevel=5, I end up with 33,272 OTUs. But after sub.sample, these are reduced to only 7,732 OTUs, 38,802 uniques and 2,286,384 sequences.

I am not too worried about the slightly different numbers before sub.sample, but I am completely clueless as to why sub.sample produces such a massive difference in the number of OTUs and unique sequences.

Any ideas?

pschloss · October 24, 2014, 7:25pm

By subsampling you are removing data and so you should expect to have fewer OTUs.

AstridN · October 25, 2014, 2:59pm

Hi Pat,

thanks for your quick reply.

Reducing my data set by subsampling is what I want, and I know that it reduces the number of OTUs.
But I am confused about the very different outcomes of sub.sample when running the MiSeq SOP on exactly the same data set, once with a count file and once with a name and group file: after sub.sample I get more than 26,000 OTUs (count file) or fewer than 8,000 OTUs (name and group file)! Before sub.sample, both data sets have about the same number of sequences (4,000,000), uniques (260,000) and OTUs (33,000).
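
To make the discrepancy concrete, here is a small Python sketch that works out what fraction of sequences, uniques and OTUs survives sub.sample in each run. The counts are the ones quoted in the first post; the dictionary layout and the helper name `retained_fraction` are just for illustration and have nothing to do with mothur itself.

```python
# Counts quoted in the first post: (before sub.sample, after sub.sample).
runs = {
    "count file": {
        "sequences": (4_059_903, 2_371_440),
        "uniques": (261_137, 185_224),
        "OTUs": (33_036, 26_255),
    },
    "name and group file": {
        "sequences": (3_917_871, 2_286_384),
        "uniques": (257_435, 38_802),
        "OTUs": (33_272, 7_732),
    },
}


def retained_fraction(before, after):
    """Fraction of items still present after sub.sample (hypothetical helper)."""
    return after / before


# Retained fraction for every quantity in every run.
retained = {
    run: {name: retained_fraction(*pair) for name, pair in counts.items()}
    for run, counts in runs.items()
}

for run, fractions in retained.items():
    for name, fraction in fractions.items():
        print(f"{run:>20} {name:>9}: {fraction:.1%} retained")
```

Both runs keep a very similar share of the sequences, but the name and group run keeps a far smaller share of the uniques and OTUs, which is the part that looks suspicious.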

pschloss · October 28, 2014, 3:58pm

Can you post the fasta, group, count, and names files somewhere for me to download and look at? It’d also be good to have the exact commands you are running.

Thanks,
Pat

westcott · October 31, 2014, 1:10pm

Hi,
I ran the following commands on the final files from Pat’s 454 example, http://www.mothur.org/wiki/454_SOP.

make.table(group=final.groups, name=final.names)
sub.sample(fasta=final.fasta, count=current, size=4419, persample=t)
dist.seqs(fasta=current)
cluster(count=current)

and

sub.sample(fasta=final.fasta, group=final.groups, name=final.names, size=4419, persample=t)
dist.seqs(fasta=current)
cluster(name=current)

The resulting list files had a comparable number of OTUs. Could you have left the persample off of one of the sub.sample commands?

Kindly,
Sarah
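
For anyone unsure what the persample question is getting at, here is a rough Python sketch of the difference between drawing a fixed number of reads from the pooled dataset and drawing that number from every group. This is not mothur's code, only a toy illustration: the function names, the `(read_id, group)` tuples and the toy depths are all made up, and it assumes that without persample the size applies to the dataset as a whole.

```python
import random
from collections import defaultdict


def subsample_pooled(reads, size, rng):
    """Draw `size` reads from the whole dataset, ignoring group labels."""
    return rng.sample(reads, size)


def subsample_per_sample(reads, size, rng):
    """Draw `size` reads from each group separately."""
    by_group = defaultdict(list)
    for read in reads:
        by_group[read[1]].append(read)
    picked = []
    for group in sorted(by_group):
        picked.extend(rng.sample(by_group[group], size))
    return picked


def reads_per_group(reads):
    """Count how many reads each group contributes."""
    counts = defaultdict(int)
    for _, group in reads:
        counts[group] += 1
    return dict(counts)


# Toy data: group A is sequenced ten times deeper than group B.
toy_reads = [(f"A_{i}", "A") for i in range(1000)]
toy_reads += [(f"B_{i}", "B") for i in range(100)]

rng = random.Random(0)
pooled = subsample_pooled(toy_reads, 50, rng)
per_sample = subsample_per_sample(toy_reads, 50, rng)

print("pooled:    ", reads_per_group(pooled))
print("per sample:", reads_per_group(per_sample))
```

In the pooled draw the deeper group dominates and the total is fixed at the requested size, while the per-sample draw returns the requested size for every group. Two runs that differ only in this setting can therefore end up with very different sets of reads going into clustering.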
