I have a question regarding the subsample command.
Here’s the version I am using:
Windows version
Running 64Bit Version
mothur v.1.32.1
Last updated: 1/6/2014
Here is the command I enter. If I understand it correctly, I am asking the program to subsample 1000 sequences from each group. I have 12 groups. I am sampling from the shared file built from the dist file created from my sequences.
mothur > sub.sample(shared=final.phylip.an.shared, size=1000, label=0.03)
Unable to open C:\mothur\in\final.phylip.an.shared. Trying output directory C:\mothur\out-SSURef\final.phylip.an.shared
Sampling 1000 from each group.
0.03
Output File Names:
C:\mothur\out-SSURef\final.phylip.an.0.03.subsample.shared
Here is what the output file looks like (partially):
label Group numOtus Otu0001 . . .
0.03 07 1221 386 . . .
0.03 08 1221 308 . . .
0.03 09 1221 317 . . .
0.03 16 1221 280 . . .
My question is: since I am asking the program to subsample 1000 sequences from each group, how am I getting a total of 1221 OTUs for each group? Intuitively, I would expect a maximum of 1000 OTUs for each group (if each read were completely unique to a separate OTU) and, in reality, a much lower numOtus for each group. Discounting the 0’s, I get:
label Group numOtus Otu0001 . . .
0.03 07 181 386 . . .
0.03 08 177 308 . . .
0.03 09 187 317 . . .
0.03 16 145 280 . . .
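For anyone wanting to reproduce this kind of tally, here is a minimal sketch in Python. It assumes the shared file is tab-delimited with the columns label, Group, numOtus, followed by one column per OTU. The function name `observed_otus` and the toy data are hypothetical and are not taken from my real output file.

```python
import csv
import io


def observed_otus(shared_text):
    """Map each group to (total reads, non-zero OTUs, numOtus column value)."""
    rows = csv.reader(io.StringIO(shared_text), delimiter="\t")
    next(rows)  # skip the header line: label, Group, numOtus, Otu0001, ...
    result = {}
    for row in rows:
        if not row:
            continue  # ignore blank lines
        group = row[1]
        num_otus = int(row[2])
        # Drop empty fields in case a line ends with a trailing tab.
        counts = [int(x) for x in row[3:] if x != ""]
        nonzero = sum(1 for c in counts if c > 0)
        result[group] = (sum(counts), nonzero, num_otus)
    return result


# Hypothetical toy table (4 OTU columns, 10 reads per group).
example = (
    "label\tGroup\tnumOtus\tOtu0001\tOtu0002\tOtu0003\tOtu0004\n"
    "0.03\t07\t4\t6\t0\t3\t1\n"
    "0.03\t08\t4\t5\t5\t0\t0\n"
)

for group, (reads, observed, total) in observed_otus(example).items():
    print(group, reads, observed, total)
```

In the toy table, both groups report numOtus as 4 (the number of OTU columns), while the non-zero counts differ per group, which is the same pattern as in my output above. The read total per group is included so it can be checked against the requested subsample size.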
The numOtus value of 1221 seems to be the total number of OTUs retrieved across all of the groups, with some OTUs absent from any given group. OTUs with a “zero” in a group’s column are still being counted. Is it necessary to count the actual number of OTUs retrieved for each group before running subsequent analyses or porting this data into other programs?
Thanks