documentation for get.communitytype()

I find the idea of partitioning my dataset into separate community types very attractive.
Therefore I looked at this command and used it with default parameters.
I received several output files and am struggling a little to understand the results.
Unfortunately, the documentation on this command is very… “succinct” :wink: .
Could anybody help me understand the meaning of the headers in the different output files?
And maybe explain how to use the different options for the other methods, kmeans or pam?
In advance, thanks for your help,
Bernard

The dmm method is based on this approach: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0030126, developed by Chris Quince’s lab. If you read the paper it will tell you what the values actually mean, but for the cheat-sheet version of interpreting the results:

When you run the command, mothur prints a series of statistics to the logfile, testing the fit of an increasing number of partitions. I think this data is also saved to a summary file, but I don’t recall off the top of my head. The number of partitions with the lowest Laplace value is the best fit. I believe the best Laplace value is worked out as a kind of local minimum: if a lower score is not obtained within three partition increments of the last lowest, mothur stops the search. For example, if you started and got values of [600, 400, 200, 300], mothur would perform two more runs, and if a score lower than 200 wasn’t observed, the command would end.
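As a rough illustration of the stopping rule described above, here is a minimal Python sketch. It is not mothur's actual code: the function name `search_partitions`, the `patience` parameter and the `laplace_for_k` callback are all hypothetical, and the number of extra fits is an assumption taken from the description in this post. The last two scores in the example are made up to complete the [600, 400, 200, 300] sequence.

```python
def search_partitions(laplace_for_k, patience, max_k):
    """Fit k = 1, 2, ... and stop once `patience` fits in a row fail to beat the best score.

    laplace_for_k: hypothetical callback returning the Laplace value for k partitions.
    Returns the best k and the list of (k, score) pairs that were tested.
    """
    best_k, best_score = None, float("inf")
    tested = []
    for k in range(1, max_k + 1):
        score = laplace_for_k(k)
        tested.append((k, score))
        if score < best_score:
            # New minimum: remember it and keep searching.
            best_k, best_score = k, score
        elif k - best_k >= patience:
            # No improvement within `patience` increments of the last minimum.
            break
    return best_k, tested


# 600, 400, 200, 300 come from the example above; 250 and 220 are made up.
scores = [600, 400, 200, 300, 250, 220]
best_k, tested = search_partitions(lambda k: scores[k - 1], patience=3, max_k=len(scores))
print(best_k, len(tested))
```

With these numbers the minimum is found at k=3, three more fits fail to beat it, and the search ends after six fits.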

Because of this, you get a lot of output files but only one set is ‘correct’, so you need to make sure you pick the right set of files to proceed. First, you’ll get a *dmm.mix.summary that contains the relative abundances of each OTU in each partition, which is probably the part you’ll be most interested in. mothur will also give you a design file that assigns groups to partitions.

If you’d like to write the wiki page, that would be awesome… :slight_smile:

Thank you very much for your help, dwaite.
Best regards,
Bernard

I ran the command several times on the same dataset and obtained different results.
4 partitions, then 3, then 6, then 4 again…
Hm?
What does it mean?
How should I consider these results?

Run it a few times and see which value of k gives you the smallest Laplace value.
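If you note down the K and Laplace columns from each repeat, picking the overall winner is simple. This is only a sketch: `pick_best_fit` and its input layout (one list of `(k, laplace)` pairs per run) are hypothetical helpers for illustration, not something mothur provides.

```python
def pick_best_fit(runs):
    """Return (run_index, k, laplace) for the smallest Laplace value over all runs.

    runs: one list of (k, laplace) pairs per repeat of the command,
          copied by hand from the K and Laplace columns of each run.
    """
    candidates = [
        (laplace, k, run_index)
        for run_index, fits in enumerate(runs)
        for k, laplace in fits
    ]
    # Tuples compare by their first element, so min() picks the lowest Laplace value.
    laplace, k, run_index = min(candidates)
    return run_index, k, laplace
```

The run index tells you which set of output files to carry forward.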

Example:

  1. I have three types of file from QIIME: a map file, a tree file, and an OTU biom file

  2. Then I converted the biom file to a shared file in mothur (version: 1.36.1)

mothur > make.shared(biom=data\otu_table_json.biom)

otu_table_json.biom (2144 KB) ---------> otu_table_json.shared (762 KB)

These two files differ a lot in size. Is there something wrong?

  3. Then I used the shared file to get the community types

mothur > get.communitytype(shared=data\otu_table_json.shared)

4 results that I got:
K NLE logDet BIC AIC Laplace
1 200611.32 9053.11 219046.11 213367.32 193415.90
2 215645.42 -17883.78 252516.45 241158.42 183258.66
3 249310.65 -44243.09 304617.92 287580.65 192021.35
4 282383.69 -73397.50 356127.19 333410.69 198794.29
and

Partition_1 Partition_2
D-C7B28.fastq 1.0000 0.0000
D-C7B29.fastq 1.0000 0.0000
D-C7B30.fastq 1.0000 0.0000
D-C7B32.fastq 1.0000 0.0000
D-SPm1.fastq 0.0000 1.0000
D-SPm2.fastq 0.0000 1.0000
D-SPs.fastq 0.0000 1.0000
SPL-HT.fastq 0.0000 1.0000
SPL-LT.fastq 1.0000 0.0000
SPm-LT.fastq 0.0000 1.0000
SPr-HT.fastq 0.0000 1.0000
SPr-LT.fastq 0.0000 1.0000
SPs-LT.fastq 0.0000 1.0000
W-C7B20.fastq 1.0000 0.0000
W-C7B28.fastq 1.0000 0.0000
W-C7B29.fastq 1.0000 0.0000
W-C7B30.fastq 1.0000 0.0000
W-C7B32.fastq 1.0000 0.0000
Is there anything wrong?
Why are all the values 1 or 0, with no mixed values such as
partition 1 partition2
0.8 0.2

For the file size - it’s probably nothing. biom files usually incorporate a lot of sample metadata and OTU taxonomy which doesn’t make it into your shared file so that’s probably where the size difference is coming from.

For your results, the formatting on the forums makes it a bit hard to organise, but here goes. So first, your summary results were:

K NLE        logDet   BIC       AIC       Laplace
1 200611.32  9053.11  219046.11 213367.32 193415.90
2 215645.42 -17883.78 252516.45 241158.42 183258.66
3 249310.65 -44243.09 304617.92 287580.65 192021.35
4 282383.69 -73397.50 356127.19 333410.69 198794.29

As you can see, the Laplace value is lowest at 2, so this is the ideal number of partitions identified. mothur only tested four values of k, which is consistent with my previous post - after a minimum solution was found at k=2, 2 more iterations were performed but a better score wasn’t observed.
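If you would rather not eyeball the table, the same choice can be made with a few lines of Python. This is just a sketch that assumes the table is whitespace-separated with a header row, as pasted above; the function name `best_k_by_laplace` is hypothetical.

```python
# The fit table exactly as posted in this thread.
FIT_TABLE = """\
K NLE logDet BIC AIC Laplace
1 200611.32 9053.11 219046.11 213367.32 193415.90
2 215645.42 -17883.78 252516.45 241158.42 183258.66
3 249310.65 -44243.09 304617.92 287580.65 192021.35
4 282383.69 -73397.50 356127.19 333410.69 198794.29
"""


def best_k_by_laplace(table_text):
    """Return (k, laplace) for the row with the smallest Laplace value."""
    lines = [line.split() for line in table_text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    k_col = header.index("K")
    laplace_col = header.index("Laplace")
    best = min(rows, key=lambda row: float(row[laplace_col]))
    return int(best[k_col]), float(best[laplace_col])


print(best_k_by_laplace(FIT_TABLE))  # (2, 183258.66)
```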

For interpreting the other file, it’s really just a simple presence/absence of sample membership in each partition. For example:

               Partition_1  Partition_2
D-C7B28.fastq  1.0000       0.0000
D-C7B29.fastq  1.0000       0.0000
D-C7B30.fastq  1.0000       0.0000
D-C7B32.fastq  1.0000       0.0000
D-SPm1.fastq   0.0000       1.0000
D-SPm2.fastq   0.0000       1.0000
D-SPs.fastq    0.0000       1.0000

This shows that your first four samples belong to partition 1, and the next three are in partition 2. The design file that gets returned from the command should show this too, although I always double check it to be sure.
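For double checking against the design file, here is a small Python sketch that turns the table above into a sample-to-partition mapping by taking the column with the largest value in each row. It assumes the layout as pasted in this thread (a header listing only the partition names, then one row per sample); `assign_partitions` is a hypothetical helper, not part of mothur.

```python
# The seven rows quoted above, in the layout posted in this thread.
MEMBERSHIP = """\
Partition_1 Partition_2
D-C7B28.fastq 1.0000 0.0000
D-C7B29.fastq 1.0000 0.0000
D-C7B30.fastq 1.0000 0.0000
D-C7B32.fastq 1.0000 0.0000
D-SPm1.fastq 0.0000 1.0000
D-SPm2.fastq 0.0000 1.0000
D-SPs.fastq 0.0000 1.0000
"""


def assign_partitions(table_text):
    """Map each sample to the partition whose column holds its largest value."""
    lines = table_text.strip().splitlines()
    partitions = lines[0].split()  # header holds only the partition names
    assignment = {}
    for line in lines[1:]:
        sample, *values = line.split()
        weights = [float(value) for value in values]
        assignment[sample] = partitions[weights.index(max(weights))]
    return assignment


for sample, partition in assign_partitions(MEMBERSHIP).items():
    print(sample, partition)
```

You can then compare this mapping line by line with the design file.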

Hope that helps!

Thank you very much

  1. Is there any published paper about community type analysis? I haven’t found one yet. If you know of one, could you please post the title here so that I can get it?

  2. I have seen some people get a mixed community type at one site.

For example:
site D-C7B28: maybe 10% is partition 1 and 90% is partition 2. I don’t know how they got that, because you mentioned that it is just presence/absence here.
Is there anything I can change in the parameters to do this?

thank you very much

have a good one

The original software was published here. I think you can also get that publication by running

get.communitytype(citation)

Although I’m not sure.

For determining mixtures, I’m not sure. There are a couple of other output files that get generated by the command so have a look through those and see if one matches what you described.