Issues running dist.shared with CLR-transformed data

Hi,

I am interested in generating a distance matrix file using CLR-transformed data so that it better satisfies the linearity assumptions for plotting the data using PCoA. I can generate the CLR file without issue using make.clr, but to run dist.shared I need a .shared file as input. I can change the .clr file to a .shared file using write.table in R, but when I input this file into dist.shared and try to run it, mothur just closes. I thought it might be down to computing power, so I tried running it on our cloud-based supercomputer and got the same result. Any suggestions on what is happening?

Cheers,

Matt

Hmmm. I’m seeing this too with our MiSeq SOP data. Let us look into what’s going on.

  1. The output clr file is a shared file and should not need to be processed in R. What are you doing in R? I wonder if you have stray quote marks, row names, etc.
  2. What distance are you trying to use with it? FWIW, it should be euclidean: dist.shared(shared=myshared.clr.shared, calc=euclidean).

This function doesn’t get much/any attention from us since we don’t use it. As I’ve shown in some recent papers, I think the use of CLR-type methods is pretty questionable…

https://journals.asm.org/doi/10.1128/msphere.00354-23
https://journals.asm.org/doi/10.1128/msphere.00355-23

Thanks,
Pat

Hey Pat,

Thanks for the quick reply. The output file I get from make.clr is a .clr file. I’ve tried using that as input to dist.shared but it doesn’t like it. I even tried replacing shared= with clr= and that didn’t work either. In R my script is:

write.table(clr_data, "clr_data.shared", sep = "\t", row.names = FALSE, col.names = TRUE, quote = FALSE)

The file content looks identical to the .clr file, just with a .shared tag.

Distance-wise I was trying to use Bray-Curtis; is this where I’m going wrong? So you don’t recommend CLR transformation for beta-diversity metrics at all, then? Not even for differential abundance analyses? I’ll have a look at those papers soon.

Cheers,

Matt

Hey again,

Sorry, the problem is on our end. We hope to have a fix up for you by the end of the week, if not sooner. CLR really only works with a Euclidean distance. The Euclidean distance between CLR-transformed samples is called the Aitchison distance, which is the form used in the microbiome literature when people are concerned about compositionality.
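
For anyone who wants to see what this means numerically, here is a minimal sketch in Python. It is not mothur's implementation: the function names, the toy counts, and the pseudocount of 0.1 used to replace zeros are all assumptions made for illustration. It takes the CLR of each sample (the log of each count minus the mean log of that sample) and then the Euclidean distance between the transformed samples.

```python
import math

def clr(counts, pseudocount=0.1):
    """Centered log-ratio: log of each value minus the sample's mean log."""
    # Zeros have no logarithm, so they are replaced by a pseudocount.
    # The value 0.1 is an assumption for this sketch.
    values = [c if c > 0 else pseudocount for c in counts]
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def aitchison(counts_a, counts_b, pseudocount=0.1):
    """Aitchison distance: Euclidean distance between CLR-transformed samples."""
    return euclidean(clr(counts_a, pseudocount), clr(counts_b, pseudocount))

# Hypothetical OTU counts for three samples (no zeros here).
sample_a = [10, 20, 70]
sample_b = [100, 200, 700]  # same proportions as sample_a, 10x the depth
sample_c = [70, 20, 10]     # different composition

d_ab = aitchison(sample_a, sample_b)
d_ac = aitchison(sample_a, sample_c)
clr_a = clr(sample_a)

print("CLR of sample_a:", clr_a)
print("distance a-b:", d_ab)
print("distance a-c:", d_ac)
```

Two things are worth noticing. The CLR values sum to zero within each sample, so some of them are always negative, and a calculator such as Bray-Curtis that expects non-negative abundances is not a natural fit. Also, when no zeros are involved, two samples with the same proportions end up at a distance of zero regardless of depth.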

I discourage any use of these types of metrics, since they are sensitive to uneven sampling effort, as shown in those papers.

Pat

Thanks for this Pat!

Regarding your papers, I completely agree about rarefaction being the only method of choice for accounting for uneven sequencing depth among samples. What I am questioning, though, is whether a CLR transformation is of use on sub-sampled data to reduce data skewness, better handle zeros, and account for the closure effect in microbial datasets. This would appear to help bring the data back within the requirements of Euclidean geometry for many of the statistical tests used in beta diversity analyses.

Cheers,

Matt

I think it would just turn zeroes into another number that doesn’t have any basis in biology. Many of these methods are lifted from gene expression analysis, where we know the gene is present but not being expressed. In communities, not everything is everywhere, and treating taxa as if they are is a problem. Zeroes in our case could mean the organisms are there but below the limit of detection, or it could mean they aren’t there at all. I’m uneasy with assuming they are there, as these methods do.
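
To make the point about zeroes concrete, here is a small Python sketch. Again, this is an illustration under assumptions rather than what mothur does internally: the pseudocounts (0.1 and 1.0) and the toy counts are made up. It shows that the CLR value assigned to a taxon with zero reads depends on the pseudocount chosen and on how deeply the other taxa were sequenced.

```python
import math

def clr(counts, pseudocount):
    """Centered log-ratio with zeros replaced by an assumed pseudocount."""
    values = [c if c > 0 else pseudocount for c in counts]
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

# Hypothetical samples: the first taxon was never observed in either one.
shallow = [0, 50, 50]
deep = [0, 500, 500]  # same observed proportions, 10x the reads

# CLR value given to the unobserved taxon under different choices.
zero_shallow = clr(shallow, pseudocount=0.1)[0]
zero_deep = clr(deep, pseudocount=0.1)[0]
zero_shallow_alt = clr(shallow, pseudocount=1.0)[0]

print("zero taxon, shallow sample, pseudocount 0.1:", zero_shallow)
print("zero taxon, deep sample, pseudocount 0.1:", zero_deep)
print("zero taxon, shallow sample, pseudocount 1.0:", zero_shallow_alt)
```

The same absent taxon gets three different values here. None of them comes from an observation of that taxon. They come from the pseudocount and the read depth of everything else in the sample.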

FWIW - an updated version of mothur is now available with the fixes for clr, so it will work with euclidean distances in dist.shared.

Thanks,
Pat