mothur

importing dist matrix into R

I’m hoping some R wizard can help me. I’ve constructed distance matrices in mothur (yay for repeated subsampling capability) and now need to use them in R for further analyses. I can get the lower-triangle data in, but not the labels, using the command below, which I modified from the thread "Unifrac Distance Matrix -> Newick using R".

b_bc <- data.matrix(read.table("", fill=T, row.names=1, skip=1, col.names=1:<#samples>))



I don't get any errors; I just don't have the sample labels associated with the data. I've played with the row.names/col.names arguments. Thinking the problem was related to the lower-triangle format, I reran the dist in mothur to get a square matrix, but I still can't get the labels associated with the matrix. Do I need to bring the sample names in as a vector and then associate that vector with the matrix?

ETA: oh actually something’s not quite right when I load it into R, I get an N x N-1 matrix. Anyone?

OK, should anyone else run into this: I shouldn’t have been using col.names. I used header=T instead and it worked beautifully.
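
For anyone who hits the same N x N-1 result with a lower-triangle file: it seems to come from row.names=1 using up one of the columns named in col.names. Here is a minimal sketch with a made-up three-sample file (the file contents and the sample names A, B, C are assumptions for illustration, not real data):

```r
# Toy lower-triangle file in the layout mothur writes (hypothetical samples A, B, C)
lt_file <- tempfile(fileext = ".dist")
writeLines(c("3",
             "A",
             "B\t0.10",
             "C\t0.20\t0.30"), lt_file)

# First line of the file is the number of samples
n <- as.numeric(readLines(lt_file, n = 1))

# row.names = 1 takes the label column, so col.names = 1:n leaves only n - 1 data columns
short <- read.table(lt_file, fill = TRUE, row.names = 1, skip = 1,
                    col.names = seq_len(n))
dim(short)

# Asking for n + 1 columns gives a square n x n table; the extra last column is all NA
square <- read.table(lt_file, fill = TRUE, row.names = 1, skip = 1,
                     col.names = seq_len(n + 1))
colnames(square) <- rownames(square)
dim(square)
```

In both versions the sample labels end up as row names, so they do not need to be read in as a separate vector.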

Hello,

I have a small query too.
As with batch processing, is it possible to call mothur commands from an R script?


Thanks
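
One possible approach, as a sketch rather than a tested recipe: mothur has a command-line mode in which the commands are passed as a single quoted argument starting with #, so base R's system2() can launch it. The helper name mothur_arg and the file example.shared are hypothetical, and the call below only runs if a mothur executable is found on the PATH.

```r
# Hypothetical helper: join mothur commands into the single "#cmd1; cmd2"
# argument that mothur's command-line mode expects
mothur_arg <- function(commands) {
  paste0("#", paste(commands, collapse = "; "))
}

# example.shared is a placeholder file name, not a real dataset
arg <- mothur_arg(c("dist.shared(shared=example.shared, calc=braycurtis)",
                    "dist.shared(shared=example.shared, calc=jclass)"))

# Only try to run mothur if an executable is actually available
if (nzchar(Sys.which("mothur"))) {
  # shQuote() keeps the shell from splitting the argument at spaces or semicolons
  status <- system2("mothur", args = shQuote(arg))
}
```

The distance files mothur writes can then be read back into R in the same script.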

You can try phyloseq::import_mothur_dist if your dist file was generated by mothur. Or are there other methods?

I’ve been editing my mothur-generated dist matrices in Excel for the past couple of years but would really love to figure out how to get them into R straight from mothur. What I do in Excel is copy the sample names, transpose-paste them to be column headers, and add “NA” to the bottom rightmost cell. After these alterations I can import the dist file using:

tyc <- read.csv(file="weir.farm.tyc3.csv", header=T, row.names = 1)

How would I change my code so I don’t have to do any alteration in Excel?

Hey,

I use the following function in R to import mothur distance files:

parseDistanceMatrix = function(phylip_file) {

    # Read the first line of the phylip file to find out how many sequences/samples it contains
    temp_connection = file(phylip_file, 'r')
    len = readLines(temp_connection, n=1)
    len = as.numeric(len)
    close(temp_connection)

    # Column 1 holds the sample labels (row.names=1); fill=TRUE pads the short
    # rows of the lower triangle with NA
    phylip_data = read.table(phylip_file, fill=TRUE, row.names=1, skip=1, col.names=1:len)
    phylip_matrix = as.dist(phylip_data)
    return(phylip_matrix)
}

Funnily enough, it’s actually based on your original R command in this thread.

Using this does give you a warning when it does the as.dist() cast, but I think this is just due to the trailing tab on the last line of the phylip file. I’ve tested it extensively and it doesn’t change the data in any way.
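
A note on that warning, offered as a guess rather than a diagnosis: as.dist() warns about a "non-square matrix" whenever the row and column counts differ, and reading with col.names=1:len leaves N rows but only N-1 data columns. The sketch below reads one extra column and mirrors the lower triangle, so as.dist() gets a square symmetric matrix. The function name read_mothur_lt and the toy file are assumptions for illustration.

```r
# Hypothetical variant: build a full symmetric matrix before calling as.dist()
read_mothur_lt <- function(phylip_file) {
  # First line of the file is the number of samples
  n <- as.numeric(readLines(phylip_file, n = 1))

  # n + 1 column names: one for the labels plus n data columns (the last is all NA)
  tab <- read.table(phylip_file, fill = TRUE, row.names = 1, skip = 1,
                    col.names = seq_len(n + 1))

  m <- as.matrix(tab)
  storage.mode(m) <- "numeric"
  dimnames(m) <- list(rownames(tab), rownames(tab))

  # Zero the diagonal and upper triangle, then mirror the lower triangle
  m[upper.tri(m, diag = TRUE)] <- 0
  m <- m + t(m)

  as.dist(m)
}

# Toy lower-triangle file with made-up samples and distances
lt_file <- tempfile(fileext = ".dist")
writeLines(c("3", "A", "B\t0.10", "C\t0.20\t0.30"), lt_file)
d <- read_mothur_lt(lt_file)
```

The distances should come out the same as with the function above; the only difference is that the matrix is already square when it reaches as.dist().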

Awesome, thanks! It’s the circle of code life :)

I tweaked your code a bit because I like working with data frames rather than dist objects (I can remove samples with logical vectors when they’re data frames).

parseDistanceDF = function(phylip_file) {

    # Read the first line of the phylip file to find out how many sequences/samples it contains
    temp_connection = file(phylip_file, 'r')
    len = readLines(temp_connection, n=1)
    len = as.numeric(len)
    # One extra column, because row.names=1 uses the first column for the sample labels
    len = len + 1
    close(temp_connection)

    phylip_data = read.table(phylip_file, fill=TRUE, row.names=1, skip=1, col.names=1:len)
    # Label the columns with the sample names so the data frame is square and fully labelled
    colnames(phylip_data) <- row.names(phylip_data)
    return(phylip_data)
}
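
As a small illustration of the logical-vector filtering mentioned above, here is a sketch using a hand-built stand-in for the data frame that parseDistanceDF returns (the values and the sample names A, B, C are made up):

```r
# Toy stand-in for the parseDistanceDF() result: lower triangle filled, the rest NA
dist_df <- data.frame(A = c(NA, 0.10, 0.20),
                      B = c(NA, NA, 0.30),
                      C = c(NA, NA, NA),
                      row.names = c("A", "B", "C"))

# Logical vector marking the samples to keep (here: everything except B)
keep <- !(rownames(dist_df) %in% c("B"))

# Apply it to rows and columns together so the table stays square
trimmed <- dist_df[keep, keep]
```

Subsetting rows and columns with the same vector keeps the row and column labels in step.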