importing dist matrix into R

Kendra · March 21, 2013, 8:25pm

I’m hoping some R wizard can help me. I’ve constructed distance matrices in mothur (yay for repeated subsampling capability) and now need to use them in R for further analyses. I can get the lower triangle data in, but not the labels, using this command, which I modified from Unifrac Distance Matrix -> Newick using R:

b_bc <- data.matrix(read.table("", fill=T, row.names=1, skip=1, col.names=1:<#samples>))

I don't get any errors; I just don't have the sample labels associated with the data. I've played with the row.names/col.names arguments. Thinking that the problem was related to the lower triangle format, I reran the dist in mothur to get a square matrix, but I still can't get the labels associated with the matrix. Do I need to bring the sample names in as a vector and then associate that vector with the matrix?

ETA: oh actually something’s not quite right when I load it into R, I get an N x N-1 matrix. Anyone?

Kendra · April 8, 2013, 11:25pm

OK, should anyone else run into this: I shouldn’t have been using col.names. I used header=T instead and it worked beautifully.

Rich · June 19, 2013, 2:14pm

Hello,

I too have a small query.
As with batch processing, is it possible to call mothur commands from an R script?

Thanks

linls1912 · October 12, 2015, 12:51am

You can try phyloseq::import_mothur_dist if your dist was generated from mothur. Or are there some other methods?

Kendra · June 23, 2016, 7:07pm

I’ve been editing my mothur-generated dist matrices in Excel for the past couple of years but would really love to figure out how to get them into R straight from mothur. What I do in Excel is copy the sample names, transpose-paste them to be column headers, and add “NA” to the bottom right-most cell. After these alterations I can import the dist file using:

tyc <- read.csv(file="weir.farm.tyc3.csv", header=T, row.names = 1)

How would I change my code so I don’t have to do any alteration in Excel?

dwaite · June 23, 2016, 10:00pm

Hey,

I use the following function in R to import mothur distance files:

parseDistanceMatrix = function(phylip_file) {

    # Read the first line of the phylip file to find out how many sequences/samples it contains
    temp_connection = file(phylip_file, 'r')
    len = readLines(temp_connection, n=1)
    len = as.numeric(len)
    close(temp_connection)

    # Skip the count line; the first column holds the sample names,
    # and fill=TRUE pads the short lower-triangle rows with NA
    phylip_data = read.table(phylip_file, fill=TRUE, row.names=1, skip=1, col.names=1:len)
    phylip_matrix = as.dist(phylip_data)
    return(phylip_matrix)
}

Funnily enough, it’s actually based on your original R command in this thread.

Using this does give you a warning when it does the as.dist() cast, but I think this is just due to the trailing tab on the last line of the phylip file. I’ve tested it extensively and it doesn’t change the data in any way.
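For anyone who wants to avoid the warning altogether, here is a minimal sketch of a padded variant. The helper name read_lower_phylip and the tiny test file are made up for illustration and are not from the posts above. It assumes a lower-triangle phylip file. The warning is most likely as.dist() noting a non-square matrix, since reading with col.names=1:len leaves N rows but only N-1 distance columns, so the sketch reads one extra, all-NA column to make the table square before the cast.

```r
# Hypothetical helper (not from the thread): read a lower-triangle phylip
# file and return a "dist" object built from a square, NA-padded matrix.
read_lower_phylip <- function(phylip_file) {
  # The first line holds the number of samples
  n <- as.integer(trimws(readLines(phylip_file, n = 1)))

  # One column for the labels plus n columns of distances; fill = TRUE pads
  # the short rows of the lower triangle with NA
  tab <- read.table(phylip_file, skip = 1, fill = TRUE, row.names = 1,
                    col.names = seq_len(n + 1))
  m <- as.matrix(tab)
  colnames(m) <- rownames(m)

  # as.dist() only reads entries below the diagonal, so the NA padding
  # on and above the diagonal is ignored
  as.dist(m)
}

# Tiny made-up file to exercise the helper
tmp <- tempfile(fileext = ".dist")
writeLines(c("3", "A", "B\t0.1", "C\t0.2\t0.3"), tmp)
d <- read_lower_phylip(tmp)
print(d)
```

The returned object carries the sample names as its labels, so it can go straight into functions such as hclust() that expect a dist.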

Kendra · June 24, 2016, 5:27pm

awesome! thanks. it’s the circle of code life

Kendra · August 3, 2016, 2:42pm

I tweaked your code a bit because I like working with data frames rather than dist objects (I can remove samples with logical vectors when they’re data frames):

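parseDistanceDF = function(phylip_file) {

    # Read the first line of the phylip file to find out how many sequences/samples it contains
    temp_connection = file(phylip_file, 'r')
    len = readLines(temp_connection, n=1)
    len = as.numeric(len)
    # One extra column so a lower-triangle file comes out square (the last column is all NA)
    len = len + 1
    close(temp_connection)

    # Skip the count line; the first column holds the sample names
    phylip_data = read.table(phylip_file, fill=TRUE, row.names=1, skip=1, col.names=1:len)
    # Label the columns with the same sample names as the rows
    colnames(phylip_data) <- row.names(phylip_data)
    return(phylip_data)
}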
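
A rough sketch of the kind of filtering this makes possible, assuming a data frame shaped like the one the function returns (lower triangle filled in, NA on and above the diagonal). The sample names and distances here are invented for illustration.

```r
# Made-up data frame shaped like the output of parseDistanceDF():
# lower triangle filled in, NA on and above the diagonal
dist_df <- data.frame(
  A = c(NA, 0.1, 0.2, 0.4),
  B = c(NA, NA, 0.3, 0.5),
  C = c(NA, NA, NA, 0.6),
  D = c(NA, NA, NA, NA),
  row.names = c("A", "B", "C", "D")
)

# Logical vector marking the samples to keep (here: everything except "C")
keep <- rownames(dist_df) != "C"

# Apply the same vector to rows and columns so the table stays square
subset_df <- dist_df[keep, keep]

# Convert back to a "dist" object when a downstream function needs one
subset_dist <- as.dist(as.matrix(subset_df))
print(subset_dist)
```

Indexing rows and columns with the same vector keeps the labels aligned, and as.dist() reads only the entries below the diagonal, so the NA cells do not matter.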
