R function to convert mothur .count file to UniFrac ID

Following my post from May 30th about a function that would make a UniFrac-compatible sample ID mapping file, here’s a solution using R.

In mothur, run something like:

count.seqs(name=final.names, group=final.groups)

Then in R:

Counts2ID <- function(data, filename = "sample", convert = FALSE) {
  # note: 'convert' is not used anywhere in the body yet
  library(reshape2) # provides melt(); stops with an error if the package is missing
  sample.ID <- subset(data, select = -c(total)) # remove column 'total'
  sample.ID <- melt(sample.ID, id = "Representative_Sequence") # melt data frame to long format
  names(sample.ID) <- c("Sequence.ID", "Sample.ID", "Sequence.abundance") # rename variables
  sample.ID <- sample.ID[sample.ID$Sequence.abundance > 0, ] # remove 0 values
  sample.ID <- sample.ID[order(sample.ID$Sequence.ID), ] # sort by sequence ID
  write.table(sample.ID, file = paste(filename,".ID",sep = ""), quote = FALSE, sep = "\t", col.names = FALSE, row.names = FALSE)
  invisible(NULL) # called only for its side effect (the written file)
}

filename <- "final.seq.count"
data <- read.table(filename, header = TRUE) # read file
Counts2ID(data, filename)
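
To make the reshaping step easier to follow, here is a minimal sketch on a toy count table. The sequence names, sample names and counts are made up, and the melt() call is replaced by a base-R stand-in so the snippet runs without reshape2; it is meant as an illustration of what the function does, not as a replacement for it.

```r
# Toy count table in the layout the function expects (all values are made up).
counts <- data.frame(
  Representative_Sequence = c("seqA", "seqB", "seqC"),
  total   = c(5, 3, 2),
  sample1 = c(5, 0, 1),
  sample2 = c(0, 3, 1),
  stringsAsFactors = FALSE
)

# Base-R stand-in for melt(): one row per sequence/sample pair.
sample_cols <- setdiff(names(counts), c("Representative_Sequence", "total"))
long <- data.frame(
  Sequence.ID        = rep(counts$Representative_Sequence, times = length(sample_cols)),
  Sample.ID          = rep(sample_cols, each = nrow(counts)),
  Sequence.abundance = unlist(counts[sample_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

long <- long[long$Sequence.abundance > 0, ]  # drop sequences absent from a sample
long <- long[order(long$Sequence.ID), ]      # sort by sequence ID
print(long, row.names = FALSE)
```

The six sequence/sample pairs collapse to four rows once the zero counts are dropped, and the abundances still add up to the sum of the 'total' column.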

Hope someone finds this useful.
Roey

Hello, this seems like a really useful R script, but I’m having trouble running it. I created a count table called “am_final.count_table” using: count.seqs(name=am_final.names, group=am_final.groups). Its first column is called Representative_Sequence, the second is called total, and then there are 20 sample columns with sample ID headers.

I tried to run the rest of your script in R. The first part in the brackets appears to run, but doesn’t create a new file.
Counts2ID <- function(data, am_final.count_table = "sample", convert = FALSE) {
  require(reshape2)
  sample.ID <- subset(data, select = -c(total)) # remove column 'total'
  sample.ID <- melt(sample.ID, id = "Representative_Sequence") # melt data frame
  names(sample.ID) <- c("Sequence.ID", "Sample.ID", "Sequence.abundance") # rename variables
  sample.ID <- sample.ID[sample.ID$Sequence.abundance > 0, ] # remove 0 values
  sample.ID <- sample.ID[order(sample.ID$Sequence.ID), ]
  write.table(sample.ID, file = paste(new_table,".ID",sep = ""), quote = FALSE, sep = "\t", col.names = FALSE, row.names = FALSE)
  return()
}

The remaining lines of the script do not run. This might be something simple, like me putting the file names in the wrong place? Let me know if you can help. Thanks!
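
Reading only from the code as pasted, a possible explanation: defining the function does not write anything until it is called, and the edited version refers to new_table, which is never assigned. The argument names in the definition are placeholders and probably should stay as in the original post; the actual file name goes into the three lines after the function. A rough sketch of that part follows, where the tiny table written to a temporary file is a made-up stand-in for am_final.count_table.

```r
# Stand-in for am_final.count_table: a tiny made-up table written to a temp file.
path <- file.path(tempdir(), "am_final.count_table")
writeLines(c(
  "Representative_Sequence\ttotal\tsampleA\tsampleB",
  "seq1\t4\t4\t0",
  "seq2\t3\t1\t2"
), path)

# The file name is used here, not inside the function definition.
data <- read.table(path, header = TRUE)

# The function relies on these two column names being present.
stopifnot(all(c("Representative_Sequence", "total") %in% names(data)))

# With the original, unmodified Counts2ID() defined, the call would then be:
# Counts2ID(data, path)  # writes a file named <path>.ID
```

If the column-name check fails on the real file, printing names(data) should show what read.table() actually picked up from the header line.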