Following my post from May 30th about a function that would make a UniFrac-compatible sample ID mapping file, here’s a solution using R.
In mothur, run something like:
count.seqs(name=final.names, group=final.groups)
Then in R:
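Counts2ID <- function(data, filename = "sample", convert = FALSE) {
  # 'convert' is currently unused
  library(reshape2) # provides melt(); stops with an error if the package is missing
  sample.ID <- subset(data, select = -c(total)) # remove column 'total'
  sample.ID <- melt(sample.ID, id = "Representative_Sequence") # melt data frame: one row per sequence/sample pair
  names(sample.ID) <- c("Sequence.ID", "Sample.ID", "Sequence.abundance") # rename variables
  sample.ID <- sample.ID[sample.ID$Sequence.abundance > 0, ] # remove 0 values
  sample.ID <- sample.ID[order(sample.ID$Sequence.ID), ] # sort by sequence ID
  # write a tab-delimited file named <filename>.ID, without header or row names
  write.table(sample.ID, file = paste(filename, ".ID", sep = ""), quote = FALSE, sep = "\t", col.names = FALSE, row.names = FALSE)
  invisible(NULL) # nothing to return; the result is the file on disk
}
filename <- "final.seq.count" # the count file written by count.seqs
data <- read.table(filename, header = TRUE) # read file
Counts2ID(data, filename) # writes final.seq.count.ID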
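To illustrate what the reshaping does, here is a minimal sketch on a made-up count table. It uses base R only rather than reshape2, so it is a stand-in for the steps above, not the function itself. The sequence and sample names (seq1, sampleA, and so on) are invented for the example.

```r
# Made-up count table in the layout described above (all names are hypothetical)
counts <- data.frame(
  Representative_Sequence = c("seq1", "seq2", "seq3"),
  total   = c(5, 3, 2),
  sampleA = c(5, 0, 1),
  sampleB = c(0, 3, 1),
  stringsAsFactors = FALSE
)

# Every column except the ID and 'total' is treated as a sample
samples <- setdiff(names(counts), c("Representative_Sequence", "total"))

# Wide to long: one row per sequence/sample pair
long <- data.frame(
  Sequence.ID        = rep(counts$Representative_Sequence, times = length(samples)),
  Sample.ID          = rep(samples, each = nrow(counts)),
  Sequence.abundance = unlist(counts[samples], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Keep only sequences actually seen in a sample, sorted by sequence ID
long <- long[long$Sequence.abundance > 0, ]
long <- long[order(long$Sequence.ID), ]
rownames(long) <- NULL
print(long)
```

Of the six sequence/sample pairs, the two with zero abundance are dropped, so four rows remain, and seq3 appears twice because it occurs in both samples.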
Hope someone finds this useful.
Roey
Hello - This seems like a really useful R-script but I’m having trouble running it. I created a count table using: count.seqs(name=am_final.names, group=am_final.groups), which is called “am_final.count_table”, in which the first column is called Representative Sequence, and the 2nd is called total, then there are 20 sample columns with sample ID headers.
I tried to run the rest of your script in R. The first part in the brackets appears to run, but doesn’t create a new file.
Counts2ID <- function(data, am_final.count_table = “sample”, convert = FALSE) {
require(reshape2)
sample.ID <- subset(data, select = -c(total)) # remove column 'total'
sample.ID <- melt(sample.ID, id = "Representative_Sequence") # melt data frame
names(sample.ID) <- c("Sequence.ID", "Sample.ID", "Sequence.abundance") # rename variables
sample.ID <- sample.ID[sample.ID$Sequence.abundance > 0, ] # remove 0 values
sample.ID <- sample.ID[order(sample.ID$Sequence.ID), ]
write.table(sample.ID, file = paste(new_table,".ID",sep = ""), quote = FALSE, sep = "\t", col.names = FALSE, row.names = FALSE)
return()
}
The remaining scripts do not run - this might be something simple like I’m putting the filenames in the wrong place? Let me know if you can help. Thanks!
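A likely explanation, offered as a sketch rather than a confirmed diagnosis: defining the function only creates it and writes nothing, so no file appears until the function is called. The definition is meant to stay as originally posted, with straight quotes and `filename` as both the argument name and the name used inside `paste()`. In the version above, `new_table` is never defined and the curly quotes around “sample” are not valid R. The real file name belongs only in the lines after the closing brace. The example below assumes a stand-in function `write_id` (hypothetical, base R only) and a made-up two-sample table written to a temporary directory so that it runs anywhere.

```r
# Stand-in for Counts2ID (hypothetical name, base R only, same output layout)
write_id <- function(data, filename = "sample") {
  samples <- setdiff(names(data), c("Representative_Sequence", "total"))
  long <- data.frame(
    Sequence.ID        = rep(data$Representative_Sequence, times = length(samples)),
    Sample.ID          = rep(samples, each = nrow(data)),
    Sequence.abundance = unlist(data[samples], use.names = FALSE)
  )
  long <- long[long$Sequence.abundance > 0, ]   # drop zero counts
  long <- long[order(long$Sequence.ID), ]       # sort by sequence ID
  out <- paste(filename, ".ID", sep = "")       # output name is <filename>.ID
  write.table(long, file = out, quote = FALSE, sep = "\t",
              col.names = FALSE, row.names = FALSE)
  invisible(out)                                # return the output path quietly
}

# Made-up count table standing in for am_final.count_table
filename <- file.path(tempdir(), "am_final.count_table")
writeLines(c("Representative_Sequence\ttotal\tS1\tS2",
             "seqA\t3\t3\t0",
             "seqB\t4\t1\t3"), filename)

# These lines do the actual work: read the table, then call the function
data <- read.table(filename, header = TRUE)
out_file <- write_id(data, filename)
print(readLines(out_file))
```

With the real data, the equivalent would be to set `filename <- "am_final.count_table"`, with R's working directory pointing at the folder that holds the file, and leave the function body untouched. The output would then be written next to it as am_final.count_table.ID.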