corr.axes, length meaning and R script for biplot

Hello -
I am trying to make a biplot with my data as suggested in the Costello analysis example. However, the output of the corr.axes is a bit confusing to me. Here is an example with the taxon definitions included from the sample.final.tx.1.consensus.taxonomy file.

OTU axis1 axis2 length OTU Size Taxonomy
2 -1 -0.028571 1.000408 2 10446 Sulfurovum
13 -0.941124 -0.27323 0.979984 13 93 Thiomicrospira

So, the length value means what exactly? I understand that the values under axis 1 and 2 represent the correlation coefficients. So when I made the biplot I made arrows that start at the origin(0,0) and go to these points (e.g. -1,-0.028). So, that way the viewer can see how correlated the taxa are with each axis in the NMDS plot. However, what does the length have to do with it? Should I make that the length of the arrow?

Also, how to choose which taxa to plot? Is the length an important indicator or should I just be considering the correlation coefficient?

In this case, I chose to only plot taxa with an overall length over 0.85 and a phylotype size over 90. But this may not be a good criterion since I don’t know what “length” means.

Also, why is does that label say OTU when this is a phylotype analysis? Is this just a legacy thing or is it true that when you create phylotypes they can still be referred to as OTUs (just based on a different definition - taxonomic rather than genetic distance)?

Thanks for the clarification.

Kristina

PS. Here is the handy R script I wrote to make the NMDS plot, add the arrows and the labels. It’s not sophisticated but it works.

#R script to create an nmds or pcoa plot in R using mothur output
#Also will add arrows and labels, as in a biplot, to show which OTUs are causing shifts in microbial community
#Make sure to add a column header “group” above your point labels in .thetayc.1.lt.nmds.axes file

Kristina Fontanez, July 2011


library(calibrate) library(shape) #-------------------------------------------------------------------- #Sends the output to a pdf file #-------------------------------------------------------------------- pdf("Thetayc genus NMDS.pdf")

#--------------------------------------------------------------------
#Read in the nmds or pcoa axes file
#--------------------------------------------------------------------
A <-read.table(“fries.final.tx.thetayc.1.lt.nmds.axes”, header=T)

#--------------------------------------------------------------------

Make the plot, add axis labels and set the x,y limits (change as needed)

#--------------------------------------------------------------------
plot(A$axis1, A$axis2, col=1, pch=20, type=“p”, xlab="axis 1 ", ylab="axis 2 ", main=“thetayc genus nmds”, xlim=c(-1.5,1.5), ylim=c(-1.5,1.5))

#--------------------------------------------------------------------
#Add labels to the points
#--------------------------------------------------------------------
textxy(A$axis1, A$axis2, A$group,cx=1)

#--------------------------------------------------------------------
#Read in the correlation datafile (Result of corr.axes function) and set up variables
#--------------------------------------------------------------------
points<-read.table(“thetayc.1.nmds.corr.txt”, header=T)
x1<-points$axis1
y1<-points$axis2
labels<-points$Taxonomy

#--------------------------------------------------------------------
#Add arrows and labels showing which OTUs are causing shifts in community
#--------------------------------------------------------------------
Arrows(0,0,x1,y1,code=2,arr.length=0.3,arr.width=0.3,arr.type=“triangle”,arr.adj=1)
textxy(x1,y1,labels,cx=0.8)
dev.off()

So, the length value means what exactly? I understand that the values under axis 1 and 2 represent the correlation coefficients. So when I made the biplot I made arrows that start at the origin(0,0) and go to these points (e.g. -1,-0.028). So, that way the viewer can see how correlated the taxa are with each axis in the NMDS plot. However, what does the length have to do with it? Should I make that the length of the arrow?

Also, how to choose which taxa to plot? Is the length an important indicator or should I just be considering the correlation coefficient?

The “length” is the length of the vector - sqrt(x^2 + y^2). I put it in because it might be helpful in sorting the rows to figure out what to plot. I would tend to sort the rows by length and take top N vectors and go from there.

Hope this helps,
Pat

Pat-

Thanks for the insight. The top N vectors (lengths) however might be associated with OTUs that are not very abundant. Would you agree that a length of 0.98 that is associated with an OTU than only had an abundance of 3 sequences is less important than a length of 0.88 with an abundance of 3000?

Second question - the OTU issue. If I am using classify.otu(taxonomy=CSarch.final.taxonomy, name=CSarch.final.names, list=CSarch.final.tx.list, basis=sequence, group=CSarch.final.groups, label=1-2-3-5, reftaxonomy=trainset6_032010.rdp.tax, cutoff=80)
am I doing this incorrectly? Should I be using classify.sequences? I am just confused by the OTU label instead of there being a phylotype label when I am using the .tx.list instead of the an.list.

Thanks,
Kristina

The top N vectors (lengths) however might be associated with OTUs that are not very abundant. Would you agree that a length of 0.98 that is associated with an OTU than only had an abundance of 3 sequences is less important than a length of 0.88 with an abundance of 3000?

Good point - at some point we should probably throw in p-values to assess significance.

Second question - the OTU issue. If I am using classify.otu(taxonomy=CSarch.final.taxonomy, name=CSarch.final.names, list=CSarch.final.tx.list, basis=sequence, group=CSarch.final.groups, label=1-2-3-5, reftaxonomy=trainset6_032010.rdp.tax, cutoff=80)
am I doing this incorrectly? Should I be using classify.sequences? I am just confused by the OTU label instead of there being a phylotype label when I am using the .tx.list instead of the an.list.

You need to first run classify.seqs to get CSarch.final.taxonomy, but other than that, the OTU heading is the same as phylotype.

Pat

Great, thank you so much for clarifying these issues.

Kristina

Kristina -

I like your script for plotting the nMDS and PCOA axes. Though I would love to label the vectors with the appropriate OTU or bacterial taxa? It seems that’s what you were trying to do but when I run the script I get labels for my samples but no labels on the vectors. Any idea how to write that in, or modify my files so that the vector labels are printed?

Thanks very much!
Kathy