R statistics package

Short description of the R statistics package on the LRZ Linux cluster



Preliminaries

R is a statistics package that was developed as a free successor to the S and S-PLUS languages. It is probably somewhat harder to learn than other statistics tools, but once you are used to R's functional programming approach, it gives you great flexibility. You can accomplish complex tasks with very few commands and produce publication-quality hardcopy output. R also lets you add functionality and automate processes, and it is available on all common platforms.


Availability and starting R

R is available on the LRZ Linux cluster and can be used interactively and in batch mode. To use R interactively, log in to one of the interactive nodes and load the R module:

 module load R

Then start R:

 R
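
For batch use, the same kind of commands can be collected in a script file and run without the interactive prompt. The following is a minimal sketch, assuming a hypothetical script name 'analysis.R' and made-up sample values; it uses only the standard R front ends Rscript and R CMD BATCH. How such a job is submitted to the LRZ batch system is not covered here.

```r
# analysis.R (hypothetical file name): a tiny script for non-interactive use.
# Run it with:  Rscript analysis.R       (output goes to the terminal)
#          or:  R CMD BATCH analysis.R   (output goes to analysis.Rout)

# Made-up sample values, only to have something to compute.
x <- c(1.2, 3.4, 2.2, 5.0)

# Compute a result and write it out explicitly.
result <- mean(x)
cat("mean of x:", result, "\n")
```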

Short example (reading data and visualisation)

In most cases you will have data that you want to read and analyse later on. The most straightforward way to load data into R is to read it from a text file. In this example, the file 'measurements.txt' contains tab-separated columns of performance measurements for the Itanium2 processor. (R can also read other formats or read from a database; please refer to the R documentation for further information.)

If you started R in the directory where your data file resides, you can read the file into a so-called data frame named 'measurements' with the read.table function:

> measurements <- read.table("measurements.txt", header=TRUE)
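
To make the role of the read.table arguments more concrete, here is a small self-contained sketch. It writes its own temporary tab-separated file; the column names 'stime' and 'FP_OPS_RETIRED' follow the text, but all numbers are invented for illustration.

```r
# Write a small tab-separated file with made-up values so the example
# does not depend on the real 'measurements.txt'.
datafile <- tempfile(fileext = ".txt")
writeLines(c("stime\tFP_OPS_RETIRED",
             "1\t2.1e9",
             "1\t1.9e9",
             "10\t2.0e10",
             "10\t2.2e10"), datafile)

# header = TRUE takes the column names from the first line;
# sep = "\t" restricts the separator to tabs (the default splits on any white space).
measurements <- read.table(datafile, header = TRUE, sep = "\t")

# Show the column names and the type R assigned to each column.
str(measurements)
```

Stating sep="\t" explicitly is a design choice that matters mainly when fields may contain spaces; for purely numeric columns the default works as well.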

You can inspect the contents of the data frame, or of any other data object, by entering its name at the R prompt:

> measurements
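
For larger data frames, printing the whole object quickly becomes unwieldy. The sketch below shows a few standard functions for a more compact look; the data frame is a made-up stand-in for 'measurements' with invented values.

```r
# Made-up stand-in for the 'measurements' data frame (values are invented).
measurements <- data.frame(
  stime          = c(1, 1, 10, 10),
  FP_OPS_RETIRED = c(2.1e9, 1.9e9, 2.0e10, 2.2e10)
)

dim(measurements)       # number of rows and columns
head(measurements, 2)   # only the first two rows
summary(measurements)   # per-column minimum, quartiles, mean and maximum
measurements$stime      # a single column as a plain vector
```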

The measurements contain data for different sampling intervals, which are given in one column. Data often needs to be grouped by the contents of one column; in R this is conveniently done with factors. In the following, a factor is created from the sampling interval column 'stime':

> stimef <- factor(measurements$stime)
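
What a factor provides for grouping can be seen in a short sketch. The data frame below is a made-up stand-in for 'measurements' (invented values), and the per-group mean is just one possible illustration of grouped evaluation.

```r
# Made-up stand-in for the 'measurements' data frame (values are invented).
measurements <- data.frame(
  stime          = c(1, 1, 10, 10),
  FP_OPS_RETIRED = c(2.1e9, 1.9e9, 2.0e10, 2.2e10)
)

# Turn the numeric sampling interval into a grouping variable.
stimef <- factor(measurements$stime)

levels(stimef)   # the distinct sampling intervals, stored as labels
table(stimef)    # number of measurements per sampling interval

# Mean of FP_OPS_RETIRED within each group defined by the factor.
group_means <- tapply(measurements$FP_OPS_RETIRED, stimef, mean)
group_means
```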

A boxplot with a separate box for each sampling interval, showing the variability of the measurements for that interval, can then be created by:

> plot(measurements$FP_OPS_RETIRED / measurements$stime / 1.0E+9 ~ stimef,
       main="variability of 100 samples (5 min. sampling interval)",
       xlab="sampling length [s]", ylab="[GFlop/s]")
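
The plotted quantity is the number of retired floating point operations divided by the interval length in seconds and scaled by 1.0E+9. The following sketch, using invented values and a hypothetical helper column 'gflops', computes that quantity explicitly and asks boxplot for the statistics behind each box without drawing anything.

```r
# Made-up stand-in for the 'measurements' data frame (values are invented).
measurements <- data.frame(
  stime          = c(1, 1, 1, 10, 10, 10),
  FP_OPS_RETIRED = c(1.8e9, 2.0e9, 2.2e9, 1.9e10, 2.0e10, 2.1e10)
)

# Operations per second, scaled to units of 1e9 (hypothetical column name 'gflops').
measurements$gflops <- measurements$FP_OPS_RETIRED / measurements$stime / 1.0E+9

# Grouping variable stored alongside the data.
measurements$stimef <- factor(measurements$stime)

# plot = FALSE returns the box statistics instead of drawing the plot.
b <- boxplot(gflops ~ stimef, data = measurements, plot = FALSE)

b$names   # one entry per box, taken from the factor levels
b$stats   # per box: lower whisker, lower hinge, median, upper hinge, upper whisker
```

Calling plot with a formula whose right-hand side is a factor produces the same kind of boxplot, so the statistics above correspond to what the plot command in the text draws.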

Plots can be written to files in a variety of formats (for example PostScript, PDF, or PNG). To create a PNG file, first open the corresponding graphics device:

> png("boxplot_GFlop.png", width=600, height=400)

Then run the plot command(s) whose output should go to that file. Finally, close the png device so that the data is written and the output file is closed:

> dev.list()
X11 PNG
  2   3
> dev.off(3)
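
The open, plot, close sequence can also be run in one go, for example from a script. The sketch below is an illustration under stated assumptions: it uses the pdf device instead of png because pdf needs no additional system libraries, the file name is hypothetical, and the plotted values are invented. The png device follows the same sequence.

```r
# Hypothetical output file in R's temporary directory.
outfile <- file.path(tempdir(), "boxplot_example.pdf")

# Open the file device; it becomes the current device (size in inches for pdf).
pdf(outfile, width = 6, height = 4)

# Any plot commands issued now are written to the file (made-up values).
boxplot(c(1.8, 2.0, 2.2), c(1.9, 2.0, 2.1),
        names = c("1", "10"), xlab = "sampling length [s]", ylab = "[GFlop/s]")

# Without an argument, dev.off() closes the current device and finishes the file.
dev.off()

file.exists(outfile)
```

Calling dev.off() without a device number is convenient when the file device was the last one opened; the explicit number shown by dev.list() is useful when several devices are open at once.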

Now you should find the file 'boxplot_GFlop.png' in the current directory.

[Figure: boxplot_GFlop.png (example output)]

The above is only a short example to give you a feeling for what R is like. You can find further information in the references given below.

References:

The first place to look for further information is the homepage of the R Project. Another useful source of information is the R Project wiki.

If you have any questions or suggestions, or would like additional packages to be installed on the machines, please feel free to send an email to lxadmin_AT_lrz.de.