
An Introduction to R

Graphical procedures

R's graphics facilities can display a wide variety of statistical graphs and can also be used to build entirely new types of graph. They can be used in both interactive and batch modes, but in most cases interactive use is more productive. Interactive use is also easy because at startup R initiates a graphics device driver, which opens a special graphics window for the display of interactive graphics. Although this is done automatically, it is useful to know that the command used is X11() under UNIX and windows() under Windows. Once the device driver is running, R plotting commands can be used to produce these displays. Plotting commands are divided into three basic groups:

• High-level plotting functions create a new plot on the graphics device, possibly with axes, labels, titles and so on.

• Low-level plotting functions add more information to an existing plot, such as extra points, lines and labels.

• Interactive graphics functions allow you to interactively add information to, or extract information from, an existing plot, using a pointing device such as a mouse.
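As a rough illustration of how the groups work together, the sketch below draws a high-level plot and then adds to it with low-level functions. The data are invented, and pdf(NULL) is used only so that the code runs without opening a window; omit that line in an interactive session.

```r
pdf(NULL)                      # null device: an assumption for non-interactive runs
x <- 1:10
y <- x^2
plot(x, y)                     # high-level: starts a new plot with axes and labels
lines(x, y)                    # low-level: joins the existing points with a line
points(5, 25, pch = 19)        # low-level: adds one filled point at (5, 25)
# locator(1) would wait for a mouse click and return its coordinates;
# it is left out here because it needs an interactive device.
open_devices <- length(dev.list())
invisible(dev.off())
```

The low-level calls do not clear the device, so all three layers end up on the same plot.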

In addition, R maintains a list of graphical parameters which can be manipulated to customize your plots. This manual only describes what are known as ‘base’ graphics. A separate graphics subsystem in package grid coexists with base – it is more powerful but harder to use. There is a recommended package lattice which builds on grid and provides ways to produce multi-panel plots akin to those in the Trellis system in S.

High-level plotting commands

High-level plotting functions are designed to generate a complete plot of the data passed as arguments to the function. Where appropriate, axes, labels and titles are automatically generated (unless you request otherwise). High-level plotting commands always start a new plot, erasing the current plot if necessary.

The plot() function

One of the most frequently used plotting functions in R is the plot() function. This is a generic function: the type of plot produced is dependent on the type or class of the first argument.

plot(x, y)

plot(xy)

If x and y are vectors, plot(x, y) produces a scatterplot of y against x. The same effect can be produced by supplying one argument (second form) as either a list containing two elements x and y or a two-column matrix.

plot(x) If x is a time series, this produces a time-series plot. If x is a numeric vector, it produces a plot of the values in the vector against their index in the vector. If x is a complex vector, it produces a plot of imaginary versus real parts of the vector elements.

plot(f)

plot(f, y)

f is a factor object, y is a numeric vector. The first form generates a bar plot of f; the second form produces boxplots of y for each level of f.

plot(df)

plot(~ expr)

plot(y ~ expr)

df is a data frame, y is any object, expr is a list of object names separated by ‘+’ (e.g., a + b + c). The first two forms produce distributional plots of the variables in a data frame (first form) or of a number of named objects (second form). The third form plots y against every object named in expr.
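The forms above can be tried on a few small objects. This is a minimal sketch with made-up data; the object names and values are assumptions for illustration, and pdf(NULL) simply discards the output so that the code runs without a graphics window.

```r
pdf(NULL)                                # null device (assumption: non-interactive run)
x <- c(1.2, 2.5, 3.1, 4.8, 5.0)
y <- c(2.0, 2.9, 3.7, 5.1, 5.6)
plot(x, y)                               # scatterplot of y against x
plot(list(x = x, y = y))                 # same plot from a list with elements x and y
plot(cbind(x, y))                        # same plot from a two-column matrix
plot(ts(y, start = 2001))                # time-series plot
plot(y)                                  # values against their index
plot(complex(real = x, imaginary = y))   # imaginary against real parts

f <- factor(c("a", "b", "a", "b", "a"))
plot(f)                                  # bar plot of the counts in each level
plot(f, y)                               # boxplots of y for each level of f

df <- data.frame(x = x, y = y, f = f)
plot(df)                                 # plots of the variables in the data frame
plot(~ x + y)                            # plots of the named objects
plot(y ~ x + f)                          # y against x, then y against f
invisible(dev.off())
```

Because plot() is generic, the same call name dispatches to a different method in each case, depending on the class of the first argument.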

Displaying multivariate data

R provides two very useful functions for representing multivariate data. If X is a numeric matrix or data frame, the command

> pairs(X)

produces a pairwise scatterplot matrix of the variables defined by the columns of X. That is, every column of X is plotted against every other column of X, and the resulting n(n − 1) plots (where n is the number of columns) are arranged in a matrix with plot scales constant over the rows and columns of the matrix. When three or four variables are involved, a coplot may be more enlightening. If a and b are numeric vectors and c is a numeric vector or factor object (all of the same length), then the command

> coplot(a ~ b | c)

produces a number of scatterplots of a against b for given values of c. If c is a factor, this simply means that a is plotted against b for every level of c. When c is numeric, it is divided into a number of conditioning intervals and for each interval a is plotted against b for values of c within the interval. The number and position of intervals can be controlled with the given.values= argument to coplot(); the function co.intervals() is useful for selecting intervals. You can also use two given variables with a command like

> coplot(a ~ b | c + d)

which produces scatterplots of a against b for every joint conditioning interval of c and d. The coplot() and pairs() functions both take an argument panel= which can be used to customize the type of plot which appears in each panel. The default is points() to produce a scatterplot, but by supplying some other low-level graphics function of two vectors x and y as the value of panel= you can produce any type of plot you wish. An example panel function useful for coplots is panel.smooth().
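The following sketch tries pairs() and coplot() on the built-in iris data set. The choice of data, the number of intervals and the overlap are assumptions made for illustration only, and pdf(NULL) discards the output so that the code runs without a window.

```r
pdf(NULL)                              # null device (assumption: non-interactive run)
X <- iris[, 1:4]                       # four numeric columns of a built-in data set
pairs(X)                               # 4 columns give 4 * 3 = 12 scatterplots
pairs(X, panel = panel.smooth)         # same layout with a smooth curve per panel

# Conditioning on a factor: one panel for each level of Species
coplot(Sepal.Length ~ Sepal.Width | Species, data = iris)

# Conditioning on a numeric variable with explicitly chosen intervals
iv <- co.intervals(iris$Petal.Length, number = 4, overlap = 0.25)
coplot(Sepal.Length ~ Sepal.Width | Petal.Length, data = iris,
       given.values = iv, panel = panel.smooth)
invisible(dev.off())
```

co.intervals() returns a two-column matrix of interval end points, which is the form that given.values= accepts for a numeric conditioning variable.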

Display graphics

Other high-level graphics functions produce different types of plots. Some examples are:

qqnorm(x)

qqline(x)

qqplot(x, y)

Distribution-comparison plots. The first form plots the numeric vector x against the expected Normal order scores (a normal scores plot) and the second adds a straight line to such a plot by drawing a line through the distribution and data quartiles. The third form plots the quantiles of x against those of y to compare their respective distributions.

hist(x)

hist(x, nclass=n)

hist(x, breaks=b, ...)

Produces a histogram of the numeric vector x. A sensible number of classes is usually chosen, but a recommendation can be given with the nclass= argument. Alternatively, the breakpoints can be specified exactly with the breaks= argument.

If the probability=TRUE argument is given, the bars represent relative frequencies divided by bin width (densities) instead of counts, so that the total area of the bars is one.

dotchart(x, ...) Constructs a dotchart of the data in x. In a dotchart the y-axis gives a labelling of the data in x and the x-axis gives its value. For example, it allows easy visual selection of all data entries with values lying in specified ranges.

image(x, y, z, ...)

contour(x, y, z, ...)

persp(x, y, z, ...)

Plots of three variables. The image plot draws a grid of rectangles using different colours to represent the value of z, the contour plot draws contour lines to represent the value of z, and the persp plot draws a 3D surface.
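The functions listed above can be exercised together in a short script. This is only a sketch: the simulated data, the seed, the breakpoints and the viewing angles are all arbitrary assumptions, and pdf(NULL) discards the output so that no window is needed.

```r
pdf(NULL)                          # null device (assumption: non-interactive run)
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- rnorm(100)
y <- rexp(100)

qqnorm(x)                          # x against expected Normal order scores
qqline(x)                          # reference line through the quartiles
qq <- qqplot(x, y)                 # quantiles of x against quantiles of y

# Exact breakpoints every 0.5 units; bar heights on the density scale
h <- hist(y, breaks = seq(0, ceiling(max(y)), by = 0.5), probability = TRUE)
bar_area <- sum(h$density * diff(h$breaks))   # total area of the bars

scores <- c(Alpha = 3.2, Beta = 5.1, Gamma = 4.4, Delta = 2.7)
dotchart(scores)                   # labels on the y-axis, values on the x-axis

gx <- seq(-2, 2, length.out = 30)
gy <- seq(-2, 2, length.out = 30)
gz <- outer(gx, gy, function(u, v) exp(-(u^2 + v^2)))  # gz[i, j] = f(gx[i], gy[j])
image(gx, gy, gz)                  # coloured rectangles for the value of gz
contour(gx, gy, gz, add = TRUE)    # contour lines drawn over the image
persp(gx, gy, gz, theta = 30, phi = 30)   # 3D surface
invisible(dev.off())
```

Note that image(), contour() and persp() expect z as a matrix with one row per x value and one column per y value, which is what outer() produces here.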

Arguments to high-level plotting functions

There are a number of arguments which may be passed to high-level graphics functions, as follows:

add=TRUE Forces the function to act as a low-level graphics function, superimposing the plot on the current plot (some functions only).

axes=FALSE Suppresses generation of axes—useful for adding your own custom axes with the axis() function. The default, axes=TRUE, means include axes.

log="x"

log="y"

log="xy"

Causes the x, y or both axes to be logarithmic. This will work for many, but not all, types of plot.

type= The type= argument controls the type of plot produced, as follows:

type="p" Plot individual points (the default)

type="l" Plot lines

type="b" Plot points connected by lines (both)

type="o" Plot points overlaid by lines

type="h" Plot vertical lines from points to the zero axis (high-density)

type="s"

type="S" Step-function plots. In the first form, the top of the vertical defines the point; in the second, the bottom.

type="n" No plotting at all. However, axes are still drawn (by default) and the coordinate system is set up according to the data. Ideal for creating plots with subsequent low-level graphics functions.

xlab=string

ylab=string

Axis labels for the x and y axes. Use these arguments to change the default labels, usually the names of the objects used in the call to the high-level plotting function.

main=string Figure title, placed at the top of the plot in a large font.

sub=string Sub-title, placed just below the x-axis in a smaller font.
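A short sketch may help to show how these arguments combine. The data, axis labels and titles below are invented for illustration, and pdf(NULL) discards the output so that the code runs without a window.

```r
pdf(NULL)                          # null device (assumption: non-interactive run)
x <- 10^(0:4)
y <- x^1.5

# The same data drawn with each value of type= (one page per plot here)
types <- c("p", "l", "b", "o", "h", "s", "S", "n")
for (tp in types) plot(x, y, type = tp, main = paste("type =", tp))

# Set up log-log coordinates without plotting, then build the plot by hand
plot(x, y, type = "n", log = "xy", axes = FALSE,
     xlab = "Dose", ylab = "Response",
     main = "Response against dose", sub = "Both axes logarithmic")
axis(1, at = x)                    # custom x-axis with ticks at the data values
axis(2)                            # default y-axis
box()                              # frame around the plot region
points(x, y)                       # low-level: add the data points
curve(x^1.5, add = TRUE, lty = 2)  # add = TRUE superimposes on the current plot
invisible(dev.off())
```

On the log-log scale the power relationship appears as a straight line, which is one common reason for using log="xy".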
