Debugging R programs
Writing perfect code is impossible, and your code will typically contain some errors or flaws (commonly referred to as bugs). This document provides some advice on how to detect, address, and hopefully avoid bugs in your code. Some of the advice provided here is my own opinion, based on my own needs and experience.
1. How to reduce bugs
- Read the documentation
  - A well-written R package comes with detailed documentation and examples for its functions. Do not assume that you know what a function does.
  - Do not blindly copy/paste an example you found online.
- Write defensively
  - Test every function you write with simple inputs.
  - Check that inputs to the function are of the right type, dimension, etc.
  - Insert error messages regularly.
  - Be mindful of numerical overflow/underflow, e.g. `log(1+x)` vs `log1p(x)`; the former will return `0` if `x` is positive and not larger than `1e-16`.
- Use a suitable editor (I use Emacs with ESS; there is a ready-to-use distribution for Windows and Mac.)
  - Clear display.
    - Easy-to-distinguish fonts, e.g., `1` vs `l`, `0` vs `O`. (I use Liberation Mono, 9pt.)
    - Theme with good contrast between foreground and background, ideally following the Web Content Accessibility Guidelines (WCAG).
    - Comments should not be "greyed-out", but easy to read.
    - Black-on-white is easier to read than white-on-black (unless you are trying to put yourself to sleep, in which case, you shouldn't be programming).
  - Code navigation among multiple files.
- Consistent programming style: indentation, naming, spaces, etc.
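
As a minimal sketch of the defensive habits listed above, the following checks its inputs, raises an informative error, and compares `log(1+x)` with `log1p(x)`. The function `weighted_mean` and its error messages are hypothetical, chosen only for illustration.

```r
## Hypothetical helper: weighted mean with defensive input checks
weighted_mean <- function (x, w) {
  stopifnot(is.numeric(x), is.numeric(w))   # right type
  if (length(x) != length(w)) {             # right dimension
    stop("x and w must have the same length; got ",
         length(x), " and ", length(w))
  }
  if (any(w < 0)) stop("weights must be non-negative")
  sum(w * x) / sum(w)
}

## Test with a simple input that is easy to check by hand
wm <- weighted_mean(c(1, 2, 3), c(1, 1, 2))   # (1 + 2 + 6) / 4 = 2.25

## Underflow: 1 + x rounds to exactly 1 for tiny positive x
x <- 1e-17
naive <- log(1 + x)    # 0
stable <- log1p(x)     # about 1e-17
```

Checking the result against a hand calculation, as in the comment next to `wm`, is usually enough to catch the most obvious mistakes early.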
2. Locate errors using traceback()
Our immediate reaction when coming across an error message is to locate the offending line in our code. Let us consider the following code.
    fun <- function (x, y) {
      ## Computes x^T * y
      t(x) %*% y
    }
    gun <- function (A, b) {
      ## Computes b^T * A * b where A is a matrix and b is a vector
      A <- (A + t(A))/2
      fun(b, fun(A, b))
    }
    A <- matrix(1:9, 3, 3)
    b <- matrix(1:3, 1, 3)
    gun(A, b)

    Error in t(x) %*% y : non-conformable arguments
We can probably guess where the error occurs in this short example, but it may not be so simple in a longer program. In that case we want to print the list of function calls that led to the error, called the call stack.
    traceback()

    3: fun(A, b) at #3
    2: fun(b, fun(A, b)) at #4
    1: gun(A, b)
The call to `traceback()` prints the call stack up to the point where the error occurred. The calls are listed in order from bottom to top. Next to each call, the line number within the next function in the call stack is shown. If the function is sourced from a file, then the file name and the line within that file are shown.
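
`traceback()` is meant to be called at the prompt after an error. As a rough sketch of how the same information could be collected inside a script, the code below records the call stack with `sys.calls()` from a calling handler. The variable names `stack`, `msg` and `calls` are my own, and the recorded stack also contains the handler's own frames, so this is an approximation of what `traceback()` shows rather than a replacement for it.

```r
fun <- function (x, y) {
  ## Computes x^T * y
  t(x) %*% y
}
gun <- function (A, b) {
  ## Computes b^T * A * b where A is a matrix and b is a vector
  A <- (A + t(A))/2
  fun(b, fun(A, b))
}
A <- matrix(1:9, 3, 3)
b <- matrix(1:3, 1, 3)

stack <- NULL
msg <- tryCatch(
  withCallingHandlers(
    gun(A, b),
    ## Runs before the stack is unwound, so the failing calls are still there
    error = function (e) stack <<- sys.calls()
  ),
  error = function (e) conditionMessage(e)
)

## One line of text per recorded call, outermost call first
calls <- vapply(stack, function (cl) paste(deparse(cl), collapse = " "),
                character(1))
msg
calls
```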
3. Insert breakpoints using browser()
One of the hardest things in programming is realising you have a bug. If you are lucky, R will show an error when it hits a bug. In the worst-case scenario, you will get a result without knowing whether it is right or wrong. In this case, you need to meticulously sift through your code to find the error. Use a divide-and-conquer approach by verifying portions of your code. For this, you may need to "step in" to your functions and interactively evaluate the code within the function, effectively pausing execution of your program so that you can investigate. The feature that allows you to do this is called a breakpoint.
In the following example we compute the maximum likelihood estimator (MLE) of a sample `x` from the Bernoulli distribution. The function `lik_bern` computes the likelihood function for probability `p` and data `x`. The function `mle_bern` maximises the likelihood using golden-section search.
    lik_bern <- function (p, x) {
      prod(dbinom(x, 1, p))
    }

    mle_bern <- function (x, maxit = 20) {
      tol <- 1e-6
      outer <- c(0, 1)
      inner <- c(1/3, 2/3)
      Louter <- c(lik_bern(outer[1], x), lik_bern(outer[2], x))
      Linner <- c(lik_bern(inner[1], x), NA)
      j <- 2
      for (i in 1:maxit) {
        Linner[j] <- lik_bern(inner[j], x)
        j <- 1 + (Linner[1] > Linner[2])
        outer[j] <- inner[j]
        Louter[j] <- Linner[j]
        mle <- inner[j] <- mean(inner)
        if (inner[2] - inner[1] < tol) {
          break
        }
      }
      c(mle = mle, iterations = i)
    }

    x <- rep(c(0, 1), 100) # 0, 1, 0, 1, ...
    mle_bern(x) # Correct result

           mle iterations
           0.5       19.0

    x <- rep(c(0, 1), 1000)
    mle_bern(x) # Wrong result

           mle iterations
       0.66667   19.00000
We can recognise that the second calculation above is wrong. To investigate this further, we want to insert a breakpoint at an appropriate place in our function, perhaps before we start iterating inside `mle_bern`. The new function looks as follows.
    mle_bern <- function (x, maxit = 20) {
      tol <- 1e-6
      outer <- c(0, 1)
      inner <- c(1/3, 2/3)
      Louter <- c(lik_bern(outer[1], x), lik_bern(outer[2], x))
      Linner <- c(lik_bern(inner[1], x), NA)
      j <- 2
      browser() # <------- Breakpoint here
      for (i in 1:maxit) {
        Linner[j] <- lik_bern(inner[j], x)
        j <- 1 + (Linner[1] > Linner[2])
        outer[j] <- inner[j]
        Louter[j] <- Linner[j]
        mle <- inner[j] <- mean(inner)
        if (inner[2] - inner[1] < tol) {
          break
        }
      }
      c(mle = mle, iterations = i)
    }
Then, we run our code again.
    x <- rep(c(0, 1), 1000)
    mle_bern(x) # Wrong result

    Called from: mle_bern(x)
    Browse[1]> ls()
    [1] "inner"  "j"      "Linner" "Louter" "maxit"  "outer"  "tol"    "x"
    Browse[1]>
We notice that execution pauses at the point where the breakpoint is inserted and the prompt changes to `Browse[1]>` to indicate that we are in debugging mode. We are also placed inside the function's environment, where the local variables defined up to the breakpoint are available. Breakpoints can also be made conditional on some expression. For example, if we want to stop at the tenth iteration inside the loop, we can use `if (i == 10) browser()`.
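
A small sketch of a conditional breakpoint follows. The function `sum_squares` is hypothetical, and the extra `interactive()` guard is my own addition so that the same code can also run unattended in a batch job without pausing.

```r
## Hypothetical example: sum of squares with a breakpoint at iteration 10
sum_squares <- function (n) {
  total <- 0
  for (i in seq_len(n)) {
    ## Pause only at the tenth iteration, and only in an interactive session
    if (interactive() && i == 10) browser()
    total <- total + i^2
  }
  total
}

res <- sum_squares(12)   # 1^2 + 2^2 + ... + 12^2 = 650
```

When the breakpoint is reached, `i` and `total` can be inspected at the `Browse[1]>` prompt before continuing.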
As demonstrated above, we can type any R expression, which will be evaluated inside the function. In addition, we have the following list of special commands available.
Command | What it does |
---|---|
n | execute the Next statement |
s | Step into the function on the current line |
f | Finish the current function/loop |
c or cont | Continue running until the next breakpoint or end of program |
Q | Quit the debugger and the current evaluation, returning to the top-level prompt |
help | display this list of HELP commands |
where | WHERE are we in the code (show the call stack) |
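
Expressions of the following kind could be typed at the `Browse[1]>` prompt in the session above to look at the quantities the loop compares. This is only a sketch of one possible line of investigation: `loglik_bern` is a hypothetical helper that is not part of the original code, and the check suggests, rather than proves, that numerical underflow is behind the wrong result.

```r
lik_bern <- function (p, x) {
  prod(dbinom(x, 1, p))
}
x <- rep(c(0, 1), 1000)
inner <- c(1/3, 2/3)

## Values one might inspect at the Browse[1]> prompt
L1 <- lik_bern(inner[1], x)
L2 <- lik_bern(inner[2], x)
c(L1, L2)   # both are 0: the product of 2000 probabilities underflows
L1 > L2     # FALSE, so the comparison inside the loop carries no information

## Hypothetical alternative: evaluate the likelihood on the log scale
loglik_bern <- function (p, x) {
  sum(dbinom(x, 1, p, log = TRUE))
}
ll <- c(loglik_bern(1/3, x), loglik_bern(1/2, x), loglik_bern(2/3, x))
ll          # finite values, largest at p = 1/2
```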
If you don't want to manually edit your function to insert a breakpoint, use `debug` or `debugonce` to automatically insert a breakpoint at the beginning of a given function, or use `setBreakpoint` to automatically insert a breakpoint in a file which contains your function.
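
The following sketch shows how `debug`, `undebug` and `debugonce` might be used. The function `scale01` is hypothetical; the call that actually opens the debugger is wrapped in `interactive()` so that the block can also be run as a script.

```r
## Hypothetical function to be debugged: rescale a vector to [0, 1]
scale01 <- function (v) {
  rng <- range(v)
  (v - rng[1]) / (rng[2] - rng[1])
}

debug(scale01)                    # every call to scale01 now starts in the browser
flagged <- isdebugged(scale01)    # TRUE
undebug(scale01)                  # remove the flag again
cleared <- !isdebugged(scale01)   # TRUE

if (interactive()) {
  debugonce(scale01)              # only the next call enters the browser
  scale01(c(2, 4, 6))
}
```

`debugonce` is convenient because the flag is removed automatically after the first call, so there is nothing to clean up afterwards.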
4. Enter debug mode on an error using options(error = recover)
In the earlier section, we had to guess the possible location of the bug and manually insert breakpoints. In cases where the code exits with an error, we can enter debug mode as soon as an error occurs using the command `options(error = recover)`, before re-running the code that gave us the error.
    options(error = recover) # Enter debug mode on an error
    fun <- function (x, y) {
      ## Computes x^T * y
      t(x) %*% y
    }
    gun <- function (A, b) {
      ## Computes b^T * A * b where A is a matrix and b is a vector
      A <- (A + t(A))/2
      fun(b, fun(A, b))
    }
    A <- matrix(1:9, 3, 3)
    b <- matrix(1:3, 1, 3)
    gun(A, b)

    Error in t(x) %*% y : non-conformable arguments

    Enter a frame number, or 0 to exit

    1: gun(A, b)
    2: #4: fun(b, fun(A, b))
    3: #3: fun(A, b)

    Selection:
R then expects us to type a number to enter the environment of the indicated function call. So if we type `1`, we will have the variables `A` and `b` available, and if we type `2` we will have the variables `x`, which has the value of `b`, and `y`, which has the value of `fun(A,b)`.
After we choose a frame, we can get back to the selection menu using the command `c`. We can turn off the debug-on-error mode using the command `options(error = NULL)`.
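
Instead of resetting to `NULL`, one might prefer to restore whatever error handler was in place before. The sketch below relies on `options()` returning the previous values; the variable names `old`, `active` and `restored` are my own.

```r
old <- options(error = recover)   # options() returns the previous settings
active <- identical(getOption("error"), recover)   # TRUE: recover is installed

## ... re-run the failing code here in an interactive session ...

options(old)                      # put the previous error handler back
restored <- identical(getOption("error"), old$error)
```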
5. Debug non-interactive runs using dump.frames
In all cases above, we were working in an interactive session. Time-consuming programs are usually run in a non-interactive (or batch) session. This is the case when using a high-performance computer. The function `dump.frames` allows us to save the current call stack, which we can then examine in an interactive session. This function is best utilised when combined with `options(error = )` so that the call stack is saved at the point where the error occurs. One way to achieve that is by having the following at the beginning of our code.
    dump_and_quit <- function() {
      dump.frames(to.file = TRUE) # Save debugging info to file last.dump.rda
      q(save = "no", status = 1)  # Quit R with error status
    }
    options(error = dump_and_quit)
This will create a file called `last.dump.rda` which contains all debugging information. Later, we can enter debug mode in an interactive session with

    load("last.dump.rda")
    debugger()
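
To get a feel for what a dump contains without quitting R, one can create a dump in memory and look at it directly. The following is a sketch under my own assumptions: `risky` is a hypothetical failing function, the dump is stored in an object called `demo.dump` rather than in a file, and it is taken from a calling handler, so it also includes the handler's own frames.

```r
## Hypothetical function that fails for non-numeric input
risky <- function (v) {
  stopifnot(is.numeric(v))
  sqrt(v)
}

res <- tryCatch(
  withCallingHandlers(
    risky("a"),
    ## Save the frames while the failing calls are still on the stack;
    ## with to.file = FALSE the object is assigned in the global environment
    error = function (e) dump.frames(dumpto = "demo.dump", to.file = FALSE)
  ),
  error = function (e) conditionMessage(e)
)

frames <- get("demo.dump", envir = globalenv())
class(frames)   # "dump.frames"
names(frames)   # one label per saved call

if (interactive()) {
  debugger(frames)   # browse the saved frames, as with the file-based dump
}
```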
6. Summary
This document describes various techniques for debugging R programs that I use regularly. There are many more for you to explore, some built-in and some provided by third-party packages. I have also not discussed techniques for debugging compiled code that is called from within R. Mastering these techniques will help you debug R programs quickly and effectively.