Debugging R programs

Writing perfect code is impossible, and your code will typically contain some errors or flaws (commonly referred to as bugs). This document offers some advice on how to detect, fix and, hopefully, avoid bugs in your code. Some of the advice given here is my own opinion, based on my needs and experience.

1. How to reduce bugs

  • Read the documentation
    • A well-written R package comes with detailed documentation and examples for its functions. Do not assume that you know what the function does.
    • Do not blindly copy/paste an example you found online.
  • Write defensively
    • Test every function you write with simple inputs.
    • Check that inputs to the function are of the right type, dimension, etc.
    • Raise informative error messages wherever something unexpected can happen.
    • Be mindful of numerical overflow, underflow and round-off, e.g. log(1+x) vs log1p(x); the former returns 0 for positive x below about 1e-16, because 1+x is rounded to exactly 1.
  • Use a suitable editor (I use Emacs with ESS; there is a ready-to-use distribution for Windows and Mac.)
    • Clear display.
      • Easy-to-distinguish fonts, e.g., 1 vs l, 0 vs O. (I use Liberation Mono, 9pt.)
      • Theme with good contrast between foreground and background, ideally following the Web Content Accessibility Guidelines (WCAG).
      • Comments should not be "greyed-out", but easy to read.
      • Black-on-white is easier to read than white-on-black (unless you are trying to put yourself to sleep, in which case, you shouldn't be programming).
    • Code navigation among multiple files.
  • Consistent programming style: indentation, naming, spaces, etc.
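
The "write defensively" advice above can be sketched as follows. This is only an illustration: quad_form is a hypothetical helper (not part of any package) that checks its inputs before computing, and the last lines show the round-off problem with log(1+x).

```r
# Hypothetical helper, used only for illustration.
quad_form <- function(A, b) {
  # Check type and dimensions before doing any arithmetic.
  stopifnot(is.matrix(A), is.numeric(A), nrow(A) == ncol(A))
  if (!is.numeric(b) || length(b) != nrow(A)) {
    stop("`b` must be a numeric vector of length ", nrow(A),
         ", not length ", length(b))
  }
  b <- as.vector(b)
  drop(crossprod(b, A %*% b))  # b^T A b as a plain number
}

quad_form(diag(3), 1:3)  # 14

# Round-off: 1 + x is rounded to exactly 1 before the log is taken.
x <- 1e-17
log(1 + x)  # 0
log1p(x)    # 1e-17
```

Calling quad_form(diag(3), 1:2) stops immediately with a message that names the offending argument, which is usually easier to act on than an error raised deep inside the computation.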

2. Locate errors using traceback()

Our immediate reaction when we come across an error message is to locate the offending line in our code. Let us consider the following code.

fun <- function (x, y) {
  ## Computes x^T * y
  t(x) %*% y
}
gun <- function (A, b) {
  ## Computes b^T * A * b where A is a matrix and b is a vector
  A <- (A + t(A))/2
  fun(b, fun(A, b))
}
A <- matrix(1:9, 3, 3)
b <- matrix(1:3, 1, 3)
gun(A, b)
Error in t(x) %*% y : non-conformable arguments

We can probably guess where the error occurs in this short example, but it may not be so simple in a longer program. In that case we want to print the list of function calls that led to the error, known as the call stack.

traceback()
3: fun(A, b) at #3
2: fun(b, fun(A, b)) at #4
1: gun(A, b)

The call to traceback() prints the call stack at the point where the error occurred. Read it from the bottom up: the outermost call (numbered 1) is at the bottom and the call in which the error occurred is at the top. Next to each call is the line number, within the function listed below it, from which the call was made. If the function was sourced from a file, the file name and the line within that file are shown instead.
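
traceback() is meant for interactive use after the error has happened. If the call stack is needed inside a program, one possible alternative (a sketch, not the method used in this document) is to record sys.calls() in a calling handler. The variable names stack, result and called are chosen here for illustration.

```r
fun <- function(x, y) t(x) %*% y
gun <- function(A, b) {
  A <- (A + t(A)) / 2
  fun(b, fun(A, b))
}

stack <- NULL
result <- tryCatch(
  withCallingHandlers(
    gun(matrix(1:9, 3, 3), matrix(1:3, 1, 3)),
    # A calling handler runs while the failing calls are still on the stack.
    error = function(e) stack <<- sys.calls()
  ),
  # The exiting handler then turns the error into its message.
  error = function(e) conditionMessage(e)
)

# Name of the function in each recorded call, outermost first.
called <- vapply(as.list(stack), function(cl) deparse(cl[[1]])[1], character(1))
called
```

The recorded calls include the tryCatch() and withCallingHandlers() machinery as well as gun and fun, so some filtering is usually needed before printing them.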

3. Insert breakpoints using browser()

One of the hardest things in programming is realising you have a bug. If you are lucky, R will raise an error when it hits the bug. In the worst-case scenario, you will get a result and have no way of telling whether it is right or wrong. In this case, you need to sift through your code meticulously to find the error. Use a divide-and-conquer approach, verifying one portion of your code at a time. For this, you may need to "step into" your functions and evaluate the code within them interactively, effectively pausing the execution of your program so that you can investigate. The feature that allows you to do this is called a breakpoint.

In the following example we compute the maximum likelihood estimator (MLE) of a sample x from the Bernoulli distribution. The function lik_bern computes the likelihood function for probability p and data x. The function mle_bern maximises the likelihood using golden-section search.

lik_bern <- function (p, x) {
  prod(dbinom(x, 1, p))
}

mle_bern <- function (x, maxit = 20) {
  tol <- 1e-6
  outer <- c(0, 1)
  inner <- c(1/3, 2/3)
  Louter <- c(lik_bern(outer[1], x), lik_bern(outer[2], x))
  Linner <- c(lik_bern(inner[1], x), NA)
  j <- 2
  for (i in 1:maxit) {
    Linner[j] <- lik_bern(inner[j], x)
    j <- 1 + (Linner[1] > Linner[2])
    outer[j] <- inner[j]
    Louter[j] <- Linner[j]
    mle <- inner[j] <- mean(inner)
    if (inner[2] - inner[1] < tol) {
      break
    }
  }
  c(mle = mle, iterations = i)
}
x <- rep(c(0, 1), 100) # 0, 1, 0, 1, ...
mle_bern(x) # Correct result
mle iterations 
0.5       19.0
x <- rep(c(0, 1), 1000)
mle_bern(x) # Wrong result
    mle iterations 
0.66667   19.00000

We can recognise that the second calculation above is wrong. To investigate further, we insert a breakpoint at an appropriate place in our function, perhaps just before we start iterating inside mle_bern. The new function looks as follows.

mle_bern <- function (x, maxit = 20) {
  tol <- 1e-6
  outer <- c(0, 1)
  inner <- c(1/3, 2/3)
  Louter <- c(lik_bern(outer[1], x), lik_bern(outer[2], x))
  Linner <- c(lik_bern(inner[1], x), NA)
  j <- 2
  browser() # <------- Breakpoint here
  for (i in 1:maxit) {
    Linner[j] <- lik_bern(inner[j], x)
    j <- 1 + (Linner[1] > Linner[2])
    outer[j] <- inner[j]
    Louter[j] <- Linner[j]
    mle <- inner[j] <- mean(inner)
    if (inner[2] - inner[1] < tol) {
      break
    }
  }
  c(mle = mle, iterations = i)
}

Then, we run our code again.

x <- rep(c(0, 1), 1000)
mle_bern(x) # Wrong result
Called from: mle_bern(x)
Browse[1]> ls()
[1] "inner"  "j"      "Linner" "Louter" "maxit"  "outer"  "tol"    "x"     
Browse[1]> 

We notice that execution pauses at the point where the breakpoint is inserted, and the prompt changes to Browse[1]> to indicate that we are in debugging mode. We are also placed inside the function's environment, where all local variables defined up to the breakpoint are available. Breakpoints can also be made conditional on an expression. For example, if we want to stop at the tenth iteration inside the loop, we can use if (i == 10) browser().
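
One thing worth inspecting at the Browse[1]> prompt is the likelihood itself. The sketch below evaluates it non-interactively and assumes that the wrong result comes from numerical underflow of the product of 2000 probabilities. The function loglik_bern is a hypothetical replacement, not part of the original code.

```r
lik_bern <- function(p, x) prod(dbinom(x, 1, p))
x <- rep(c(0, 1), 1000)

# The same expressions can be typed at the Browse[1]> prompt.
lik_bern(1/3, x)  # 0: the product underflows
lik_bern(2/3, x)  # 0 as well, so Linner[1] > Linner[2] is never TRUE

# Hypothetical replacement: sum the log-probabilities instead.
loglik_bern <- function(p, x) sum(dbinom(x, 1, p, log = TRUE))
loglik_bern(1/3, x)  # finite, roughly -1504
loglik_bern(1/2, x)  # larger, roughly -1386
```

Since the logarithm is increasing, maximising loglik_bern gives the same maximiser as maximising lik_bern, so using it inside mle_bern is expected to restore the correct answer of 0.5.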

As demonstrated above, we can type any R expression, which will be evaluated inside the function. In addition, we have the following list of special commands available.

Command     What it does
n           Execute the next statement
s           Step into the function called on the current line
f           Finish the current function or loop
c or cont   Continue running until the next breakpoint or the end of the program
Q           Quit the debugger and exit the program
help        Display this list of commands
where       Show where we are in the code (print the call stack)

If you don't want to manually edit your function to insert a breakpoint, use debug or debugonce to automatically insert a breakpoint at the beginning of a given function, or use setBreakpoint to automatically insert a breakpoint in a file which contains your function.
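
A minimal sketch of flagging a function with debug(), using the lik_bern function from above. The variable flagged is introduced here only to record the result of isdebugged(); the interactive part is guarded so that the code also runs in a batch session.

```r
lik_bern <- function(p, x) prod(dbinom(x, 1, p))

debug(lik_bern)                  # every call to lik_bern now starts in the browser
flagged <- isdebugged(lik_bern)  # TRUE while the flag is set
undebug(lik_bern)                # remove the flag again
isdebugged(lik_bern)             # FALSE

if (interactive()) {
  debugonce(lik_bern)            # flag for the next call only
  lik_bern(0.5, c(0, 1))         # opens the browser at the first line of lik_bern
}
```

debugonce() is convenient when the function is called many times, for example inside a loop, and only the first call needs to be stepped through.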

4. Enter debug mode on an error using options(error = recover)

In the previous section, we had to guess the possible location of the bug and insert breakpoints manually. When the code exits with an error, we can instead enter debug mode as soon as the error occurs by running the command options(error = recover) before re-running the code that gave us the error.

options(error = recover) # Enter debug mode on an error
fun <- function (x, y) {
  ## Computes x^T * y
  t(x) %*% y
}
gun <- function (A, b) {
  ## Computes b^T * A * b where A is a matrix and b is a vector
  A <- (A + t(A))/2
  fun(b, fun(A, b))
}
A <- matrix(1:9, 3, 3)
b <- matrix(1:3, 1, 3)
gun(A, b)
Error in t(x) %*% y : non-conformable arguments

Enter a frame number, or 0 to exit   
 
1: gun(A, b)
2: #4: fun(b, fun(A, b))
3: #3: fun(A, b)
 
Selection: 

R then expects us to type a number to enter the environment of the indicated function call. So if we type 1, we will have variables A and b available, and if we type 2 we will have variables x, which has the value of b, and y, which has the value of fun(A,b).

After we choose a frame, we can get back to the selection menu using the command c. We can turn off the debug-on-error mode using the command options(error = NULL).
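
Because options(error = recover) changes a global setting, it can be convenient to remember the previous value and restore it afterwards, rather than always resetting it to NULL. A small sketch, where old is a name chosen for illustration:

```r
old <- options(error = recover)  # options() returns the previous settings
# ... re-run the failing code here and inspect the frames ...
options(old)                     # restore whatever error handler was set before
```

This matters when another handler was already installed, for example by a start-up file.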

5. Debug non-interactive runs using dump.frames

In all the cases above, we were working in an interactive session. Time-consuming programs are usually run in a non-interactive (or batch) session, for example on a high-performance computer. The function dump.frames allows us to save the current call stack, which we can then examine in an interactive session. This function is best used in combination with options(error = ) so that the call stack is saved at the point where the error occurs. One way to achieve this is to put the following at the beginning of our code.

dump_and_quit <- function() {
  dump.frames(to.file = TRUE) # Save debugging info to file last.dump.rda
  q(save = "no", status = 1) # Quit R with error status
}
options(error = dump_and_quit)

This will create a file called last.dump.rda which contains all debugging information. Later, we can enter debug mode in an interactive session with

load("last.dump.rda")
debugger()
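
To try the mechanism without setting up a batch job, the sketch below calls dump.frames from a calling handler within a single session and then reloads the saved file. It is an illustration under assumptions: the dump is named demo.dump (rather than the default last.dump) and is written to the temporary directory.

```r
fun <- function(x, y) t(x) %*% y
gun <- function(A, b) fun(b, fun(A, b))

old_wd <- setwd(tempdir())  # dump.frames writes to the working directory
try(
  withCallingHandlers(
    gun(matrix(1:9, 3, 3), matrix(1:3, 1, 3)),
    # Save the frames while the failing calls are still on the stack.
    error = function(e) dump.frames(dumpto = "demo.dump", to.file = TRUE)
  ),
  silent = TRUE
)
load("demo.dump.rda")       # restores the object demo.dump
setwd(old_wd)

class(demo.dump)            # "dump.frames"
names(demo.dump)            # one label per call on the saved stack
# In an interactive session: debugger(demo.dump)
```

The saved object holds one environment per call, so the local variables of each function can be examined after the fact, just as with recover.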

6. Summary

This document provides various techniques for debugging R programs that I use regularly. There are many more for you to explore, some built-in and some provided by third-party packages. I have also not discussed techniques for debugging compiled code that is called within R. Mastering these techniques will help you debug R programs quickly and effectively.

Author: Vangelis Evangelou

Created: 2022-09-29 Thu 09:37