Debugging R Packages

Motivation

The package skeleton from the previous chapter left out some essential details, such as debugging and testing for memory leaks. This chapter covers these aspects in more detail. The references for this chapter are Vaughan, Hester, and François (2024), Padgham (2022), Vaughan (2019), Wickham and Bryan (2023), and R-pkg-devel (2024).

Load the Package

As I added the functions from the next sections to the ece244 package, I reloaded it with load_all() from devtools:

load_all()

Compiler Setup

To test that my functions do not lead to memory errors, I created src/Makevars within the package folder and added:

PKG_CXXFLAGS = -Wall -O0 -pedantic
PKG_CPPFLAGS = -UDEBUG -g

On Windows, the file is src/Makevars.win and the content is the same.

These flags require some explanation:

  • -Wall: Enables the most common compiler warnings (despite the name, not every available warning).
  • -O0: Disables optimizations for debugging purposes. Otherwise, the compiler rearranges the code to make the compiled binaries faster, which makes them harder to debug. Once your code is working, you can switch to -O2 or -O3 to enable optimizations for production. If you plan to submit the package to CRAN, remove the flag because CRAN does not accept it.
  • -pedantic: Enforces strict ISO C++ compliance. This will warn about non-standard constructs, similar to flagging bad grammar or spelling errors in English.
  • -UDEBUG -g: Undefines the DEBUG macro and adds debugging information to the binaries. When a package is ready, -g can be removed to reduce the size of the compiled binaries.
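To make the effect of -UDEBUG more concrete, here is a minimal sketch in plain C++ (no cpp11 headers, so it compiles outside the package). The function squared_sum_traced and its trace argument are hypothetical names used only for illustration. The lines between #ifdef DEBUG and #endif are compiled only when the macro is defined, for example with -DDEBUG, and they are dropped when the macro is undefined with -UDEBUG:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the sum of squares of `x` and, only when the
// DEBUG macro is defined at compile time, records one trace line per element.
double squared_sum_traced(const std::vector<int>& x, std::string& trace) {
  std::ostringstream out;
  double total = 0.0;

  for (int v : x) {
    total += static_cast<double>(v) * v;
#ifdef DEBUG
    // Present only when compiled with -DDEBUG; -UDEBUG removes this line.
    out << "added " << v * v << ", total = " << total << "\n";
#endif
  }

  trace = out.str();  // stays empty when DEBUG is not defined
  return total;
}
```

The numeric result is the same with or without the macro; only the optional trace changes, which is why the flag can be toggled freely during development.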

In case the “pedantic” part is not clear, here is an example:

#include "cpp11.hpp"
#include <numeric>
#include <vector>

using namespace cpp11;

// Non-ISO: Use a variable length array
[[cpp11::register]] double squared_sum_non_iso_(integers inp) {
  int size = inp.size();
  double array[size];  // will give a warning, but still compile

  for (int i = 0; i < size; ++i) {
    array[i] = inp[i] * inp[i];
  }

  return std::accumulate(array, array + size, 0.0);
}

// ISO: Use a vector
[[cpp11::register]] double squared_sum_iso_(integers inp) {
  int size = inp.size();
  std::vector<double> vec(size);

  for (int i = 0; i < size; ++i) {
    vec[i] = inp[i] * inp[i];
  }

  return std::accumulate(vec.begin(), vec.end(), 0.0);
}

Although the code compiles, the compiler emits a warning for the first function:

code.cpp:489:10: warning: ISO C++ forbids variable length array ‘array’ [-Wvla]
  489 |   double array[size];  // will give a warning, but still compile

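It is possible to verify that both functions return the expected result:

# all.equal() compares two values per call; a third positional argument
# would be interpreted as the tolerance
all.equal(sum((1:5)^2), squared_sum_non_iso_(1:5))
all.equal(sum((1:5)^2), squared_sum_iso_(1:5))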

Testing for Memory Leaks

To test for memory leaks, I used valgrind. This tool is available on Linux and, with more limited support, on macOS.

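One way to test for memory leaks is to start R under valgrind from the terminal with R's -d (debugger) option. Prefixing Rscript with valgrind would, by default, only instrument the small Rscript launcher and not the R process that it starts:

R -d "valgrind --leak-check=full" -e "library(ece244); squared_sum_iso_(1:5)"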

Or, alternatively, to call R in vanilla mode:

R --vanilla -d 'valgrind -s --track-origins=yes' -f test.R

with test.R containing:

library(ece244)
squared_sum_iso_(1:5)
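To illustrate the kind of problem that valgrind looks for, here is a minimal sketch in plain C++ (no cpp11 headers). The names squared_sum_leaky and squared_sum_safe are hypothetical and not part of ece244. The first function deliberately never releases its buffer, which is the pattern that valgrind typically reports as "definitely lost" memory, and the second avoids the problem by letting std::vector own the storage:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Deliberately leaky (for illustration only): the buffer allocated with
// new[] is never released, so every call loses n * sizeof(double) bytes.
double squared_sum_leaky(const std::vector<int>& inp) {
  std::size_t n = inp.size();
  double* buf = new double[n];

  for (std::size_t i = 0; i < n; ++i) {
    buf[i] = static_cast<double>(inp[i]) * inp[i];
  }

  return std::accumulate(buf, buf + n, 0.0);  // missing: delete[] buf;
}

// Leak-free: std::vector frees its storage when it goes out of scope.
double squared_sum_safe(const std::vector<int>& inp) {
  std::vector<double> buf(inp.size());

  for (std::size_t i = 0; i < inp.size(); ++i) {
    buf[i] = static_cast<double>(inp[i]) * inp[i];
  }

  return std::accumulate(buf.begin(), buf.end(), 0.0);
}
```

Both functions return the same value, so a leak like this does not show up in ordinary unit tests. That is the reason to run the package under valgrind in addition to the regular checks.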

Adding a Configure Script

For a portable package, it is recommended to add a configure script. This script checks for the tools needed to build the package. It is a POSIX shell script (note the #!/bin/sh line) saved as a file named configure in the package root directory.

Here is an example of a configure script for the ece244 package:

#!/bin/sh

PKG_CONFIG_NAME="gccsanissue"

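pkg-config --version >/dev/null 2>&1
if [ $? -eq 0 ]; then
  # Query the flags for the library named in PKG_CONFIG_NAME
  PKGCONFIG_CFLAGS=`pkg-config --cflags --silence-errors ${PKG_CONFIG_NAME}`
  PKGCONFIG_LIBS=`pkg-config --libs --silence-errors ${PKG_CONFIG_NAME}`
fi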

if [ "$PKGCONFIG_CFLAGS" ] || [ "$PKGCONFIG_LIBS" ]; then
  echo "Found pkg-config cflags and libs!"
  PKG_CFLAGS=${PKGCONFIG_CFLAGS}
  PKG_LIBS=${PKGCONFIG_LIBS}
fi

CXXFLAGS="-stdlib=libc++"
# CXXFLAGS="-O0 -g -stdlib=libc++"

LDFLAGS="-stdlib=libc++"

sed -e "s|@cxxflags@|$CXXFLAGS|" \
    -e "s|@ldflags@|$LDFLAGS|" \
    src/Makevars.in > src/Makevars

exit 0

This file, meant for Unix systems, must be accompanied by a configure.win file for Windows systems.

Besides the configure script, an src/Makevars.in file must be created. This file is a template for the src/Makevars file. Here is an example:

LDFLAGS=@ldflags@

# Convert source files to object files
SOURCES = code.cpp \
                    cpp11.cpp
OBJECTS = $(SOURCES:.cpp=.o)

all: $(SHLIB)

$(SHLIB): $(OBJECTS)

clean:
	rm -f $(OBJECTS) $(SHLIB)

Makevars.in is also for Unix systems, and it must be accompanied by an empty Makevars.win for Windows systems.

Finally, a cleanup script helps to get a tidy package build. Here is an example:

#!/bin/sh
rm -f src/Makevars configure.log

The advantage of this approach is that it will create a src/Makevars file with the correct flags for the system. This way, the package will be portable and easier to test with GitHub Actions or Docker.

Testing with Docker

CRAN checks packages on different Unix platforms, and its additional checks for compiled code include tests for memory errors with valgrind and the address sanitizer.

Derived from the recommendations made by Dr. Krylov and Dr. Eddelbuettel in the R-pkg-devel mailing list, I created the following script to test the package in a Docker container:

#!/bin/sh

PACKAGE_DIR=$(pwd)

# DOCKER_IMAGE="ghcr.io/r-hub/containers/valgrind:latest"
DOCKER_IMAGE="ghcr.io/r-hub/containers/clang-asan:latest"

docker pull $DOCKER_IMAGE

docker run --rm -v "$PACKAGE_DIR":/workspace -w /workspace $DOCKER_IMAGE bash -c "
  Rscript -e 'install.packages(\"cpp11\", repos=\"https://cran.rstudio.com/\")'
  R CMD build .
  R CMD check --as-cran --no-manual gccsanissue_0.1.0.tar.gz
"

I added the following function to ece244:

[[cpp11::register]] int bad_() {
  int x = 42;    // valid integer
  int *ptr = &x; // pointer to `x`

  // undefined behavior (alignment issue)
  auto misaligned_ptr = reinterpret_cast<long*>(ptr);
  return *misaligned_ptr; // Read through misaligned pointer
}

To access the function from the R session, I added the following R code:

#' @title Bad function
#' @description This function has a GCC SAN issue
#' @export 
#' @examples 
#' bad()
bad <- function() {
  bad_()
}

The function bad_ introduces undefined behavior by reading an int object through a pointer to long. This kind of mistake is common in C++ code, and it is a good example to test the address sanitizer.

reinterpret_cast<long*> creates a pointer that may be misaligned, since ptr is only guaranteed to be aligned for int, not long. Dereferencing *misaligned_ptr is undefined behavior, and on platforms where long is wider than int (8 bytes versus 4 on 64-bit Linux) it also reads past the end of x. Without sanitizers, the code usually runs silently and returns the value of x. With the address sanitizer, the out-of-bounds read aborts the program with an error.
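For contrast, here is a minimal sketch of two well-defined ways to obtain a long from an int, written in plain C++ (no cpp11 headers). The function names widen_value and copy_int_bytes are hypothetical and only serve the illustration. The first converts the value, which is almost always what is intended. The second copies exactly sizeof(int) bytes with std::memcpy, so it never reads outside the original object:

```cpp
#include <cstring>

// Value conversion: reads the int through its own type and widens the value.
long widen_value(const int* ptr) {
  return static_cast<long>(*ptr);
}

// Byte copy: copies only the bytes that belong to the int into a zeroed long.
// Unlike reinterpret_cast<long*>, this has no alignment requirement and never
// touches memory beyond the int. The resulting value depends on the byte
// order of the platform.
long copy_int_bytes(const int* ptr) {
  static_assert(sizeof(long) >= sizeof(int),
                "long must be at least as wide as int");
  long out = 0;
  std::memcpy(&out, ptr, sizeof(int));
  return out;
}
```

Replacing the cast in bad_ with a conversion like widen_value should leave nothing for the sanitizer to report, which makes the pair a convenient before-and-after test case.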

When testing with the clang-asan container (the image selected in the script above) and the command bash dev/test-docker.sh, the regular R checks will pass, but the address sanitizer check will fail with a report that ends as follows:

Shadow bytes around the buggy address:
  0x6ffdc4008d80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==1244==ABORTING

References

Padgham, Mark. 2022. “Debugging in R with a Single Command.” https://mpadge.github.io/blog/blog012.html.
R-pkg-devel. 2024. “[R-Pkg-Devel] Possible False Negative for Compiled C++ Code in CRAN Checks.” https://stat.ethz.ch/pipermail/r-package-devel/2024q4/011193.html.
Vaughan, Davis. 2019. “Debugging an R Package with C++ – Davis Vaughan.” https://blog.davisvaughan.com/posts/2019-04-05-debug-r-package-with-cpp/.
Vaughan, Davis, Jim Hester, and Romain François. 2024. “Get Started with Cpp11.” https://cpp11.r-lib.org/articles/cpp11.html#intro.
Wickham, Hadley, and Jenny Bryan. 2023. “R Packages (2e).” https://r-pkgs.org/.