Rolling Functions

Motivation

This chapter covers the implementation of simple rolling functions in C++ and R. The goal is to show the syntax differences between the two languages and compare their performance. These examples were adapted from Vaughan, Hester, and Francois (2024).

Fair Warning

These functions ignore NA values. Adjustments for handling NA values will be introduced in the sixth chapter.

R already provides efficient versions of the functions covered here. Code optimizations and improvements will be made in later chapters.

Load the Package

I loaded the ece244 package as I added the functions from the next sections to it with the following code:

load_all()

Additional Packages

The following packages are needed for visualizing and simplifying the presentation of the results:

library(bench)
library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)
library(patchwork)

Cumulative Sum (cumsum())

The next function calculates the cumulative sum of a vector’s elements:

[[cpp11::register]] doubles cumsum_cpp_(doubles x) {
  int n = x.size();
  writable::doubles out(n);

  out[0] = x[0];
  for (int i = 1; i < n; ++i) {
    out[i] = out[i - 1] + x[i];
  }
  return out;
}

Its R equivalent is:

#' Return the cumulative sum of a vector (R)
#' @param x numeric vector
#' @export
cumsum_r <- function(x) {
  n <- length(x)
  out <- numeric(n)
  out[1] <- x[1]
  for (i in 2:n) {
    out[i] <- out[i - 1] + x[i]
  }
  out
}

To document the C++ function, I added the following wrapper to the R code:

#' Return the cumulative sum of a vector (C++)
#' @inheritParams cumsum_r
#' @export
cumsum_cpp <- function(x) {
  cumsum_cpp_(as.double(x))
}

To test the functions, I ran the following benchmark code in the R console:

set.seed(123) # for reproducibility
x <- rpois(1e6, lambda = 2) # 1,000,000 elements

cumsum(1:3)
[1] 1 3 6
cumsum_cpp(1:3)
[1] 1 3 6
cumsum_r(1:3)
[1] 1 3 6
mark(
  cumsum(x),
  cumsum_cpp(x),
  cumsum_r(x)
)
# A tibble: 3 × 6
  expression         min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 cumsum(x)       2.33ms   2.66ms     373.     3.81MB    43.0 
2 cumsum_cpp(x)  29.15ms  30.02ms      32.2   15.26MB    32.2 
3 cumsum_r(x)    52.02ms  52.16ms      19.1    7.66MB     8.20

Cumulative Product (cumprod())

R provides the cumprod() function to compute the cumulative product of a vector:

cumprod(1:5)
[1]   1   2   6  24 120

One possible C++ function to implement this is:

[[cpp11::register]] doubles cumprod_cpp_(doubles x) {
  int n = x.size();
  writable::doubles out(n);

  out[0] = x[0];
  for (int i = 1; i < n; ++i) {
    out[i] = out[i - 1] * x[i];
  }
  return out;
}

Its R equivalent is:

#' Return the cumulative product of a vector (R)
#' @param x numeric vector
#' @export
cumprod_r <- function(x) {
  n <- length(x)
  out <- numeric(n)
  out[1] <- x[1]
  for (i in 2:n) {
    out[i] <- out[i - 1] * x[i]
  }
  out
}

To document the C++ function, I added the following wrapper to the R code:

#' Return the cumulative product of a vector (C++)
#' @inheritParams cumprod_r
#' @export
cumprod_cpp <- function(x) {
  cumprod_cpp_(as.double(x))
}

To test the functions, I ran the following benchmark code in the R console:

mark(
  cumprod(x),
  cumprod_cpp(x),
  cumprod_r(x)
)
# A tibble: 3 × 6
  expression          min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 cumprod(x)       1.84ms   5.47ms     201.    15.26MB    90.2 
2 cumprod_cpp(x)  29.27ms  33.57ms      30.8   15.26MB    14.0 
3 cumprod_r(x)    54.81ms  55.06ms      18.1    7.63MB     2.59

Range of Values (range())

A simple example of the range() function in R is:

range(x)
[1]  0 13

One possible C++ function to implement this is:

[[cpp11::register]] doubles range_cpp_(doubles x) {
  int n = x.size();
  double x1 = x[0], x2 = x[0];

  for (int i = 1; i < n; ++i) {
    x1 = std::min(x1, x[i]);
    x2 = std::max(x2, x[i]);
  }

  writable::doubles out(2);
  out[0] = x1;
  out[1] = x2;

  return out;
}

To document the C++ function, I added the following wrapper to the R code:

#' Return the range of values in a vector (C++)
#' @param x numeric vector
#' @export
range_cpp <- function(x) {
  range_cpp_(as.double(x))
}

To verify the functions, I ran the following tests and benchmark code in the R console:

range(x)
[1]  0 13
range_cpp(x)
[1]  0 13
# create random vectors
set.seed(123) # for reproducibility
bigx <- list(
  as.double(rpois(2e6, lambda = 2)),
  as.double(rpois(4e6, lambda = 2)),
  as.double(rpois(8e6, lambda = 2)),
  as.double(rpois(16e6, lambda = 2)),
  as.double(rpois(32e6, lambda = 2)),
  as.double(rpois(64e6, lambda = 2))
)

results <- map(
  bigx,
  ~ mark(
    range(.x),
    range_cpp(.x)
  ) %>%
    mutate(n = length(.x))
)

d <- results %>%
  bind_rows() %>%
  unnest(c(time, mem_alloc, gc, n)) %>%
  select(expression, time, mem_alloc, gc, n)

g1 <- ggplot(d, aes(x = n, y = time, color = expression)) +
  geom_jitter(width = 0.01, height = 0.01) +
  scale_color_viridis_d() +
  theme_minimal()

g2 <- ggplot(d, aes(x = n, y = mem_alloc, color = expression)) +
  geom_jitter(width = 0.01, height = 0.01) +
  scale_color_viridis_d() +
  theme_minimal()

g1 / g2

References

Vaughan, Davis, Jim Hester, and Roman Francois. 2024. “Get Started with Cpp11.” https://cpp11.r-lib.org/articles/cpp11.html#intro.