Rolling Functions • cpp4r

These examples were adapted from Vaughan, Hester, and Francois (2024).

Note

These functions ignore NA values for now. Adjustments for handling NA values are covered in a separate vignette.

R already provides efficient versions of the functions covered here. This is just to illustrate how to use C++ code.

Cumulative sum

Base R provides the cumsum() function to compute the cumulative sum of a vector:

cumsum(1:5)

One possible C++ function to implement this is:

[[cpp4r::register]]
doubles cumsum_cpp(doubles x) {
  int n = x.size();
  writable::doubles out(n);

  out[0] = x[0];
  for (int i = 1; i < n; ++i) {
    out[i] = out[i - 1] + x[i];
  }
  return out;
}

Its R equivalent would be:

cumsum_r <- function(x) {
  n <- length(x)
  out <- numeric(n)
  out[1] <- x[1]
  for (i in 2:n) {
    out[i] <- out[i - 1] + x[i]
  }
  out
}

Add and document the functions, update the package as in the previous vignettes, and then compare the functions speed with:

# install.packages("bench")
library(bench)

set.seed(123) # for reproducibility
x <- rpois(1e6, lambda = 2) # 1,000,000 elements

cumsum(1:3)
cumsum_cpp(1:3)
cumsum_r(1:3)

mark(
  cumsum(x),
  cumsum_cpp(x),
  cumsum_r(x)
)

Cumulative product

Base R provides the cumprod() function to compute the cumulative product of a vector:

cumprod(1:5)

One possible C++ function to implement this is:

[[cpp4r::register]] doubles cumprod_cpp(doubles x) {
  int n = x.size();
  writable::doubles out(n);

  out[0] = x[0];
  for (int i = 1; i < n; ++i) {
    out[i] = out[i - 1] * x[i];
  }
  return out;
}

Its R equivalent would be:

cumprod_r <- function(x) {
  n <- length(x)
  out <- numeric(n)
  out[1] <- x[1]
  for (i in 2:n) {
    out[i] <- out[i - 1] * x[i]
  }
  out
}

To test the functions, you can run the following benchmark code in the R console:

mark(
  cumprod(x),
  cumprod_cpp(x),
  cumprod_r(x)
)

Range of values

A simple example of the range() function in R is:

range(x)

One possible C++ function to implement this is:

[[cpp4r::register]]
doubles range_cpp(doubles x) {
  int n = x.size();
  double x1 = x[0], x2 = x[0];

  for (int i = 1; i < n; ++i) {
    x1 = std::min(x1, x[i]);
    x2 = std::max(x2, x[i]);
  }

  writable::doubles out(2);
  out[0] = x1;
  out[1] = x2;

  return out;
}

Write its R equivalent.

To verify the functions, you can run the following tests and benchmark code in the R console:

# install.packages("purrr")
# install.packages("dplyr")
# install.packages("ggplot2")
# install.packages("patchwork")
library(purrr)
library(dplyr)
library(ggplot2)
library(patchwork)

range(x)
range_cpp(x)
# range_r(x)

# create random vectors
set.seed(123) # for reproducibility
bigx <- list(
  as.double(rpois(2e6, lambda = 2)),
  as.double(rpois(4e6, lambda = 2)),
  as.double(rpois(8e6, lambda = 2)),
  as.double(rpois(16e6, lambda = 2)),
  as.double(rpois(32e6, lambda = 2)),
  as.double(rpois(64e6, lambda = 2))
)

results <- map(
  bigx,
  ~ mark(
    range(.x),
    range_cpp(.x)
    # range_r(.x)
  ) %>%
    mutate(n = length(.x))
)

d <- results %>%
  bind_rows() %>%
  unnest(c(time, mem_alloc, gc, n)) %>%
  select(expression, time, mem_alloc, gc, n)

g1 <- ggplot(d, aes(x = n, y = time, color = expression)) +
  geom_jitter(width = 0.01, height = 0.01) +
  scale_color_viridis_d() +
  theme_minimal()

g2 <- ggplot(d, aes(x = n, y = mem_alloc, color = expression)) +
  geom_jitter(width = 0.01, height = 0.01) +
  scale_color_viridis_d() +
  theme_minimal()

g1 / g2

References

Vaughan, Davis, Jim Hester, and Roman Francois. 2024. “Get Started with Cpp11.” https://cpp11.r-lib.org/articles/cpp11.html#intro.