These examples were adapted from Vaughan, Hester, and Francois (2024).
Note
These functions ignore NA
values for now. Adjustments
for handling NA
values are covered in a separate
vignette.
R already provides efficient versions of the functions covered here. This is just to illustrate how to use C++ code.
Cumulative sum
Base R provides the cumsum()
function to compute the
cumulative sum of a vector:
cumsum(1:5)
One possible C++ function to implement this is:
[[cpp4r::register]]
doubles cumsum_cpp(doubles x) {
int n = x.size();
writable::doubles out(n);
out[0] = x[0];
for (int i = 1; i < n; ++i) {
out[i] = out[i - 1] + x[i];
}
return out;
}
Its R equivalent would be:
cumsum_r <- function(x) {
n <- length(x)
out <- numeric(n)
out[1] <- x[1]
for (i in 2:n) {
out[i] <- out[i - 1] + x[i]
}
out
}
Add and document the functions, update the package as in the previous vignettes, and then compare the functions speed with:
Cumulative product
Base R provides the cumprod()
function to compute the
cumulative product of a vector:
cumprod(1:5)
One possible C++ function to implement this is:
[[cpp4r::register]] doubles cumprod_cpp(doubles x) {
int n = x.size();
writable::doubles out(n);
out[0] = x[0];
for (int i = 1; i < n; ++i) {
out[i] = out[i - 1] * x[i];
}
return out;
}
Its R equivalent would be:
cumprod_r <- function(x) {
n <- length(x)
out <- numeric(n)
out[1] <- x[1]
for (i in 2:n) {
out[i] <- out[i - 1] * x[i]
}
out
}
To test the functions, you can run the following benchmark code in the R console:
Range of values
A simple example of the range()
function in R is:
range(x)
One possible C++ function to implement this is:
[[cpp4r::register]]
doubles range_cpp(doubles x) {
int n = x.size();
double x1 = x[0], x2 = x[0];
for (int i = 1; i < n; ++i) {
x1 = std::min(x1, x[i]);
x2 = std::max(x2, x[i]);
}
writable::doubles out(2);
out[0] = x1;
out[1] = x2;
return out;
}
Write its R equivalent.
To verify the functions, you can run the following tests and benchmark code in the R console:
# install.packages("purrr")
# install.packages("dplyr")
# install.packages("ggplot2")
# install.packages("patchwork")
library(purrr)
library(dplyr)
library(ggplot2)
library(patchwork)
range(x)
range_cpp(x)
# range_r(x)
# create random vectors
set.seed(123) # for reproducibility
bigx <- list(
as.double(rpois(2e6, lambda = 2)),
as.double(rpois(4e6, lambda = 2)),
as.double(rpois(8e6, lambda = 2)),
as.double(rpois(16e6, lambda = 2)),
as.double(rpois(32e6, lambda = 2)),
as.double(rpois(64e6, lambda = 2))
)
results <- map(
bigx,
~ mark(
range(.x),
range_cpp(.x)
# range_r(.x)
) %>%
mutate(n = length(.x))
)
d <- results %>%
bind_rows() %>%
unnest(c(time, mem_alloc, gc, n)) %>%
select(expression, time, mem_alloc, gc, n)
g1 <- ggplot(d, aes(x = n, y = time, color = expression)) +
geom_jitter(width = 0.01, height = 0.01) +
scale_color_viridis_d() +
theme_minimal()
g2 <- ggplot(d, aes(x = n, y = mem_alloc, color = expression)) +
geom_jitter(width = 0.01, height = 0.01) +
scale_color_viridis_d() +
theme_minimal()
g1 / g2