01 - Motivations for cpp4r
Motivation and significance
The R programming language has maintained a long-standing tradition of interfacing with compiled languages, dating back to the original S implementation in the late 1970s, which served primarily as a wrapper around FORTRAN routines (Chambers 2006). This integration remains relevant today, as R code sometimes lacks the performance needed for computationally intensive tasks. Even after optimizing R code through vectorization and avoiding unnecessary object copying, bottlenecks may persist that require compiled language solutions.
C++ offers particular advantages for addressing common R performance bottlenecks, including:
- Loops that cannot be easily vectorized due to dependencies between iterations
- Recursive functions or problems requiring many function calls
- Data structures and algorithms not natively available in R (e.g., R does not let the end-user use pointers and pass-by-reference semantics)
- Problems requiring fine-tuning memory management
cpp4r is an R package that provides C++11 bindings to R, enabling the use of C++ code in R packages. It is a fork of the cpp11 package (Vaughan, Hester, and François 2025) aiming to provide additional features and improvements while maintaining compatibility with the original cpp11 API.
The landscape of C++ bindings for R has evolved significantly over the past two decades. The cxx package, released in 2000, provided an early prototype of C++ bindings (Hornik 2001). Rcpp, first published to CRAN in 2008, became the mainstream solution with over 2,000 reverse dependencies by 2020 (Eddelbuettel and Francois 2011). A subsequent attempt, Rcpp11, was released in 2014 but did not achieve widespread adoption (Francois, Ushey, and Chambers 2020).
While Rcpp has been highly successful, adding modern C++ features or addressing certain architectural issues would require substantial breaking changes. These changes would compromise backward compatibility with the extensive ecosystem of dependent packages.
To address these limitations, the cpp11 package (Vaughan, Hester, and François 2023) was released in 2023 as a complete reimplementation of C++ bindings to R, incorporating modern C++ features and different design trade-offs aiming to provide:
- Enforced copy-on-write semantics consistent with R’s behavior
- Improved safety when interfacing with R’s C API
- Native support for ALTREP objects
- UTF-8 string handling throughout
- Modern C++11 features and idioms
- Simplified implementation compared to Rcpp
- Faster compilation with reduced memory requirements
- Completely header-only design to avoid Application Binary Interface (ABI) compatibility issues
Software description
While using cpp11 for my thesis project, I identified several enhancements that could benefit the R community and improve the library’s usability. These enhancements include:
- Support for converting C++ maps to R lists
- Roxygen documentation support directly in C++ code
- Proper handling of matrix attributes
- Support for nullable external pointers
- Immediate availability of values added via push_back()
- Bidirectional copy of complex number types
- Flexibility in type conversions
- Various performance optimizations
After discussing these proposed enhancements with the cpp11 maintainers, it became clear that the development priorities and timelines would not accommodate these features in the near term. This led to the creation of cpp4r, a fork of cpp11 that incorporates these additional features while maintaining compatibility with the original cpp11 API. As a result, cpp4r can serve as a drop-in replacement for cpp11, allowing users to benefit from the enhancements without significant code changes. The converse does not hold: replacing cpp4r with cpp11 requires adjustments wherever the code relies on the additional features in cpp4r.
cpp4r extends cpp11’s container support by enabling seamless conversion between C++ standard library containers and R objects. This includes support for std::map and std::unordered_map containers, which are automatically converted to named R lists.
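As a rough illustration of what such a conversion involves, the sketch below flattens a std::map into parallel vectors of names and values, which is the shape a named R list takes. It uses only the standard library; the named_list struct and the to_named_list() helper are hypothetical stand-ins for this example and are not part of the cpp4r API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a named R list: parallel names and values.
struct named_list {
  std::vector<std::string> names;
  std::vector<double> values;
};

// Flatten a map into names and values. std::map iterates in key order,
// so the resulting names are sorted; std::unordered_map would give an
// unspecified order instead.
named_list to_named_list(const std::map<std::string, double>& m) {
  named_list out;
  out.names.reserve(m.size());
  out.values.reserve(m.size());
  for (const auto& kv : m) {
    out.names.push_back(kv.first);
    out.values.push_back(kv.second);
  }
  // Invariant: every value has exactly one name.
  assert(out.names.size() == out.values.size());
  return out;
}
```

Calling to_named_list() on a map holding the pairs ("b", 2.0) and ("a", 1.0) yields the names "a", "b" with the values 1.0, 2.0, in that order.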
cpp4r provides roxygen support directly in C++ code, allowing developers to document their C++ functions using familiar roxygen2 syntax. This integration streamlines the documentation workflow for packages that expose C++ functions to R.
Unlike cpp11, cpp4r properly handles matrix attributes, including dimnames, ensuring that matrix operations preserve metadata when copying data between R and C++.
cpp4r offers more flexible type conversion functions. For example, as_integers() and as_doubles() accept logical inputs, providing greater flexibility in handling diverse input types compared to the more restrictive cpp11 implementations.
Several internal optimizations improve performance over cpp11, particularly in vector operations and memory management. These optimizations maintain the safety guarantees of cpp11 while improving execution speed.
cpp4r provides full bidirectional copying of complex numbers, enabling seamless transfer of complex vectors and matrices between R and C++ code. In contrast, cpp11 does not support this functionality.
cpp4r maintains cpp11’s core design principles while extending functionality:
- Copy-on-Write Semantics: Like cpp11, cpp4r enforces copy-on-write semantics that match R’s behavior, preventing unexpected modifications to input data.
- Safety First: cpp4r incorporates comprehensive safety mechanisms when interfacing with R’s C API, using unwind_protect() and exception handling to prevent resource leaks.
- Modern C++ Features: The implementation leverages C++11 features including move semantics, type traits, variadic templates, and user-defined literals.
- Header-Only Design: As a completely header-only library, cpp4r avoids ABI compatibility issues that can arise with libraries containing compiled components.
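The copy-on-write idea in the first principle can be sketched independently of any R binding. The example below is a minimal standard-library illustration, assuming a reference-counted buffer; the cow_doubles class is hypothetical and says nothing about how cpp4r or cpp11 implement their vector types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical copy-on-write wrapper around a vector of doubles.
class cow_doubles {
 public:
  explicit cow_doubles(std::vector<double> values)
      : data_(std::make_shared<std::vector<double>>(std::move(values))) {}

  // Reads never copy the underlying buffer.
  double get(std::size_t i) const {
    assert(i < data_->size());
    return (*data_)[i];
  }

  // Writes first copy the buffer if another handle still shares it,
  // so the other handle never observes the modification.
  void set(std::size_t i, double value) {
    assert(i < data_->size());
    if (data_.use_count() > 1) {
      data_ = std::make_shared<std::vector<double>>(*data_);
    }
    (*data_)[i] = value;
  }

  // True while both handles point at the same buffer.
  bool shares_with(const cow_doubles& other) const {
    return data_ == other.data_;
  }

 private:
  std::shared_ptr<std::vector<double>> data_;
};
```

Copying a cow_doubles is cheap because only the handle is duplicated; the first set() on either copy triggers the actual data copy, leaving the other copy unchanged. This mirrors how modifying an R function argument does not alter the caller's object.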
cpp4r offers vendoring capabilities, which means copying the dependency code directly into your project’s source tree. This approach, borrowed from the Go programming language, includes the dependencies’ headers with the source code (The Go Authors 2024). This ensures the dependency code remains fixed and stable until explicitly updated. Since cpp4r is a header-only library, you can copy all headers by running cpp4r::vendor_cpp4r() when needed.
Vendoring has both advantages and drawbacks. The main advantage is that disruptive changes to the cpp4r project cannot break your existing code. The drawbacks include slightly larger package size and isolation from bugfixes and new features until you explicitly update the vendored headers. Most packages should not vendor the cpp4r dependency, except for projects designed to run in restricted environments where internet access is limited or unavailable for security reasons (e.g., high-performance computing clusters).
Software functionalities
cpp4r is designed as a drop-in replacement for cpp11, using identical syntax and API patterns. Existing cpp11 code can typically be migrated to cpp4r with minimal changes, primarily involving header includes and namespace references.
To use cpp4r, users must first install the package from CRAN or GitHub. The following code shows how to install the package:
install.packages("cpp4r", repos = "https://cran.rstudio.com")
# or
remotes::install_github("pachadotdev/cpp4r")
Once installed, users can use the provided package template function to create a new package that uses C++ code. The package template includes simple examples and all the necessary files to compile the code and install the new R package. The following code shows how to create a new package:
cpp4r::pkg_template("~/rstats/mypkg")
The package skeleton includes standard practices:
# In DESCRIPTION file
LinkingTo: cpp4r
# In R code
#' @useDynLib mypkg, .registration = TRUE
#' @keywords internal
"_PACKAGE"
C++ functions are exposed to R using the attribute syntax and documented with roxygen comments:
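/* roxygen
@title Square of Each Element in 'x'
@param x Numeric vector
@return Numeric vector
@export
*/
[[cpp4r::register]]
cpp4r::doubles my_square(cpp4r::doubles x) {
  // Element-wise square written as an explicit loop
  cpp4r::writable::doubles out(x.size());
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    out[i] = x[i] * x[i];
  }
  return out;
}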
The equivalent cpp11 code would be:
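// In C++ code
[[cpp11::register]]
cpp11::doubles my_square_cpp(cpp11::doubles x) {
  // Element-wise square written as an explicit loop
  cpp11::writable::doubles out(x.size());
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    out[i] = x[i] * x[i];
  }
  return out;
}

# In R code
#' @title Square of Each Element in 'x'
#' @param x Numeric vector
#' @return Numeric vector
#' @export
my_square <- function(x) {
  my_square_cpp(x)
}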
Illustrative examples
cpp4r uses lists to track managed objects. This approach is more efficient for large numbers of objects than Rcpp’s use of R_PreserveObject()/R_ReleaseObject().
When vector or matrix sizes are known beforehand, the performance difference between cpp4r/cpp11 and Rcpp is negligible. However, when the length is unknown beforehand, performance changes notably. This is the case with rejection sampling algorithms, which obtain \(n\) accepted samples without knowing in advance how many candidates need to be generated.
The C++ push_back() method is ideal for rejection sampling, where each candidate is either accepted (stored) or rejected (discarded). With low acceptance rates, we might need to generate \(kn\) candidates (\(k > 1\)) to obtain \(n\) final samples. Unlike Gibbs sampling algorithms, where iterations are known upfront, rejection sampling requires dynamically growing vectors or matrices. For example, with an 80% acceptance rate, we need to generate approximately \(1.25n\) samples to obtain \(n\) final samples.
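The \(1.25n\) figure can be checked with a short calculation, assuming candidates are independent and each is accepted with probability \(p\). The number of candidates drawn per accepted sample is then geometric with mean \(1/p\), so the expected total number of candidates \(N\) (a symbol introduced here for illustration) is:

```latex
\mathbb{E}[N] = \frac{n}{p},
\qquad
p = 0.8 \;\Rightarrow\; \mathbb{E}[N] = \frac{n}{0.8} = 1.25\,n
```

For a normal proposal truncated to an interval \([a, b]\), \(p = \Phi\big((b - \mu)/\sigma\big) - \Phi\big((a - \mu)/\sigma\big)\), where \(\Phi\) is the standard normal distribution function. With \(\mu = 0\), \(\sigma = 1\), \(a = -2\), and \(b = 2\), this gives \(p \approx 0.954\), or roughly \(1.05n\) candidates.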
Rejection sampling is used in Monte Carlo methods, Bayesian inference, and simulation studies. cpp4r’s design is advantageous in these settings because it reserves extra memory, giving push_back() operations amortized \(O(1)\) time complexity. In contrast, Rcpp does not reserve extra capacity, so each push_back() call copies the whole vector and has \(O(n)\) time complexity, and the cumulative memory allocated over \(n\) calls grows quadratically. This design difference translates into performance differences that widen with larger input data.
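The gap between the two growth strategies can be made concrete by counting element copies. The sketch below is a standard-library simulation, not cpp4r or Rcpp code: copies_for_appends() is a hypothetical helper, and the doubling factor is an assumption chosen for illustration rather than the growth policy either library actually uses.

```cpp
#include <cassert>
#include <cstddef>

// Count how many element copies occur when appending n values one at a
// time. With geometric growth the capacity doubles whenever the buffer
// is full; otherwise it grows by exactly one slot, so every append
// reallocates and copies all existing elements.
std::size_t copies_for_appends(std::size_t n, bool geometric) {
  std::size_t size = 0, capacity = 0, copies = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (size == capacity) {
      copies += size;  // existing elements move to the new buffer
      if (geometric) {
        capacity = (capacity == 0) ? 1 : capacity * 2;
      } else {
        capacity += 1;
      }
    }
    ++size;
    assert(size <= capacity);  // the new element always fits
  }
  return copies;
}
```

For eight appends, geometric growth performs 1 + 2 + 4 = 7 copies, while growing by one slot performs 0 + 1 + ... + 7 = 28. In general the totals are below \(2n\) for doubling and \(n(n-1)/2\) for grow-by-one, which is the linear versus quadratic behavior described above.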
The following code shows a rejection sampling implementation with lower and upper truncation bounds using cpp4r. The syntax is equivalent for cpp11 and Rcpp, but performance differs:
#include <algorithm>       // std::max
#include <cpp4r.hpp>       // assumed umbrella header, mirroring cpp11.hpp
#include <R_ext/Random.h>  // GetRNGstate(), PutRNGstate()
#include <Rmath.h>         // Rf_pnorm5(), Rf_rnorm()

using namespace cpp4r;

// Reproducible examples via set.seed() in R
class local_rng {
 public:
  local_rng() { GetRNGstate(); }
  ~local_rng() { PutRNGstate(); }
};

[[cpp4r::register]]
doubles rejection_sampling(int n_samples, double mu = 0.0, double sigma = 1.0,
                           double lower = -2.0, double upper = 2.0) {
  local_rng rng_state;

  // Acceptance rate for better initial allocation
  double z_lower = (lower - mu) / sigma, z_upper = (upper - mu) / sigma;
  double acceptance_rate = Rf_pnorm5(z_upper, 0.0, 1.0, 1, 0) -
                           Rf_pnorm5(z_lower, 0.0, 1.0, 1, 0);

  // Allocate based on expected number of samples needed
  // (add 20% to ensure minimum size)
  R_xlen_t estimated_needed = static_cast<R_xlen_t>(n_samples / acceptance_rate * 1.2);
  estimated_needed = std::max(estimated_needed, static_cast<R_xlen_t>(n_samples));

  writable::doubles accepted_samples;
  accepted_samples.reserve(estimated_needed);

  // Keep sampling until we have enough accepted samples
  int target_samples = static_cast<int>(n_samples);
  while (static_cast<int>(accepted_samples.size()) < target_samples) {
    double candidate = Rf_rnorm(mu, sigma);
    if (candidate >= lower && candidate <= upper) {
      accepted_samples.push_back(candidate);
    }
  }

  return accepted_samples;
}
The following table shows the median execution time and cumulative memory usage for the same rejection sampling algorithm implemented with cpp4r, cpp11, and Rcpp:
Backend | Sample size | Median time | Cumulative memory usage |
---|---|---|---|
cpp4r | 25,000 | 808.53µs | 468.96KB |
cpp4r | 50,000 | 2.17ms | 931.76KB |
cpp4r | 75,000 | 2.75ms | 1.36MB |
cpp4r | 100,000 | 3.57ms | 1.82MB |
cpp11 | 25,000 | 806.16µs | 714.71KB |
cpp11 | 50,000 | 2.6ms | 1.38MB |
cpp11 | 75,000 | 2.77ms | 2.57MB |
cpp11 | 100,000 | 3.98ms | 2.76MB |
Rcpp | 25,000 | 420.98ms | 2.33GB |
Rcpp | 50,000 | 1.52s | 9.32GB |
Rcpp | 75,000 | 4.04s | 20.96GB |
Rcpp | 100,000 | 6.01s | 37.26GB |
The benchmark shows that both cpp4r and cpp11 scale better than Rcpp when making repeated push_back() calls. With Rcpp, the entire vector must be copied on each call. In contrast, cpp4r vectors grow efficiently by reserving extra space, similar to std::vector.
The benchmark followed the guidelines from Beyer, Löwe, and Wendler (2019). It was conducted on a Lenovo ThinkPad X1 Carbon Gen 9 laptop equipped with an 11th Gen Intel Core i7-1185G7 processor (4 cores, 8 threads, 3.00GHz), 15.3 GiB of RAM, and the Manjaro Linux operating system.
Impact
By providing a portable interface for C++ integration, cpp4r enables R developers to leverage the power of modern C++ while minimizing the complexity typically associated with writing compiled code. For dynamically growing vectors, the benchmarks show substantial speed and memory improvements over Rcpp and lower memory usage than cpp11, giving researchers a useful tool for writing R packages that analyze large datasets and perform intensive computations.
Beyond improved memory usage, which can determine whether an analysis is feasible, cpp4r’s design also offers faster compilation times, enhancing development workflow and testing. This is particularly beneficial in academic and business settings, where prototyping and iteration are common before final implementation.
cpp4r represents an evolution in C++ bindings for R, building upon the solid foundation established by cpp11 while addressing specific limitations and adding features that can benefit the R community. By maintaining API compatibility with cpp11, cpp4r provides a migration path for developers seeking enhanced functionality without requiring significant code restructuring.
Conclusion
The development of cpp4r demonstrates the value of open-source transparency and how it enables derived works. While cpp11 and Rcpp continue to serve the broader R community effectively, cpp4r offers an alternative for projects requiring its specific enhancements, particularly in academic research contexts where its documentation features and support for additional data types reduce the coding effort needed to advance computational methods.
The software, documentation, and replication code are available on GitHub. The codebase is released under the Apache 2.0 license. Contributions and feedback from the R community are welcome to help improve the package further. We hope that cpp4r will be a valuable tool for R developers, providing a simple solution for performance bottlenecks by integrating C++ code into R packages.