01 - Introduction

For an extended review of the topics discussed in this vignette, please refer to the preprint “cpp4r: A Header-Only C++ and R Interface.”

cpp4r is an R package that provides C++11 bindings to R, enabling the use of C++ code in R packages. It is a fork of the cpp11 package (Vaughan, Hester, and François 2025) that aims to provide additional features and improvements while maintaining compatibility with the original cpp11 API.

R is an open-source implementation of the S programming language, which dates back to the 1970s, and even today R’s internals use C and FORTRAN routines for some tasks (Chambers 2006). This design decision is not arbitrary: C and FORTRAN, while not particularly easy to use, are highly efficient for computationally intensive tasks. Even after R code has been optimized through vectorization and by avoiding unnecessary copies of objects, bottlenecks may persist that call for a compiled language.

For certain applications, such as fitting regression models with a large number of fixed effects (Yotov et al. 2017), computational efficiency can be a concern for any interpreted language, not just R. Other interpreted languages include Python, JavaScript, and proprietary languages such as MATLAB, Maple, and Wolfram.

C++ is a compiled language (like C and FORTRAN) that offers particular advantages for addressing common R performance bottlenecks, including:

  • Loops that cannot be easily vectorized due to dependencies between iterations
  • Recursive functions or problems requiring many function calls
  • Data structures and algorithms not natively available in R (e.g., R does not expose pointers or pass-by-reference semantics to the end user)
  • Problems requiring fine-grained memory management
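As an illustration of the first point, the following plain C++ function is a hypothetical example (it is not part of cpp4r, and the name capped_cumsum is invented here). It computes a running sum that resets to zero whenever it exceeds a threshold. Because each step depends on the running value produced by the previous step, the loop cannot be replaced by a single vectorized call in R. The cpp4r binding code that would expose the function to R is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: running sum that resets to zero whenever it
// exceeds `threshold`. Each iteration depends on the value left by the
// previous one, so this is not a plain cumulative sum (R's cumsum()).
std::vector<double> capped_cumsum(const std::vector<double>& x, double threshold) {
  assert(threshold > 0);
  std::vector<double> out(x.size());
  double running = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    running += x[i];
    if (running > threshold) {
      running = 0.0;  // the reset depends on the accumulated history
    }
    out[i] = running;
  }
  return out;
}
```

For example, capped_cumsum({1, 2, 3, 4, 1}, 5) returns {1, 3, 0, 4, 5}.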

C++, like other compiled languages, offers additional computational efficiency, but it requires setup effort that can be considerable, both in time and in configuring the software stack needed to compile the code and build an executable. R, like other interpreted languages, does not require building executable files; it can execute instructions line by line, which offers greater portability and makes it possible to introduce small changes to the code without recompiling.

Compiled languages do not reduce time complexity: a direct rewrite of R code in C++ reduces runtime only because C++ typically executes each operation faster. In practical terms, a double loop over n elements has the same \(O(n^2)\) complexity in R and in C++; the difference is that C++ typically takes less time per iteration. For some problems, it is possible to derive reduced forms or new algorithms that use a different set of instructions to obtain the same result. An example is Vargas Sepulveda (2025), which discusses an alternative formulation that computes Kendall’s correlation coefficient with time complexity \(O(n \log n)\) instead of the \(O(n^2)\) of the traditional algorithm.
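The difference between the two complexities can be sketched for the simplest case, assuming no ties in either variable. Kendall’s coefficient is \(\tau = (C - D) / \binom{n}{2}\), where C and D count concordant and discordant pairs; without ties, \(C = \binom{n}{2} - D\). The quadratic version below compares every pair of observations. The \(O(n \log n)\) version sorts the observations by x and then counts inversions in the reordered y with a merge sort, which is the idea behind Knight’s algorithm. This is an illustrative sketch with hypothetical function names, not the implementation from Vargas Sepulveda (2025), and it omits the tie corrections a complete implementation needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// O(n^2): compare every pair directly (assumes no ties).
double kendall_naive(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size() && x.size() > 1);
  const std::size_t n = x.size();
  long long s = 0;  // concordant minus discordant pairs
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j) {
      double p = (x[i] - x[j]) * (y[i] - y[j]);
      s += (p > 0) - (p < 0);
    }
  return static_cast<double>(s) / (n * (n - 1) / 2.0);
}

// Merge sort on v[lo, hi) that returns the number of inversions.
long long sort_count(std::vector<double>& v, std::vector<double>& buf,
                     std::size_t lo, std::size_t hi) {
  if (hi - lo < 2) return 0;
  std::size_t mid = lo + (hi - lo) / 2;
  long long inv = sort_count(v, buf, lo, mid) + sort_count(v, buf, mid, hi);
  std::size_t i = lo, j = mid, k = lo;
  while (i < mid && j < hi) {
    if (v[i] <= v[j]) {
      buf[k++] = v[i++];
    } else {
      inv += static_cast<long long>(mid - i);  // v[j] precedes all remaining left items
      buf[k++] = v[j++];
    }
  }
  while (i < mid) buf[k++] = v[i++];
  while (j < hi) buf[k++] = v[j++];
  std::copy(buf.begin() + lo, buf.begin() + hi, v.begin() + lo);
  return inv;
}

// O(n log n): order y by x, then discordant pairs = inversions in that order.
double kendall_fast(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size() && x.size() > 1);
  const std::size_t n = x.size();
  std::vector<std::size_t> idx(n);
  std::iota(idx.begin(), idx.end(), 0);
  std::sort(idx.begin(), idx.end(),
            [&](std::size_t a, std::size_t b) { return x[a] < x[b]; });
  std::vector<double> ys(n), buf(n);
  for (std::size_t i = 0; i < n; ++i) ys[i] = y[idx[i]];
  long long discordant = sort_count(ys, buf, 0, n);
  double pairs = n * (n - 1) / 2.0;
  return (pairs - 2.0 * discordant) / pairs;  // (C - D) / pairs with C = pairs - D
}
```

On x = {1, 2, 3, 4, 5} and y = {3, 1, 2, 5, 4}, both functions return 0.4: there are 3 discordant pairs out of 10, so \(\tau = (10 - 2 \cdot 3)/10\).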

Additionally, C++ offers more flexible data structures and allows the direct creation of operator overloads and friend functions, which make it possible to work with a wide range of data formats (Emara 2024), even ‘esoteric’ binary formats that lack documentation (Sepúlveda and Barkai 2025). While it is possible to write R functions in C, it is often more convenient to use C++, as it is largely a superset of C and supports object-oriented programming in addition to C’s procedural style (Emara 2024). This facilitates working with matrices, complex numbers, and other data structures such as cubes and fields (Sanderson and Curtin 2016). Choosing C++ over FORTRAN is also often a matter of simplicity.
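A minimal sketch of these two features, using a hypothetical Vec2 type invented for illustration: operator+ is overloaded as a member function, while operator== and operator<< are friend functions that can read the private coordinates directly. The same mechanism lets a library define how its own types are combined, compared, or printed.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical 2D vector type, used only to illustrate the syntax.
class Vec2 {
 public:
  Vec2(double x, double y) : x_(x), y_(y) {}

  // Member operator overload: component-wise addition.
  Vec2 operator+(const Vec2& other) const {
    return Vec2(x_ + other.x_, y_ + other.y_);
  }

  // Friend functions are not members, but may access private data.
  friend bool operator==(const Vec2& a, const Vec2& b) {
    return a.x_ == b.x_ && a.y_ == b.y_;
  }
  friend std::ostream& operator<<(std::ostream& os, const Vec2& v) {
    return os << "(" << v.x_ << ", " << v.y_ << ")";
  }

 private:
  double x_, y_;
};

// Formats a Vec2 through the overloaded stream operator.
std::string describe(const Vec2& v) {
  std::ostringstream os;
  os << v;
  return os.str();
}
```

For example, describe(Vec2(1, 2) + Vec2(0.5, 3)) returns "(1.5, 5)".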

C++ bindings for R date back to the early 2000s. The cxx package (Hornik 2001), released in 2000 and now discontinued, provided an early prototype of C++ bindings. Rcpp (Eddelbuettel and Francois 2011), first published to CRAN in 2008, is the mainstream solution and is used by over 3,000 packages as of 2025.

cpp11 (Vaughan, Hester, and François 2025) was released in 2020 as a complete reimplementation of C++ bindings to R, with design trade-offs that differ from those of Rcpp, aiming to provide:

  • Enforced copy-on-write semantics consistent with R’s behavior
  • Improved safety when interfacing with R’s C API
  • Native support for ALTREP objects
  • UTF-8 string handling throughout
  • Modern C++11 features and idioms
  • Simplified implementation compared to Rcpp
  • Faster compilation with reduced memory requirements
  • Completely header-only design to avoid Application Binary Interface (ABI) compatibility issues

However, cpp11 lacks some of the features that explain Rcpp’s widespread adoption, including:

  • Syntactic sugar (i.e., helper functions that offer a more concise or convenient way to express common patterns)
  • Modules (i.e., helpers to export C++ functions and/or classes to R more easily)
  • Attributes (i.e., decorators to expose C++ functions to R with minimal boilerplate code)

While using cpp11 for my thesis project, I identified several enhancements that could benefit the R community and improve the library’s usability. These enhancements include:

  • Support for converting C++ maps to R lists
  • Roxygen documentation support directly in C++ code
  • Proper handling of matrix attributes
  • Support for nullable external pointers
  • Immediate availability of values added via push_back()
  • Bidirectional copying of complex number types
  • More flexible type conversions
  • Various performance optimizations
  • Specialized codebase to benefit from newer C++ standards (e.g., if C++20 is available, compiling a package will use C++20 features instead of being limited to C++11)

After discussing these proposed enhancements with the cpp11 maintainers, it became clear that the development priorities and timelines would not accommodate these features in the near term. This led to the creation of cpp4r, a fork of cpp11 that incorporates these additional features while maintaining compatibility with the original cpp11 API. This means that cpp4r can generally serve as a drop-in replacement for cpp11, allowing users to benefit from the enhancements without significant code changes. The converse, replacing cpp4r with cpp11, requires adjustments whenever code relies on features specific to cpp4r.

cpp4r extends cpp11’s container support by enabling seamless conversion between C++ standard library containers and R objects. This includes support for std::map and std::unordered_map containers, which are automatically converted to named R lists.
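The following plain C++ sketch shows the shape of that conversion without any R headers: keys become element names and mapped values become list elements. NamedList and to_named_list are hypothetical names, not cpp4r’s API. Note that std::map iterates in sorted key order, while the iteration order of std::unordered_map is unspecified, so a conversion that walks the container in iteration order yields sorted names only for std::map; whether cpp4r does exactly this is an assumption here.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an R named list: element values plus a
// parallel vector of names (R keeps the latter in the "names" attribute).
template <typename T>
struct NamedList {
  std::vector<std::string> names;
  std::vector<T> values;
};

// Works for std::map and std::unordered_map with string keys:
// each key becomes a name, each mapped value becomes an element.
template <typename Map>
NamedList<typename Map::mapped_type> to_named_list(const Map& m) {
  NamedList<typename Map::mapped_type> out;
  out.names.reserve(m.size());
  out.values.reserve(m.size());
  for (const auto& kv : m) {  // follows the container's iteration order
    out.names.push_back(kv.first);
    out.values.push_back(kv.second);
  }
  return out;
}
```

For example, a std::map holding {"b", 2} and {"a", 1} yields names {"a", "b"} and values {1, 2}.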

cpp4r provides roxygen support directly in C++ code, allowing developers to document their C++ functions using familiar roxygen2 syntax. This integration streamlines the documentation workflow for packages that expose C++ functions to R.

Unlike cpp11, cpp4r properly handles matrix attributes, including dimnames, ensuring that matrix operations preserve metadata when copying data between R and C++.

cpp4r offers more flexible type conversion functions. For example, as_integers() and as_doubles() accept logical inputs, providing greater flexibility in handling diverse input types compared to the more restrictive cpp11 implementations.
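To see why logical inputs are a natural case to accept, recall how R stores them at the C level: a logical vector holds int values (TRUE as 1, FALSE as 0), and NA_LOGICAL has the same value as NA_INTEGER (INT_MIN). The hypothetical helpers below sketch the conversion under that assumption in plain C++; they are not cpp4r’s as_integers() and as_doubles(), and a generic quiet NaN stands in for R’s NA_REAL, which is a specific NaN value.

```cpp
#include <climits>
#include <limits>
#include <vector>

// Assumption: mirrors R's C-level layout, where logicals are stored as int
// and NA_LOGICAL == NA_INTEGER == INT_MIN.
constexpr int kNaInt = INT_MIN;

// Logical -> integer: the storage format already matches, so values
// (including NA) are copied unchanged.
std::vector<int> logical_as_integers(const std::vector<int>& lgl) {
  return lgl;
}

// Logical -> double: TRUE/FALSE become 1.0/0.0 and NA becomes NaN
// (a stand-in for R's NA_REAL).
std::vector<double> logical_as_doubles(const std::vector<int>& lgl) {
  std::vector<double> out;
  out.reserve(lgl.size());
  for (int v : lgl) {
    out.push_back(v == kNaInt ? std::numeric_limits<double>::quiet_NaN()
                              : static_cast<double>(v));
  }
  return out;
}
```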

Several internal optimizations improve performance over cpp11, particularly in vector operations and memory management. These optimizations maintain the safety guarantees of cpp11 while improving execution speed.

cpp4r provides full bidirectional copying of complex numbers, enabling seamless transfer of complex vectors and matrices between R and C++ code. In contrast, cpp11 does not support this functionality.
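At the C level, R stores each complex value as an Rcomplex, which exposes its real and imaginary parts as two doubles named r and i, while C++ code usually works with std::complex<double>. The sketch below shows both copy directions, using a hypothetical RComplexLike struct in place of Rcomplex so that it compiles without R’s headers; it is not cpp4r’s implementation.

```cpp
#include <complex>
#include <vector>

// Hypothetical stand-in for R's Rcomplex (real part r, imaginary part i).
struct RComplexLike {
  double r;
  double i;
};

// R -> C++: build std::complex values from the two parts.
std::vector<std::complex<double>> to_std(const std::vector<RComplexLike>& in) {
  std::vector<std::complex<double>> out;
  out.reserve(in.size());
  for (const auto& z : in) out.emplace_back(z.r, z.i);
  return out;
}

// C++ -> R: split std::complex values back into the two parts.
std::vector<RComplexLike> from_std(const std::vector<std::complex<double>>& in) {
  std::vector<RComplexLike> out;
  out.reserve(in.size());
  for (const auto& z : in) out.push_back(RComplexLike{z.real(), z.imag()});
  return out;
}
```

A round trip through to_std() and from_std() returns the original real and imaginary parts unchanged.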

cpp4r maintains cpp11’s core design principles while extending functionality:

  • Copy-on-Write Semantics: Like cpp11, cpp4r enforces copy-on-write semantics that match R’s behavior, preventing unexpected modifications to input data.
  • Safety First: cpp4r incorporates comprehensive safety mechanisms when interfacing with R’s C API, using unwind_protect() and exception handling to prevent resource leaks.
  • Modern C++ Features: The implementation leverages C++11 features including move semantics, type traits, variadic templates, and user-defined literals.
  • Header-Only Design: As a completely header-only library, cpp4r avoids ABI compatibility issues that can arise with libraries containing compiled components.
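These safety mechanisms exist because R reports errors with a non-local jump (longjmp), which skips C++ destructors. Roughly, cpp11-style bindings catch C++ exceptions at the boundary of each exported function, so that the stack has already unwound and destructors have run before the R error is raised; unwind_protect() covers the opposite direction, turning an R error raised inside C++ code into a C++ exception. The plain C++ sketch below shows only the C++ half of this pattern, with hypothetical names (Resource, risky_divide, call_safely) and an error string in place of an R error.

```cpp
#include <stdexcept>
#include <string>

// Counts live resources so that leaks can be detected.
int live_resources = 0;

// RAII handle: acquires in the constructor, releases in the destructor.
struct Resource {
  Resource() { ++live_resources; }
  ~Resource() { --live_resources; }
  Resource(const Resource&) = delete;
  Resource& operator=(const Resource&) = delete;
};

// Inner C++ code that may throw after acquiring a resource.
double risky_divide(double a, double b) {
  Resource r;  // released during stack unwinding if we throw below
  if (b == 0.0) throw std::domain_error("division by zero");
  return a / b;
}

// Hypothetical boundary wrapper: no exception escapes it. By the time the
// catch block runs, destructors have already executed; only then is the
// error handed to the caller (here via an out-parameter).
bool call_safely(double a, double b, double& result, std::string& error) {
  try {
    result = risky_divide(a, b);
    return true;
  } catch (const std::exception& e) {
    error = e.what();
    return false;
  }
}
```

After call_safely(1.0, 0.0, ...) returns false with the message "division by zero", live_resources is back to 0, showing that the resource was released despite the error.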

cpp4r supports vendoring, that is, copying the dependency code directly into your project’s source tree. This approach, borrowed from the Go programming language, ships the dependencies’ headers together with the source code (The Go Authors 2024), so the dependency code remains fixed and stable until it is explicitly updated. Since cpp4r is a header-only library, you can copy all of its headers by running cpp4r::vendor_cpp4r() when needed.

Vendoring has both advantages and drawbacks. The main advantage is that disruptive changes to the cpp4r project cannot break your existing code. The drawbacks include a slightly larger package and isolation from bug fixes and new features until you explicitly update the vendored headers. Most packages should not vendor cpp4r; the exception is projects designed to run in restricted environments where internet access is limited or unavailable for security reasons (e.g., some high-performance computing clusters).

cpp4r is designed as a drop-in replacement for cpp11, using identical syntax and API patterns. Existing cpp11 code can typically be migrated to cpp4r with minimal changes, primarily involving header includes and namespace references.

References

Chambers, John. 2006. “A History of S and R (with Some Questions for the Future).” Vienna, Austria. https://www.r-project.org/conferences/useR-2006/Presentations/index.html.
Eddelbuettel, Dirk, and Romain Francois. 2011. “Rcpp: Seamless R and C++ Integration.” Journal of Statistical Software 40 (April): 1–18. https://doi.org/10.18637/jss.v040.i08.
Emara, Salma. 2024. “Khufu: Object-Oriented Programming Using C++.” https://learningcpp.org/cover.html.
Hornik, Kurt. 2001. “Cxx: C++ Test.” https://cran.r-project.org/src/contrib/Archive/cxx/.
Sanderson, Conrad, and Ryan Curtin. 2016. “Armadillo: A Template-Based C++ Library for Linear Algebra.” Journal of Open Source Software 1 (2): 26. https://doi.org/10.21105/joss.00026.
Sepúlveda, Mauricio Vargas, and Lital Barkai. 2025. “The REDATAM Format and Its Challenges for Data Access and Information Creation in Public Policy.” Data & Policy 7 (January): e18. https://doi.org/10.1017/dap.2025.4.
The Go Authors. 2024. “Go Modules Reference.” https://go.dev/ref/mod.
Vargas Sepulveda, Mauricio. 2025. “Kendallknight: An R Package for Efficient Implementation of Kendall’s Correlation Coefficient Computation.” PLOS ONE 20 (6): e0326090. https://doi.org/10.1371/journal.pone.0326090.
Vaughan, Davis, Jim Hester, and Romain François. 2025. Cpp11: A C++11 Interface for R’s C Interface. https://CRAN.R-project.org/package=cpp11.
Yotov, Yoto V., Roberta Piermartini, José-Antonio Monteiro, and Mario Larch. 2017. An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model. United Nations. https://doi.org/10.18356/57a768e5-en.
