01 - Introduction
Motivation
For an extended review of the topics discussed in this vignette, please refer to the pre-print cpp4r: A Header-Only C++ and R Interface.
cpp4r is an R package that provides C++11 bindings to R, enabling the use of C++ code in R packages. It is a fork of the cpp11 package (Vaughan, Hester, and François 2025) aiming to provide additional features and improvements while maintaining compatibility with the original cpp11 API.
R is an open-source implementation of the S programming language, which dates back to the 1970s, and even today its internals rely on C and FORTRAN routines for some tasks (Chambers 2006). This design decision is not arbitrary: C and FORTRAN, while not particularly easy to use, are highly efficient for computationally intensive tasks. Even after optimizing R code through vectorization and avoiding unnecessary object copying, bottlenecks may persist and require compiled language solutions.
For certain applications, such as fitting regression models with a large number of fixed effects (Yotov et al. 2017), computational efficiency can be an issue for any interpreted language, not just R. Other examples of interpreted languages are Python and JavaScript, as well as proprietary languages such as MATLAB, Maple, and Wolfram.
C++ is a compiled language (like C and FORTRAN) that offers particular advantages for addressing common R performance bottlenecks, including:
- Loops that cannot be easily vectorized due to dependencies between iterations
- Recursive functions or problems requiring many function calls
- Data structures and algorithms not natively available in R (e.g., R does not expose pointers or pass-by-reference semantics to the end user)
- Problems requiring fine-grained memory management
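As a minimal sketch of the first item, consider an exponentially weighted filter in which each output depends on the previous one. The function name `ewma_filter` and the parameter `alpha` are illustrative and not part of cpp4r; the point is only that the recurrence forces a sequential loop, which is cheap per iteration in compiled code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exponentially weighted moving average:
//   y[0] = x[0]
//   y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
// Each y[i] needs y[i - 1], so the loop cannot be written as a simple
// element-wise expression over x.
std::vector<double> ewma_filter(const std::vector<double>& x, double alpha) {
  assert(alpha >= 0.0 && alpha <= 1.0);
  std::vector<double> y(x.size());
  if (x.empty()) return y;
  y[0] = x[0];
  for (std::size_t i = 1; i < x.size(); ++i) {
    y[i] = alpha * x[i] + (1.0 - alpha) * y[i - 1];
  }
  return y;
}
```

For example, `ewma_filter({1.0, 3.0, 5.0}, 0.5)` yields `1.0`, `2.0`, `3.5`, where the last value can only be computed once the second is known.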
C++, like any compiled language, offers additional computational efficiency, but it requires setup effort that can be considerable: the software stack needed to compile the code and create an executable must be installed and configured. R, like other interpreted languages, does not require creating executable files; it can execute instructions line by line, which offers greater portability and allows small changes to the code without recompiling.
Compiled languages do not reduce time complexity, but a direct rewrite of R code in C++ can still reduce runtime because C++ typically executes each operation faster. In practical terms, a double loop has the same \(O(n^2)\) complexity whether it is written in R or in C++; the difference is that C++ typically takes less time per operation. For some problems it is possible to derive reduced forms or new algorithms that require a different set of instructions to obtain the same result; an example of this is Vargas Sepulveda (2025), which discusses an alternative formulation to compute Kendall’s correlation coefficient with time complexity \(O(n \log(n))\) instead of the \(O(n^2)\) of the traditional algorithm.
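To make the distinction concrete, the following sketch counts discordant pairs, the core of Kendall's coefficient, in two ways: a double loop with \(O(n^2)\) comparisons and a merge-sort inversion count with \(O(n \log(n))\) comparisons. It assumes there are no ties and that `y` has already been reordered by increasing `x`. The function names are illustrative, and this is a textbook technique rather than the formulation in Vargas Sepulveda (2025).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// O(n^2): compare every pair (i, j) with i < j.
long long discordant_naive(const std::vector<double>& y) {
  long long d = 0;
  for (std::size_t i = 0; i < y.size(); ++i)
    for (std::size_t j = i + 1; j < y.size(); ++j)
      if (y[i] > y[j]) ++d;
  return d;
}

// O(n log n): merge sort on [lo, hi) that counts inversions while merging.
static long long sort_count(std::vector<double>& v, std::vector<double>& tmp,
                            std::size_t lo, std::size_t hi) {
  if (hi - lo < 2) return 0;
  std::size_t mid = lo + (hi - lo) / 2;
  long long d = sort_count(v, tmp, lo, mid) + sort_count(v, tmp, mid, hi);
  std::size_t i = lo, j = mid, k = lo;
  while (i < mid && j < hi) {
    if (v[i] <= v[j]) {
      tmp[k++] = v[i++];
    } else {
      d += static_cast<long long>(mid - i);  // v[i..mid) all exceed v[j]
      tmp[k++] = v[j++];
    }
  }
  while (i < mid) tmp[k++] = v[i++];
  while (j < hi) tmp[k++] = v[j++];
  for (k = lo; k < hi; ++k) v[k] = tmp[k];
  return d;
}

long long discordant_fast(std::vector<double> y) {  // by value: works on a copy
  std::vector<double> tmp(y.size());
  return sort_count(y, tmp, 0, y.size());
}

// Kendall's tau-a without ties: (concordant - discordant) / (n choose 2).
double kendall_tau(const std::vector<double>& y_sorted_by_x) {
  assert(y_sorted_by_x.size() >= 2);
  double n = static_cast<double>(y_sorted_by_x.size());
  double pairs = n * (n - 1.0) / 2.0;
  double d = static_cast<double>(discordant_fast(y_sorted_by_x));
  return (pairs - 2.0 * d) / pairs;
}
```

Both counting functions return the same value for any input without ties; only the number of comparisons differs, which is why the faster variant pays off in any language.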
Additionally, C++ offers more flexible data structures and allows the direct creation of operator overloads and friend functions that provide flexibility to work with a wide range of data formats (Emara 2024), even ‘esoteric’ binary formats lacking documentation (Sepúlveda and Barkai 2025). While it is possible to write R functions using C, it is often more convenient to use C++, as it is largely a superset of C and supports object-oriented programming in addition to procedural programming (Emara 2024). This facilitates working with matrices, complex numbers, and other data structures such as cubes and fields (Sanderson and Curtin 2016). Choosing C++ over FORTRAN is also often a matter of simplicity.
C++ bindings for R date back to the early 2000s. The cxx package (Hornik 2001), released in 2000 and now discontinued, provided an early prototype of C++ bindings. Rcpp (Eddelbuettel and Francois 2011), first published to CRAN in 2008, is the mainstream solution and is used by over 3,000 packages as of 2025.
cpp11 (Vaughan, Hester, and François 2025) was first released in 2020 as a complete reimplementation of C++ bindings to R, with different design trade-offs compared to Rcpp. It aims to provide:
- Enforced copy-on-write semantics consistent with R’s behavior
- Improved safety when interfacing with R’s C API
- Native support for ALTREP objects
- UTF-8 string handling throughout
- Modern C++11 features and idioms
- Simplified implementation compared to Rcpp
- Faster compilation with reduced memory requirements
- Completely header-only design to avoid Application Binary Interface (ABI) compatibility issues
However, cpp11 lacks some Rcpp features that explain its widespread adoption, including:
- Syntactic sugar (e.g., helper functions that allow for a more concise or convenient way to express common patterns)
- Modules (e.g., helpers to export C++ functions and/or classes to R more easily)
- Attributes (e.g., decorators to expose C++ functions to R with minimal boilerplate code)
While using cpp11 for my thesis project, I identified several enhancements that could benefit the R community and improve the library’s usability. These enhancements include:
- Support for converting C++ maps to R lists
- Roxygen documentation support directly in C++ code
- Proper handling of matrix attributes
- Support for nullable external pointers
- Immediate availability of values added via push_back()
- Bidirectional copy of complex number types
- Flexibility in type conversions
- Various performance optimizations
- Specialized codebase to benefit from newer C++ standards (e.g., if C++20 is available, compiling a package will use C++20 features instead of being limited to C++11)
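The last item relies on standard feature detection. The sketch below is not taken from the cpp4r sources; it only illustrates the general technique of selecting an implementation from the `__cplusplus` macro, using a hypothetical helper `sum_all`.

```cpp
#include <cassert>

#if __cplusplus >= 201703L
// C++17 and later: a fold expression sums the whole parameter pack.
template <typename... Ts>
constexpr double sum_all(Ts... xs) {
  return (0.0 + ... + xs);
}
#else
// C++11 fallback: recursion over the parameter pack.
constexpr double sum_all() { return 0.0; }

template <typename T, typename... Ts>
constexpr double sum_all(T x, Ts... xs) {
  return x + sum_all(xs...);
}
#endif

// Whichever branch was compiled, callers see the same interface.
inline void check_sum_all() {
  assert(sum_all() == 0.0);
  assert(sum_all(1.0, 2.0, 3.5) == 6.5);
}
```

The calling code does not change between standards; only the implementation chosen at compile time does.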
After discussing these proposed enhancements with the cpp11 maintainers, it became clear that the development priorities and timelines would not accommodate these features in the near term. This led to the creation of cpp4r, a fork of cpp11 that incorporates these additional features while maintaining compatibility with the original cpp11 API. This means that cpp4r can serve as a drop-in replacement for cpp11 in any case, allowing users to benefit from the enhancements without significant code changes. The converse, replacing cpp4r with cpp11, requires adjustments due to the additional features in cpp4r.
cpp4r extends cpp11’s container support by enabling seamless
conversion between C++ standard library containers and R objects. This
includes support for std::map and
std::unordered_map containers, which are automatically
converted to named R lists.
cpp4r provides roxygen support directly in C++ code, allowing developers to document their C++ functions using familiar roxygen2 syntax. This integration streamlines the documentation workflow for packages that expose C++ functions to R.
Unlike cpp11, cpp4r properly handles matrix attributes, including
dimnames, ensuring that matrix operations preserve metadata
when copying data between R and C++.
cpp4r offers more flexible type conversion functions. For example,
as_integers() and as_doubles() accept logical
inputs, providing greater flexibility in handling diverse input types
compared to the more restrictive cpp11 implementations.
Several internal optimizations improve performance over cpp11, particularly in vector operations and memory management. These optimizations maintain the safety guarantees of cpp11 while improving execution speed.
cpp4r provides full bidirectional copying of complex numbers, enabling seamless transfer of complex vectors and matrices between R and C++ code. In contrast, cpp11 does not support this functionality.
cpp4r maintains cpp11’s core design principles while extending functionality:
- Copy-on-Write Semantics: Like cpp11, cpp4r enforces copy-on-write semantics that match R’s behavior, preventing unexpected modifications to input data.
- Safety First: cpp4r incorporates comprehensive safety mechanisms when interfacing with R’s C API, using unwind_protect() and exception handling to prevent resource leaks.
- Modern C++ Features: The implementation leverages C++11 features including move semantics, type traits, variadic templates, and user-defined literals.
- Header-Only Design: As a completely header-only library, cpp4r avoids ABI compatibility issues that can arise with libraries containing compiled components.
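The copy-on-write principle can be sketched in plain C++. The class below is a simplified illustration, not cpp4r's implementation: a hypothetical `CowVector` shares its buffer between copies and duplicates it only on the first write, so the object that was passed in is never modified behind the caller's back.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

class CowVector {
 public:
  explicit CowVector(std::vector<double> values)
      : data_(std::make_shared<std::vector<double>>(std::move(values))) {}

  // Reading never copies.
  double get(std::size_t i) const {
    assert(i < data_->size());
    return (*data_)[i];
  }

  // Writing copies the buffer first if another object still shares it.
  void set(std::size_t i, double value) {
    assert(i < data_->size());
    if (data_.use_count() > 1) {
      data_ = std::make_shared<std::vector<double>>(*data_);
    }
    (*data_)[i] = value;
  }

  bool shares_buffer_with(const CowVector& other) const {
    return data_ == other.data_;
  }

 private:
  std::shared_ptr<std::vector<double>> data_;
};
```

Copying a `CowVector` is cheap because only the shared pointer is copied. The cost of duplicating the data is paid only by code that actually writes.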
cpp4r offers vendoring capabilities, which means copying the
dependency code directly into your project’s source tree. This approach,
borrowed from the Go programming language, includes the dependencies’
headers with the source code (The Go Authors
2024). This ensures the dependency code remains fixed and stable
until explicitly updated. Since cpp4r is a header-only library, you can
copy all headers by running cpp4r::vendor_cpp4r() when
needed.
Vendoring has both advantages and drawbacks. The main advantage is that disruptive changes to the cpp4r project cannot break your existing code. The drawbacks include slightly larger package size and isolation from bugfixes and new features until you explicitly update the vendored headers. Most packages should not vendor the cpp4r dependency, except for projects designed to run in restricted environments where internet access is limited or unavailable for security reasons (e.g., high-performance computing clusters).
cpp4r is designed as a drop-in replacement for cpp11, using identical syntax and API patterns. Existing cpp11 code can typically be migrated to cpp4r with minimal changes, primarily involving header includes and namespace references.
Differences between R’s C API and cpp4r
R’s C API (Rapi for short) is a low-level interface that lets C and
C++ code create, inspect, and manipulate R objects. It is powerful, but
it requires the developer to manage many details manually: protecting
objects from the garbage collector, unprotecting them in the right
order, encoding strings correctly, checking types, and handling errors
through longjmp/setjmp rather than ordinary
C++ exceptions. cpp4r wraps this API with idiomatic C++ abstractions so
that each of these concerns is handled automatically.
The following examples show side-by-side comparisons of common tasks
using raw Rapi versus cpp4r. Every cpp4r code snippet shown here is
exercised by the test suite in cpp4rtest/src/. The rest of
the vignettes expand on these examples and provide more details on how
cpp4r simplifies C++ programming for R packages.
Memory protection
The single most error-prone aspect of Rapi programming is the
protection stack. Every newly allocated SEXP must be
explicitly protected against garbage collection with
PROTECT(), and every PROTECT() call must be
matched by an UNPROTECT() call before the function returns.
Counting protects is fragile: adding a branch or an early return can
easily leave the stack unbalanced, causing the infamous “stack
imbalance in .Call” warning or, worse, a silent segfault.
cpp4r’s writable vector types protect themselves on
construction and automatically unprotect on destruction via RAII, so
there is nothing to count. The [[cpp4r::register]]
attribute also handles the .Call boilerplate, including
wrapping the return value in an appropriate SEXP.
Example: Summing a sequence of numbers with Rapi versus cpp4r.
SEXP rapi_sum(SEXP x_sxp) {
  int n = Rf_length(x_sxp);
  SEXP out = PROTECT(Rf_allocVector(REALSXP, 1)); // must PROTECT
  double total = 0;
  double* x = REAL(x_sxp);
  for (int i = 0; i < n; ++i) total += x[i];
  REAL(out)[0] = total;
  UNPROTECT(1); // must count manually
  return out;
}
[[cpp4r::register]] double cpp4r_sum(cpp4r::doubles x) {
  double total = 0;
  for (double v : x) total += v;
  return total;
}
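The RAII idea can be sketched without R at all. The code below models the protection stack as a plain counter and uses a hypothetical `ScopedProtect` guard; it illustrates the technique and is not cpp4r's actual mechanism. Because the guard's destructor runs on every exit path, the early return cannot leave the counter unbalanced.

```cpp
#include <cassert>

// Stand-in for the depth of R's protection stack (illustrative only).
static int protect_depth = 0;

// Guard that "protects" on construction and "unprotects" on destruction.
class ScopedProtect {
 public:
  ScopedProtect() { ++protect_depth; }
  ~ScopedProtect() {
    assert(protect_depth > 0);
    --protect_depth;
  }
  ScopedProtect(const ScopedProtect&) = delete;
  ScopedProtect& operator=(const ScopedProtect&) = delete;
};

// Two guards and an early return: no manual counting is needed.
int guarded_divide(int a, int b, bool* ok) {
  ScopedProtect first;
  if (b == 0) {
    *ok = false;
    return 0;  // `first` is released here
  }
  ScopedProtect second;
  *ok = true;
  return a / b;  // `second`, then `first`, are released here
}
```

After either call path, `protect_depth` is back at zero, which is the property that manual `PROTECT()`/`UNPROTECT()` counting has to maintain by hand.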
Creating vectors
Rapi requires calling Rf_allocVector() with the correct
SEXPTYPE constant, then protecting the result before any
further allocation can happen.
SEXP rapi_make_ints() {
  SEXP out = PROTECT(Rf_allocVector(INTSXP, 3));
  INTEGER(out)[0] = 1;
  INTEGER(out)[1] = 2;
  INTEGER(out)[2] = 3;
  UNPROTECT(1);
  return out;
}
cpp4r provides typed writable classes with familiar
container semantics.
[[cpp4r::register]] cpp4r::writable::integers cpp4r_make_ints() {
  cpp4r::writable::integers out;
  out.push_back(1);
  out.push_back(2);
  out.push_back(3);
  return out;
}
Or with a brace-enclosed initializer list:
[[cpp4r::register]] cpp4r::writable::integers cpp4r_make_ints() {
  return {1, 2, 3};
}
The same pattern applies to doubles,
logicals, strings, and raws.
Type checking and coercion
Rapi lets you pass any SEXP anywhere. Checking that the
caller supplied the right type is your responsibility.
SEXP rapi_add_one(SEXP x_sxp) {
  if (TYPEOF(x_sxp) != REALSXP) Rf_error("x must be a double vector");
  int n = Rf_length(x_sxp);
  SEXP out = PROTECT(Rf_allocVector(REALSXP, n));
  double* x = REAL(x_sxp);
  double* o = REAL(out);
  for (int i = 0; i < n; ++i) o[i] = x[i] + 1;
  UNPROTECT(1);
  return out;
}
cpp4r constructors throw an informative error automatically when the type does not match.
[[cpp4r::register]] cpp4r::writable::doubles cpp4r_add_one(cpp4r::doubles x) {
  cpp4r::writable::doubles out(x.size());
  for (R_xlen_t i = 0; i < x.size(); ++i) out[i] = x[i] + 1;
  return out;
}
For explicit coercion between compatible types, cpp4r provides
as_doubles(), as_integers(), etc., which also
check for loss of precision.
cpp4r::writable::doubles y;
y.push_back(10.00);
cpp4r::writable::integers i(cpp4r::as_integers(y)); // ok: 10.00 -> 10
cpp4r::writable::doubles x;
x.push_back(10.01);
cpp4r::as_integers(x); // throws: 10.01 is not representable as int
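A precision check of this kind can be sketched in standard C++. The helpers `checked_as_int` and `is_rejected` below are hypothetical and only illustrate the idea; the exact rules applied by `as_integers()` may differ, for example in how missing values are treated.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <stdexcept>

// Convert a double to int only when no information is lost.
int checked_as_int(double value) {
  if (std::isnan(value) || value < INT_MIN || value > INT_MAX) {
    throw std::range_error("value is not representable as int");
  }
  double integral_part = 0.0;
  if (std::modf(value, &integral_part) != 0.0) {
    throw std::range_error("value has a fractional part");
  }
  return static_cast<int>(integral_part);
}

// Helper for demonstration: reports whether the conversion is rejected.
bool is_rejected(double value) {
  try {
    checked_as_int(value);
    return false;
  } catch (const std::range_error&) {
    return true;
  }
}

// Mirrors the two cases shown above.
void coercion_examples() {
  assert(checked_as_int(10.00) == 10);  // exact value: accepted
  assert(is_rejected(10.01));           // fractional part: rejected
}
```

The design choice is to fail loudly instead of truncating, so a silent `10.01 -> 10` conversion cannot slip into a computation.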
Strings
Strings are among the trickiest types in Rapi. A character vector
(STRSXP) is an array of CHARSXP pointers, each
of which carries its own encoding. You must call
SET_STRING_ELT() to set elements (not a plain array
assignment), and you must use CHAR() or
Rf_translateCharUTF8() to read them back.
cpp4r exposes r_string and strings with
std::string-like comparison and iteration, and
SET_STRING_ELT/STRING_ELT are hidden behind
ordinary subscript operators.
Example: Build a greeting for each name in a character vector with Rapi versus cpp4r.
SEXP rapi_greet(SEXP names_sxp) {
  int n = Rf_length(names_sxp);
  SEXP out = PROTECT(Rf_allocVector(STRSXP, n));
  for (int i = 0; i < n; ++i) {
    const char* name = Rf_translateCharUTF8(STRING_ELT(names_sxp, i));
    char buf[256];
    snprintf(buf, sizeof(buf), "Hello, %s!", name);
    SET_STRING_ELT(out, i, Rf_mkCharCE(buf, CE_UTF8));
  }
  UNPROTECT(1);
  return out;
}
[[cpp4r::register]]
cpp4r::writable::strings cpp4r_greet(cpp4r::strings names) {
  cpp4r::writable::strings out(names.size());
  for (R_xlen_t i = 0; i < names.size(); ++i) {
    out[i] = "Hello, " + std::string(names[i]) + "!";
  }
  return out;
}
String construction and comparison work without encoding bookkeeping:
cpp4r::writable::strings x;
x.push_back("a");
x.push_back("b");
// x[0] == "a" works directly
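One practical difference between the two greeting functions is the fixed-size buffer in the Rapi version: `snprintf()` silently truncates a result that does not fit. The standalone sketch below uses hypothetical helper names, no R headers, and a deliberately small 16-byte buffer to make the effect visible. It contrasts truncation with `std::string` concatenation, which grows as needed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Fixed-size buffer: output longer than the buffer is cut off.
std::string greet_fixed_buffer(const std::string& name) {
  char buf[16];
  int needed = std::snprintf(buf, sizeof(buf), "Hello, %s!", name.c_str());
  assert(needed >= 0);  // a negative value would signal an output error
  (void)needed;         // silence unused-variable warnings in release builds
  return std::string(buf);
}

// std::string: the result always holds the full greeting.
std::string greet_dynamic(const std::string& name) {
  return "Hello, " + name + "!";
}
```

For a short name both helpers agree. For a name such as "Bartholomew", the fixed buffer keeps only the first 15 characters of the greeting while the `std::string` version returns it in full.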
Missing values (NA)
Each vector type in R has its own sentinel for missing values
(NA_INTEGER, NA_REAL, NA_LOGICAL,
NA_STRING, NA_COMPLEX). In Rapi you test for
them differently per type, which is easy to get wrong.
// doubles: ISNA() matches only NA; ISNAN() matches both NA and NaN
if (ISNA(REAL(x)[i])) { ... }
// integers: compare to NA_INTEGER directly
if (INTEGER(x)[i] == NA_INTEGER) { ... }
// strings: compare to NA_STRING
if (STRING_ELT(x, i) == NA_STRING) { ... }
cpp4r provides a uniform cpp4r::is_na() template and
cpp4r::na<T>() constructor.
// Works the same way for all vector element types
if (cpp4r::is_na(x[i])) { ... }
// Create typed NAs
int int_na = cpp4r::na<int>(); // == NA_INTEGER
double dbl_na = cpp4r::na<double>(); // ISNA() == true
cpp4r::r_bool bool_na = cpp4r::na<cpp4r::r_bool>(); // == NA_LOGICAL
cpp4r::r_string str_na = cpp4r::na<cpp4r::r_string>(); // == NA_STRING
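The per-type sentinels can be hidden behind overloads, which is the general idea behind a uniform `is_na()`. The sketch below is standalone and does not use R's headers. It relies on two facts about R's representation: `NA_INTEGER` is the smallest `int`, and `NA_REAL` is a NaN whose low 32 bits hold the value 1954. The names `na_int`, `na_real`, `is_na_sketch`, and `any_na` are illustrative, not cpp4r's API.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double expected");

// Integer NA: R reserves INT_MIN.
inline int na_int() { return INT_MIN; }

// Double NA: a NaN carrying 1954 (0x7A2) in its low 32 bits.
inline double na_real() {
  const std::uint64_t bits = 0x7FF00000000007A2ULL;
  double value;
  std::memcpy(&value, &bits, sizeof value);
  return value;
}

inline bool is_na_sketch(int x) { return x == INT_MIN; }

inline bool is_na_sketch(double x) {
  if (!std::isnan(x)) return false;
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return (bits & 0xFFFFFFFFULL) == 1954;  // distinguishes NA from plain NaN
}

// One spelling works for both element types.
template <typename T>
bool any_na(const T* values, std::size_t n) {
  assert(values != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    if (is_na_sketch(values[i])) return true;
  return false;
}
```

Generic code such as `any_na` never needs to know which sentinel applies, which is what makes a single `is_na()` spelling convenient in templates.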
Error handling
Rf_error() in Rapi works by calling
longjmp(), which unwinds the C stack without invoking C++
destructors. This means any RAII cleanup — including destructors of
std::vector, std::string, file handles, etc. —
is silently skipped when an R error occurs inside your code.
SEXP rapi_bad(SEXP x_sxp) {
  std::vector<double> tmp(1000); // destructor will NOT run on R error
  // ...
  Rf_error("something went wrong"); // longjmp: tmp leaks
  return R_NilValue;
}
cpp4r’s cpp4r::stop() and
cpp4r::unwind_protect() use R’s unwind API to translate R
errors into C++ exceptions, so destructors run correctly.
[[cpp4r::register]] void cpp4r_safe_example(cpp4r::doubles x) {
  std::vector<double> tmp(x.size()); // destructor WILL run
  // ...
  cpp4r::stop("something went wrong"); // throws unwind_exception; tmp cleaned up
}
cpp4r::safe[] is a convenience wrapper for calling Rapi
functions that may throw R errors from within protected contexts:
SEXP out = cpp4r::safe[Rf_allocVector](REALSXP, 1);
// If Rf_allocVector triggers an R error, it is caught and re-thrown as
// a C++ exception rather than longjmp-ing past your destructors.
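Why translating errors into exceptions matters can be seen in plain C++. The sketch below does not call R: a hypothetical `fail()` stands in for an error raised inside a call, and a small `Resource` type records whether its destructor ran. With an exception the stack is unwound and the destructor runs, which a `longjmp` out of the same frame would not guarantee.

```cpp
#include <cassert>
#include <stdexcept>

static int live_resources = 0;

// Counts live instances so that leaks become observable.
struct Resource {
  Resource() { ++live_resources; }
  ~Resource() {
    assert(live_resources > 0);
    --live_resources;
  }
  Resource(const Resource&) = delete;
  Resource& operator=(const Resource&) = delete;
};

// Stand-in for an operation that signals an error.
void fail() { throw std::runtime_error("something went wrong"); }

void work_that_fails() {
  Resource tmp;  // must be released even though fail() never returns normally
  fail();
}

// Catches the error at the boundary, as a wrapper around user code might.
bool run_and_report_error() {
  try {
    work_that_fails();
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

After `run_and_report_error()` returns, `live_resources` is zero again, which shows that `tmp` was destroyed during unwinding.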
Lists
Building a named list with Rapi requires multiple
PROTECT calls, manual name assignment via a character
vector, and careful indexing.
cpp4r uses named-argument literals and brace initializers.
Example: Create a list with Rapi versus cpp4r.
SEXP rapi_make_list() {
  SEXP out = PROTECT(Rf_allocVector(VECSXP, 2));
  SEXP names = PROTECT(Rf_allocVector(STRSXP, 2));
  SET_STRING_ELT(names, 0, Rf_mkChar("x"));
  SET_STRING_ELT(names, 1, Rf_mkChar("y"));
  SEXP x = PROTECT(Rf_allocVector(REALSXP, 1));
  REAL(x)[0] = 1.0;
  SET_VECTOR_ELT(out, 0, x);
  SEXP y = PROTECT(Rf_allocVector(INTSXP, 2));
  INTEGER(y)[0] = 3;
  INTEGER(y)[1] = 4;
  SET_VECTOR_ELT(out, 1, y);
  Rf_setAttrib(out, R_NamesSymbol, names);
  UNPROTECT(4);
  return out;
}
[[cpp4r::register]] cpp4r::writable::list cpp4r_make_list() {
  using namespace cpp4r::literals;
  return {
    "x"_nm = {1.0},
    "y"_nm = {3, 4},
  };
}
Pushing elements onto an existing list is equally straightforward:
cpp4r::writable::list x;
x.push_back(cpp4r::writable::doubles({1.}));
x.push_back(cpp4r::writable::integers({3, 4, 5}));
x.push_back(cpp4r::writable::strings({"foo", "bar"}));
Calling R functions from C++
Calling an R function from Rapi requires finding it in the right environment, building a pairlist of arguments, evaluating a language object, and protecting intermediates throughout.
cpp4r uses cpp4r::package() to look up exported
functions, which can then be called with ordinary C++ function-call
syntax.
Example: Call median(x, na.rm = TRUE) from Rapi versus
cpp4r.
SEXP rapi_call_median(SEXP x_sxp) {
  SEXP fn = PROTECT(Rf_findFun(Rf_install("median"),
                               R_FindNamespace(Rf_mkString("stats"))));
  SEXP args = PROTECT(Rf_lang3(fn, x_sxp, Rf_ScalarLogical(1)));
  int err = 0;
  SEXP out = PROTECT(R_tryEval(args, R_GlobalEnv, &err));
  if (err) Rf_error("median() failed");
  UNPROTECT(3);
  return out;
}
[[cpp4r::register]] double cpp4r_call_median(cpp4r::doubles x) {
  auto median = cpp4r::package("stats")["median"];
  return median(x, true); // na.rm = true
}
Named arguments work with the _nm literal:
using namespace cpp4r::literals;
double res = median("x"_nm = x, "na.rm"_nm = true);
Environments
Reading and writing variables in an R environment with Rapi requires
converting names to SYMSXPs with
Rf_install().
cpp4r wraps environments with subscript operators.
Example: Set and get a variable in an environment with Rapi versus cpp4r.
void rapi_set_env(SEXP env_sxp, const char* name, double value) {
  SEXP sym = Rf_install(name);
  SEXP val = PROTECT(Rf_ScalarReal(value));
  Rf_defineVar(sym, val, env_sxp);
  UNPROTECT(1);
}
double rapi_get_env(SEXP env_sxp, const char* name) {
  SEXP sym = Rf_install(name);
  SEXP val = Rf_findVarInFrame(env_sxp, sym);
  if (val == R_UnboundValue) Rf_error("'%s' not found", name);
  return REAL(val)[0];
}
[[cpp4r::register]] void cpp4r_env_roundtrip(SEXP env_sxp) {
  cpp4r::environment env(env_sxp);
  env["foo"] = 1; // set
  int v = cpp4r::as_cpp<int>(env["foo"]); // get -> 1
  env.remove("foo");
}
External pointers
Wrapping a C++ object as an R external pointer with Rapi requires writing a custom finalizer function, registering it, and carefully managing the pointer lifetime.
cpp4r provides a templated external_pointer<T>
that owns the pointer and runs the deleter automatically when the R
object is garbage collected.
Example: Create an external pointer to a
std::vector<int> with Rapi versus cpp4r.
static void my_finalizer(SEXP ptr_sxp) {
  if (R_ExternalPtrAddr(ptr_sxp)) {
    delete static_cast<std::vector<int>*>(R_ExternalPtrAddr(ptr_sxp));
    R_ClearExternalPtr(ptr_sxp);
  }
}
SEXP rapi_make_extptr() {
  std::vector<int>* v = new std::vector<int>{1, 2};
  SEXP ptr = PROTECT(R_MakeExternalPtr(v, R_NilValue, R_NilValue));
  R_RegisterCFinalizerEx(ptr, my_finalizer, TRUE);
  UNPROTECT(1);
  return ptr;
}
[[cpp4r::register]]
cpp4r::external_pointer<std::vector<int>> cpp4r_make_extptr() {
  auto* v = new std::vector<int>{1, 2};
  return cpp4r::external_pointer<std::vector<int>>(v);
}
Custom deleters are supported as a non-type template parameter:
void my_deleter(int* p) { delete p; }
cpp4r::external_pointer<int, my_deleter> ptr(new int(42));
ptr.reset(); // deleter called here
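How a deleter can be supplied as a non-type template parameter is sketched below with a minimal owning pointer. `OwnedPtr` and `counting_deleter` are hypothetical names, and the class is far simpler than `external_pointer`: it has no connection to R's garbage collector and only shows how the deleter function becomes part of the type.

```cpp
#include <cassert>

template <typename T>
void default_deleter(T* p) { delete p; }

// The deleter is a function pointer fixed at compile time.
template <typename T, void (*Deleter)(T*) = default_deleter<T>>
class OwnedPtr {
 public:
  explicit OwnedPtr(T* p) : ptr_(p) {}
  ~OwnedPtr() { reset(); }
  OwnedPtr(const OwnedPtr&) = delete;
  OwnedPtr& operator=(const OwnedPtr&) = delete;

  T& operator*() const {
    assert(ptr_ != nullptr);
    return *ptr_;
  }

  // Runs the deleter at most once and leaves the pointer empty.
  void reset() {
    if (ptr_ != nullptr) {
      Deleter(ptr_);
      ptr_ = nullptr;
    }
  }

  bool empty() const { return ptr_ == nullptr; }

 private:
  T* ptr_;
};

// Example deleter that records how often it was invoked.
static int deleter_calls = 0;

void counting_deleter(int* p) {
  ++deleter_calls;
  delete p;
}
```

With `OwnedPtr<int, counting_deleter> p(new int(42));`, calling `p.reset()` invokes `counting_deleter` once, and the destructor that runs later finds the pointer already empty and does nothing.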
Matrices
Rapi has no matrix type distinct from vectors; a matrix is a vector
with a dim attribute. Setting and reading dim
must be done manually.
cpp4r provides doubles_matrix<by_row> and
doubles_matrix<by_column> that expose rows or columns
as slices and keep nrow() / ncol() in sync
with the dim attribute automatically.
Example: Create a matrix with Rapi versus cpp4r.
SEXP rapi_make_matrix(int nrow, int ncol) {
  SEXP out = PROTECT(Rf_allocMatrix(REALSXP, nrow, ncol));
  // Column-major: element [i, j] is at index i + nrow * j
  for (int j = 0; j < ncol; ++j)
    for (int i = 0; i < nrow; ++i)
      REAL(out)[i + nrow * j] = (double)(i * ncol + j);
  UNPROTECT(1);
  return out;
}
[[cpp4r::register]]
cpp4r::writable::doubles_matrix<cpp4r::by_row> cpp4r_make_matrix() {
  cpp4r::writable::doubles_matrix<cpp4r::by_row> out(5, 2);
  // out.nrow() == 5, out.ncol() == 2
  for (R_xlen_t i = 0; i < out.nrow(); ++i)
    for (R_xlen_t j = 0; j < out.ncol(); ++j)
      out(i, j) = static_cast<double>(i * out.ncol() + j);
  return out;
}
Row or column slices can be iterated directly:
cpp4r::doubles_matrix<cpp4r::by_row> m(sexp);
auto row0 = m[0]; // a slice of the first row
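The storage layout behind these slices can be sketched with a plain `std::vector`. R stores matrices in column-major order, so the elements of one row are `nrow` positions apart while the elements of one column are contiguous. The helper names below are illustrative and not part of cpp4r.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major offset of element (i, j) in a matrix with nrow rows.
inline std::size_t col_major_offset(std::size_t i, std::size_t j,
                                    std::size_t nrow) {
  return i + nrow * j;
}

// Copy row i: start at offset i and step by nrow.
std::vector<double> row_slice(const std::vector<double>& data,
                              std::size_t nrow, std::size_t ncol,
                              std::size_t i) {
  assert(data.size() == nrow * ncol && i < nrow);
  std::vector<double> out(ncol);
  for (std::size_t j = 0; j < ncol; ++j) {
    out[j] = data[col_major_offset(i, j, nrow)];
  }
  return out;
}

// Copy column j: the nrow elements of a column are adjacent in memory.
std::vector<double> column_slice(const std::vector<double>& data,
                                 std::size_t nrow, std::size_t j) {
  assert(nrow > 0 && (j + 1) * nrow <= data.size());
  return std::vector<double>(data.begin() + j * nrow,
                             data.begin() + (j + 1) * nrow);
}
```

For the 2 x 3 matrix stored as `{1, 2, 3, 4, 5, 6}`, the first row is `{1, 3, 5}` and the second column is `{3, 4}`. A row view therefore has to stride through memory, while a column view can read a contiguous range.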
Summary
The table below summarizes the main improvements cpp4r provides over direct Rapi use.
| Topic | Rapi | cpp4r |
|---|---|---|
| Memory protection | Manual PROTECT/UNPROTECT, must count | Automatic via RAII |
| Type checking | Manual TYPEOF() guard | Constructor throws on mismatch |
| Coercion | Rf_coerceVector(), no precision check | as_doubles() / as_integers() with precision check |
| String elements | SET_STRING_ELT / CHAR() / encoding flags | Subscript operators, std::string interop |
| Missing values | Different sentinel per type | Uniform is_na() / na<T>() |
| Error signalling | longjmp, skips C++ destructors | cpp4r::stop() throws, destructors run |
| Lists | Manual SET_VECTOR_ELT + names vector | push_back(), _nm literals, brace initializers |
| Calling R functions | Pairlist + Rf_eval | cpp4r::package("pkg")["fn"](args...) |
| Environments | Rf_install() + Rf_defineVar() | Subscript operators on cpp4r::environment |
| External pointers | Custom finalizer + R_RegisterCFinalizerEx | external_pointer<T>, automatic deletion |
| Matrices | Raw vector + manual dim attribute | doubles_matrix<by_row/by_column> |