Initial setup and dev workflow
The development repository for cpp4r is https://github.com/pachadotdev/cpp4r.
First install any dependencies needed for development.
install.packages("remotes")
remotes::install_deps(dependencies = TRUE)
You can load the package in an interactive R session
devtools::load_all()
Or run the cpp4r tests with
devtools::test()
There are more extensive tests in the cpp4rtest directory. Generally, when developing the C++ headers, you can run R with your working directory in the cpp4rtest directory and use devtools::test() to run the cpp4rtest tests.
If you change the cpp4r headers, you will need to install the new version of cpp4r and then clean and recompile the cpp4rtest package.
To calculate code coverage of the cpp4r package, run the following from the cpp4r root directory:
covr::report(cpp4r_coverage())
Code formatting
This project uses clang-format (version 18) to automatically format the C++ code.
You can run make format to re-format all code in the project. If your system does not have clang-format version 18, you can install it from https://github.com/pachadotdev/clang-format. Alternatively, many IDEs support automatically running clang-format every time files are written.
Code organization
cpp4r is a header-only library, so all source code exposed to users lives in inst/include. R code used to register functions and for cpp4r::cpp_source() is in R/. Tests for only the code in R/ are in tests/testthat/.
The rest of the code is in a separate cpp4rtest/ package included in the source tree. Inside cpp4rtest/src, the files that start with test- are C++ tests using the Catch support in testthat. In addition, there are some regular R tests in cpp4rtest/tests/testthat/.
Naming conventions
- All header files are named with a .hpp extension.
- All source files are named with a .cpp extension.
- Public header files should be put in inst/include/cpp4r.
- Read-only r_vector classes and free functions should be put in the cpp4r namespace.
- Writable r_vector classes should be put in the cpp4r::writable namespace.
- Private classes and functions should be put in the cpp4r::internal namespace.
Vector classes
All of the basic r_vector classes are class templates; the base template is defined in cpp4r/r_vector.hpp. The template parameter is the type of value the particular R vector stores, e.g. double for cpp4r::doubles. This differs from Rcpp, whose first template parameter is the R vector type, e.g. REALSXP.
The file first has the class declarations, then the function definitions further down. Specializations for the various types are in separate files, e.g. cpp4r/doubles.hpp and cpp4r/integers.hpp.
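The layout described above can be sketched in a few lines of standalone C++. This is only an illustration, not the contents of cpp4r/r_vector.hpp: the vector_sketch namespace, the std::vector storage, and the r_type_name() member are hypothetical stand-ins chosen so the example compiles without R. It shows a base template parameterised by the element type, with one member defined separately per element type, as the per-type header files would do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace vector_sketch {  // hypothetical namespace, not part of cpp4r

// Base template: T is the element type stored by the vector
// (double, int, ...), not an R type code such as REALSXP.
template <typename T>
class r_vector {
 public:
  explicit r_vector(std::vector<T> data) : data_(std::move(data)) {}

  std::size_t size() const { return data_.size(); }

  // Read-only element access.
  T operator[](std::size_t i) const {
    assert(i < data_.size());
    return data_[i];
  }

  // Declared in the base template, defined once per element type below.
  static std::string r_type_name();

 private:
  std::vector<T> data_;  // stand-in for the underlying R object
};

// These specializations would live in separate per-type headers.
template <>
inline std::string r_vector<double>::r_type_name() { return "REALSXP"; }

template <>
inline std::string r_vector<int>::r_type_name() { return "INTSXP"; }

// User-facing names are aliases keyed on the element type.
using doubles = r_vector<double>;
using integers = r_vector<int>;

}  // namespace vector_sketch
```

With these definitions, vector_sketch::doubles is simply r_vector<double>, so the element type is visible in the type itself and the R type code is derived from it rather than the other way around.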
Coercion functions
There are two different coercion functions:
as_sexp() takes a C++ object and coerces it to a SEXP object, so it can be used in R.
as_cpp<>() is a template function that takes a SEXP and creates a C++ object from it.
The various methods for both functions are defined in cpp4r/as.hpp.
This is definitely the most complex part of the cpp4r code, with extensive use of template metaprogramming. In particular, the "substitution failure is not an error" (SFINAE) technique is used to control overloading of the functions. If you could use C++20, a lot of this code would be made simpler with concepts, but alas.
The most common C++ types are included in the test suite and should work without issues. As more exotic types are used in real projects, additional issues may arise.
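To make the SFINAE idea concrete, here is a minimal standalone sketch of overloads selected with std::enable_if. It is not the code in cpp4r/as.hpp: the coerce_sketch namespace, the value struct standing in for a SEXP, and the as_value() function standing in for as_sexp() are all assumptions made so the example runs without R. Only the overload whose condition holds for a given type takes part in overload resolution.

```cpp
#include <cassert>
#include <type_traits>

namespace coerce_sketch {  // hypothetical namespace, not part of cpp4r

// Stand-in for a SEXP: a tagged scalar.
enum class kind { integer, real, logical };

struct value {
  kind tag;
  double payload;
};

// Helper trait: integral types other than bool.
template <typename T>
struct is_integer_like
    : std::integral_constant<bool, std::is_integral<T>::value &&
                                       !std::is_same<T, bool>::value> {};

// C++ -> value. Each overload exists only for the types its
// enable_if condition accepts; the others are silently discarded.
template <typename T>
typename std::enable_if<is_integer_like<T>::value, value>::type
as_value(T x) {
  return value{kind::integer, static_cast<double>(x)};
}

template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, value>::type
as_value(T x) {
  return value{kind::real, static_cast<double>(x)};
}

template <typename T>
typename std::enable_if<std::is_same<T, bool>::value, value>::type
as_value(T x) {
  return value{kind::logical, x ? 1.0 : 0.0};
}

// value -> C++. The target type is given explicitly, as in as_cpp<int>(x).
template <typename T>
typename std::enable_if<is_integer_like<T>::value, T>::type
as_cpp(const value& v) {
  assert(v.tag == kind::integer);  // a real library would raise an error
  return static_cast<T>(v.payload);
}

template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type
as_cpp(const value& v) {
  assert(v.tag == kind::integer || v.tag == kind::real);
  return static_cast<T>(v.payload);
}

}  // namespace coerce_sketch
```

With C++20 concepts, each enable_if return type could be replaced by a requires clause on the template, which is the simplification the paragraph above alludes to.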
Protection
Protect list
cpp4r uses an idea proposed by Luke Tierney: a doubly linked list, with the head preserved, holds the objects cpp4r is protecting.
Each node in the list uses the head (CAR) part to point to the previous node and the CDR part to point to the next node. The TAG is used to point to the object being protected. The head and tail of the list have R_NilValue as their CAR and CDR pointers, respectively.
Calling cpp4r::detail::store::insert() with a regular R object will add a new node to the list and return a protect token corresponding to the node added. Calling cpp4r::detail::store::release() on this returned token will release the protection by unlinking the node from the linked list. These two functions are considered internal to cpp4r, so do not use them in your packages.
This scheme takes O(1) time to release or insert an object, versus O(N) or worse with R_PreserveObject() / R_ReleaseObject().
Each package has its own unique protection list, which avoids the need to manage a “global” protection list shared across packages. A previous version of cpp4r used a global protection list stored in an R global option, but this caused multiple issues.
These functions are defined in protect.hpp.
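The data structure can be illustrated without R. The sketch below is not the code in protect.hpp: the protect_sketch namespace, the node struct, and the use of heap-allocated nodes with nullptr in place of R_NilValue are assumptions made for a self-contained example. It mirrors the roles of CAR (previous), CDR (next), and TAG (protected object), and shows why insert and release are both O(1): the token is the node itself, so no search is needed.

```cpp
#include <cassert>
#include <cstddef>

namespace protect_sketch {  // hypothetical namespace, not part of cpp4r

struct node {
  node* car;        // previous node (nullptr plays the role of R_NilValue)
  node* cdr;        // next node
  const void* tag;  // the object being protected
};

class store {
 public:
  // Head and tail are sentinels: head.car and tail.cdr stay null.
  store() : head_{nullptr, &tail_, nullptr}, tail_{&head_, nullptr, nullptr} {}
  store(const store&) = delete;
  store& operator=(const store&) = delete;

  ~store() {
    while (head_.cdr != &tail_) release(head_.cdr);
  }

  // Link a new node right after the head and return it as the token. O(1).
  node* insert(const void* obj) {
    node* n = new node{&head_, head_.cdr, obj};
    head_.cdr->car = n;
    head_.cdr = n;
    return n;
  }

  // Unlink the node named by the token. O(1), no traversal needed.
  void release(node* token) {
    assert(token != nullptr && token != &head_ && token != &tail_);
    token->car->cdr = token->cdr;
    token->cdr->car = token->car;
    delete token;  // in R, the unlinked node would be garbage collected
  }

  // Number of protected objects; only used here for inspection.
  std::size_t size() const {
    std::size_t n = 0;
    for (const node* p = head_.cdr; p != &tail_; p = p->cdr) ++n;
    return n;
  }

 private:
  node head_;
  node tail_;
};

}  // namespace protect_sketch
```

By contrast, a release that only receives the object and has to find its entry first must walk the list, which is where the O(N) cost mentioned above comes from.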
Unwind Protect
cpp4r uses R_UnwindProtect() to protect (most) calls to the R API that could fail. These are usually those that allocate memory, though in truth most R API functions could error along some paths. If an error happens under R_UnwindProtect(), cpp4r will throw a C++ exception. This exception is caught by the try/catch block defined in the BEGIN_cpp4r macro in cpp4r/declarations.hpp. The exception will cause any C++ destructors to run, freeing any resources held by C++ objects. After the try/catch block exits, the R error unwinding is then continued by R_ContinueUnwind() and a normal R error results.
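The control flow can be mimicked in plain C++ to see the ordering of events. The sketch below does not call the R API and is not the cpp4r implementation: the unwind_sketch namespace, the events() log, the resource type, and the boolean return value standing in for an R error are all assumptions. It only shows that the exception lets destructors run before the point where the real code would hand control back to R with R_ContinueUnwind().

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>
#include <vector>

namespace unwind_sketch {  // hypothetical namespace, not part of cpp4r

// Stand-in for the exception that carries R's continuation token.
struct unwind_exception : std::exception {
  explicit unwind_exception(std::string t) : token(std::move(t)) {}
  std::string token;
};

// Event log, so the order of operations can be inspected.
inline std::vector<std::string>& events() {
  static std::vector<std::string> log;
  return log;
}

// A C++ object owning something that must be freed.
struct resource {
  ~resource() { events().push_back("destructor ran"); }
};

// Stand-in for the wrapper around an R API call: a false return value
// plays the role of an R error and is turned into a C++ exception.
template <typename F>
void unwind_protect(F&& api_call) {
  if (!api_call()) throw unwind_exception("token");
}

// User code: holds a resource while calling into the (mock) R API.
inline void user_code() {
  resource r;
  unwind_protect([] { return false; });  // simulate a failing call
  assert(false && "not reached: the failing call throws");
}

// Stand-in for the try/catch block placed around user code.
inline bool call_from_r() {
  bool pending = false;
  try {
    user_code();
  } catch (const unwind_exception&) {
    pending = true;  // remember that R still has an error to finish
  }
  if (pending) {
    // The real code would call R_ContinueUnwind() here.
    events().push_back("continue unwind");
  }
  return pending;
}

}  // namespace unwind_sketch
```

The key design point is that the unwinding is resumed only after the try/catch block has exited, so every C++ object on the stack has already been destroyed by then.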
R >= 3.5 is required to use cpp4r. When it was created, the goal was to support back to R 3.3, but R_ContinueUnwind() was not available until R 3.5. Below are a few other options that were considered to support older R versions:
- Using R_TopLevelExec() works to avoid the C long jump, but because the code is always run in a top-level context, any errors or messages thrown cannot be caught by tryCatch() or similar techniques.
- Using R_TryCatch(), which is not available prior to R 3.4 and also has a serious bug in R 3.4 (fixed in R 3.5).
- Calling the R-level tryCatch() function with an expression that runs a C function, which then runs the C++ code, would be an option, but implementing this is convoluted and it would impact performance, perhaps severely.
- Having cpp4r::unwind_protect() be a no-op for these versions. This means any resources held by C++ objects would leak, including cpp4r::r_vector / cpp4r::sexp objects.
None of these options were perfect. Here is the outcome for each, in the same order:
- R_TopLevelExec() causes behavior changes and test failures, so it was ruled out.
- R_TryCatch() was also ruled out, since we wanted to support back to R 3.3.
- The R-level tryCatch() approach was ruled out partially because the implementation would be somewhat tricky and more because performance would suffer greatly.
- Making cpp4r::unwind_protect() a no-op is what ended up being done before requiring R 3.5. It leaked protected objects when there were R API errors.