Logical Functions

Motivation

This chapter covers the implementation of simple logical functions in C++ and R. The goal is to show the syntax differences between the two languages and compare their performance. These examples were adapted from Vaughan, Hester, and Francois (2024).

Fair Warning

The functions in this chapter do not handle NA values. Adjustments for handling NA values will be introduced in the sixth chapter.

R already provides efficient versions of the functions covered here. Code optimizations and improvements will be made in later chapters.

Load the Package

I added the functions from the following sections to the ece244 package and then loaded it with:

load_all()

Additional Packages

I used the bench package to compare the performance of the functions. The package was loaded with the following code:

library(bench)

Are Some Values True? (any())

The any() function returns TRUE if there is at least one TRUE element in a vector, and FALSE otherwise. Below is one possible C++ implementation:

[[cpp11::register]] bool any_cpp_(logicals x) {
  int n = x.size();
  
  for (int i = 0; i < n; ++i) {
    if (x[i]) {
      return true;
    }
  }
  return false;
}

Its R equivalent is:

#' Return TRUE if any element in a vector is TRUE (R)
#' @param x logical vector
#' @export
any_r <- function(x) {
  # seq_along() yields an empty sequence for empty input, unlike 1:length(x)
  for (i in seq_along(x)) {
    if (x[i]) {
      return(TRUE)
    }
  }
  FALSE
}

To document the C++ function, I added the following wrapper to the R code:

#' Return TRUE if any element in a vector is TRUE (C++)
#' @inheritParams any_r
#' @export
any_cpp <- function(x) {
  any_cpp_(x)
}

To test the functions, I ran the following benchmark code in the R console:

set.seed(123) # for reproducibility
x <- rpois(1e6, lambda = 2) # 1,000,000 elements
y <- x > 2 # the comparison already returns a logical vector

any(y)
[1] TRUE
any_cpp(y)
[1] TRUE
any_r(y)
[1] TRUE
mark(
  any(y),
  any_cpp(y),
  any_r(y)
)
# A tibble: 3 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 any(y)        121ns    133ns  5039534.        0B    504. 
2 any_cpp(y)    865ns    958ns   929835.        0B     93.0
3 any_r(y)      444ns    535ns  1589559.    19.5KB      0  
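
One reason the loops above finish in nanoseconds on a million-element vector is that they return at the first TRUE, which for this y is the second element (see the which() output in the next section). The following is a minimal sketch of that early exit in plain C++, outside of cpp11. It assumes a std::vector<int> as a stand-in for logicals (non-zero means TRUE), and the names AnyResult and any_counted are hypothetical and used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: the answer plus how many elements were inspected.
struct AnyResult {
  bool found;
  std::size_t visited;
};

// Same loop as any_cpp_, but on a std::vector<int> (non-zero means TRUE)
// and with a counter that makes the early exit visible.
AnyResult any_counted(const std::vector<int>& x) {
  std::size_t visited = 0;
  for (int value : x) {
    ++visited;
    if (value != 0) {
      return {true, visited};  // stop at the first TRUE
    }
  }
  return {false, visited};  // no TRUE: every element was inspected
}
```

For an input that starts with FALSE, TRUE, the counter stops at two regardless of the vector length, while an all-FALSE input requires a full pass.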

Which Indices are TRUE? (which())

The which() function returns the indices of the TRUE elements in a vector. Here is a possible C++ implementation:

[[cpp11::register]] integers which_cpp_(logicals x) {
  int n = x.size();
  writable::integers res;

  for (int i = 0; i < n; ++i) {
    if (x[i]) {
      res.push_back(i + 1);  // R indices are 1-based
    }
  }

  // an empty writable vector is returned to R as integer(0),
  // so no special case is needed when no element is TRUE
  return res;
}

Its R equivalent is:

#' Return the indices of the TRUE elements in a vector (R)
#' @param x logical vector
#' @export
which_r <- function(x) {
  res <- integer(0)

  # seq_along() yields an empty sequence for empty input, unlike 1:length(x)
  for (i in seq_along(x)) {
    if (x[i]) {
      res <- c(res, i)
    }
  }

  # integer(0) when no element is TRUE, as in which()
  res
}

To document the C++ function, I added the following wrapper to the R code:

#' Return the indices of the TRUE elements in a vector (C++)
#' @inheritParams which_r
#' @export
which_cpp <- function(x) {
  which_cpp_(x)
}

To test the functions, I ran the following benchmark code in the R console:

which(y[1:100])
 [1]  2  4  5  8 11 13 16 20 21 22 24 26 31 32 33 34 37 50 53 58 59 65 67 68 69
[26] 71 73 84 87 88 89 97
which_cpp(y[1:100])
 [1]  2  4  5  8 11 13 16 20 21 22 24 26 31 32 33 34 37 50 53 58 59 65 67 68 69
[26] 71 73 84 87 88 89 97
which_r(y[1:100])
 [1]  2  4  5  8 11 13 16 20 21 22 24 26 31 32 33 34 37 50 53 58 59 65 67 68 69
[26] 71 73 84 87 88 89 97
mark(
  which(y[1:1000]),
  which_cpp(y[1:1000]),
  which_r(y[1:1000])
)
# A tibble: 3 × 6
  expression                min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 which(y[1:1000])       3.71µs   4.41µs   191852.    13.2KB     57.6
2 which_cpp(y[1:1000])  15.55µs  17.04µs    55640.    13.1KB     16.7
3 which_r(y[1:1000])   115.79µs 131.37µs     7159.   250.9KB     36.6
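
The higher mem_alloc of which_r is most likely due to c(res, i), which creates a new result vector on every hit. One way to avoid repeated copying is to count the TRUE elements first and allocate the result once. The following is a minimal sketch of that idea in plain C++, outside of cpp11. It assumes a std::vector<int> as a stand-in for logicals (non-zero means TRUE), and which_two_pass is a hypothetical name, not a function from the package.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Plain C++ sketch (no cpp11): std::vector<int> stands in for a logical
// vector, with non-zero meaning TRUE. which_two_pass is a hypothetical name.
std::vector<int> which_two_pass(const std::vector<int>& x) {
  // First pass: count the TRUE elements so the result is allocated once.
  const auto hits = std::count_if(x.begin(), x.end(),
                                  [](int value) { return value != 0; });

  std::vector<int> res;
  res.reserve(static_cast<std::size_t>(hits));

  // Second pass: store 1-based positions, as R's which() does.
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (x[i] != 0) {
      res.push_back(static_cast<int>(i) + 1);
    }
  }
  return res;  // empty when there is no TRUE, comparable to integer(0)
}
```

The trade-off is a second pass over the input in exchange for a single allocation. Whether that pays off depends on the vector size and the share of TRUE values.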

Are All Values True? (all())

The all() function checks if all elements in a vector are TRUE. Here is a possible C++ implementation that loops over the vector:

[[cpp11::register]] bool all_cpp_1_(logicals x) {
  int n = x.size();
  for (int i = 0; i < n; ++i) {
    if (!x[i]) {
      return false;
    }
  }
  return true;
}

More concise C++ alternatives are:

[[cpp11::register]] bool all_cpp_2_(logicals x) {
  for (int i = 0; i < x.size(); ++i) {
    if (!x[i]) {
      return false;
    }
  }
  return true;
}

[[cpp11::register]] bool all_cpp_3_(logicals x) {
  for (bool i : x) {
    if (!i) {
      return false;
    }
  }
  return true;
}

// std::all_of requires #include <algorithm>
[[cpp11::register]] bool all_cpp_4_(logicals x) {
  return std::all_of(x.begin(), x.end(), [](bool value) { return value; });
}

To avoid typing std:: every time, you can put using namespace std; at the top of src/code.cpp. However, this is not recommended because it brings every name from the standard library into scope, which can lead to name conflicts. A better option is to declare using std::the_function; for each function you need, so that you can write the_function instead of std::the_function (Akbiggs 2024).
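
As an illustration of the using declaration, here is a minimal sketch in plain C++, outside of cpp11. It assumes a std::vector<int> as a stand-in for logicals (non-zero means TRUE), and all_sketch is a hypothetical name chosen for this example.

```cpp
#include <algorithm>
#include <vector>

// Bring only this one name into scope instead of the whole std namespace.
using std::all_of;

// Plain C++ sketch: std::vector<int> stands in for a logical vector
// (non-zero means TRUE). all_sketch is a hypothetical name.
bool all_sketch(const std::vector<int>& x) {
  return all_of(x.begin(), x.end(), [](int value) { return value != 0; });
}
```

Note that all_of returns true for an empty range, which matches all(logical(0)) returning TRUE in R.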

To test the functions, I ran the following tests and benchmark code in the R console:

set.seed(123) # for reproducibility
x <- rpois(1e6, lambda = 2) # 1,000,000 elements

all(x > 2)
[1] FALSE
all_cpp_1_(x > 2)
[1] FALSE
all_cpp_2_(x > 2)
[1] FALSE
all_cpp_3_(x > 2)
[1] FALSE
all_cpp_4_(x > 2)
[1] FALSE
# also test the TRUE-only case
all(x >= 0)
[1] TRUE
all_cpp_1_(x >= 0)
[1] TRUE
all_cpp_2_(x >= 0)
[1] TRUE
all_cpp_3_(x >= 0)
[1] TRUE
all_cpp_4_(x >= 0)
[1] TRUE
mark(
  all(x > 2),
  all_cpp_1_(x > 2),
  all_cpp_2_(x > 2),
  all_cpp_3_(x > 2),
  all_cpp_4_(x > 2)
)
# A tibble: 5 × 6
  expression             min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>        <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 all(x > 2)           2.2ms   2.41ms      411.    3.81MB     57.0
2 all_cpp_1_(x > 2)   2.33ms   2.43ms      410.    3.81MB     58.5
3 all_cpp_2_(x > 2)   2.32ms   2.43ms      409.    3.81MB     64.4
4 all_cpp_3_(x > 2)   2.29ms   2.47ms      402.    3.81MB     59.0
5 all_cpp_4_(x > 2)   2.32ms   2.45ms      404.    3.81MB     58.5

The timings are nearly identical because each expression first evaluates x > 2, which allocates a new logical vector of one million elements (the 3.81MB shown in mem_alloc). The check itself is cheap here, since it returns at the first FALSE, which is the first element of x > 2.

References

Akbiggs. 2024. “What’s the Problem with "Using Namespace Std;"?” Forum post. Stack Overflow. https://stackoverflow.com/q/1452721/3720258.
Vaughan, Davis, Jim Hester, and Romain Francois. 2024. “Get Started with Cpp11.” https://cpp11.r-lib.org/articles/cpp11.html#intro.