09 - Rolling Functions with Missing Values
Note
These functions ignore NA
values for now. Adjustments
for handling NA
values are covered in a separate
vignette.
R already provides efficient versions of the functions covered here. This is just to illustrate how to use C++ code.
Cumulative sum
The following function expands the previous cumsum_cpp()
function to handle missing values.
[[cpp4r::register]]
doubles cumsum2_cpp(doubles x, bool na_rm = false) {
int n = x.size();
writable::doubles out(n);
out[0] = x[0];
if (na_rm == true) {
for (int i = 1; i < n; ++i) {
double y1 = out[i - 1], y2 = x[i];
if (ISNAN(y2)) {
out[i] = y1 + 0.0;
} else {
if (ISNAN(y1)) {
out[i] = 0.0 + y2;
} else {
out[i] = y1 + y2;
}
}
}
} else {
for (int i = 1; i < n; ++i) {
double y1 = out[i - 1], y2 = x[i];
if (ISNAN(y2)) {
out[i] = NA_REAL;
} else {
if (ISNAN(y1)) {
out[i] = NA_REAL;
} else {
out[i] = y1 + y2;
}
}
}
}
return out;
}
To test the functions, you can run the following benchmark code in the R console:
set.seed(123) # for reproducibility
x <- rpois(1e6, lambda = 2) # 1,000,000 elements
cumsum(c(1, NA, 2, 4))
cumsum2_cpp(c(1, NA, 2, 4))
cumsum2_cpp(c(1, NA, 2, 4), na_rm = TRUE)
mark(
cumsum(x),
cumsum2_cpp(x)
)
Cumulative product
The following function expands the previous
cumprod_cpp()
function to handle missing values.
[[cpp4r::register]]
doubles cumprod2_cpp(doubles x, bool na_rm = false) {
int n = x.size();
writable::doubles out(n);
out[0] = x[0];
if (na_rm == true) {
for (int i = 1; i < n; ++i) {
double y1 = out[i - 1], y2 = x[i];
if (ISNAN(y2)) {
out[i] = y1 * 1.0;
} else {
if (ISNAN(y1)) {
out[i] = 1.0 * y2;
} else {
out[i] = y1 * y2;
}
}
}
} else {
for (int i = 1; i < n; ++i) {
double y1 = out[i - 1], y2 = x[i];
if (ISNAN(y2)) {
out[i] = NA_REAL;
} else {
if (ISNAN(y1)) {
out[i] = NA_REAL;
} else {
out[i] = y1 * y2;
}
}
}
}
return out;
}
To test the functions, you can run the following benchmark code in the R console:
cumprod(c(1, NA, 2, 4))
cumprod2_cpp(c(1, NA, 2, 4))
cumprod2_cpp(c(1, NA, 2, 4), na_rm = TRUE)
mark(
cumprod(x),
cumprod2_cpp(x)
)
Cumulative minimum
The next function calculates the cumulative minimum of the elements of a vector.
[[cpp4r::register]] doubles
cummin_cpp(doubles x, bool na_rm = false) {
int n = x.size();
writable::doubles out(n);
out[0] = x[0];
if (na_rm == true) {
for (int i = 1; i < n; ++i) {
double y1 = x[i - 1], y2 = x[i];
if (ISNAN(y1)) {
out[i] = y2;
} else {
out[i] = std::min(y1, y2);
}
}
} else {
for (int i = 1; i < n; ++i) {
double y1 = out[i - 1], y2 = x[i];
if (ISNAN(y2)) {
out[i] = NA_REAL;
} else {
if (ISNAN(y1)) {
out[i] = NA_REAL;
} else {
out[i] = std::min(y1, y2);
}
}
}
}
return out;
}
To test the function, you can run the following benchmark code in the R console:
cummin(c(1, NA, 2, 4))
cummin_cpp(c(1, NA, 2, 4))
cummin_cpp(c(1, NA, 2, 4), na_rm = TRUE)
mark(
cummin(x),
cummin_cpp(x)
)
Cumulative maximum
The next function calculates the cumulative maximum of the elements of a vector.
[[cpp4r::register]]
doubles cummax_cpp(doubles x, bool na_rm = false) {
int n = x.size();
writable::doubles out(n);
out[0] = x[0];
if (na_rm == true) {
for (int i = 1; i < n; ++i) {
double y1 = out[i - 1], y2 = x[i];
if (ISNAN(y1)) {
out[i] = y2;
} else {
out[i] = std::max(y1, y2);
}
}
} else {
for (int i = 1; i < n; ++i) {
double y1 = out[i - 1], y2 = x[i];
if (ISNAN(y2)) {
out[i] = NA_REAL;
} else {
if (ISNAN(y1)) {
out[i] = NA_REAL;
} else {
out[i] = std::max(y1, y2);
}
}
}
}
return out;
}
To test the functions, you can run the following benchmark code in the R console:
cummax(c(1, NA, 2, 4))
cummax_cpp(c(1, NA, 2, 4))
cummax_cpp(c(1, NA, 2, 4), na_rm = TRUE)
mark(
cummax(x),
cummax_cpp(x)
)