09 - Rolling Functions with Missing Values

Notes

These functions ignore NA values for now. Adjustments for handling NA values are covered in a separate vignette.
R already provides efficient versions of the functions covered here. This is just to illustrate how to use C++ code.

Cumulative sum

The following function expands the previous cumsum_cpp() function to handle missing values.

[[cpp4r::register]]
doubles cumsum2_cpp(doubles x, bool na_rm = false) {
  int n = x.size();

  writable::doubles out(n);
  out[0] = x[0];

  if (na_rm == true) {
    for (int i = 1; i < n; ++i) {
      double y1 = out[i - 1], y2 = x[i];
      if (ISNAN(y2)) {
        out[i] = y1 + 0.0;
      } else {
        if (ISNAN(y1)) {
          out[i] = 0.0 + y2;
        } else {
          out[i] = y1 + y2;
        }
      }
    }
  } else {
    for (int i = 1; i < n; ++i) {
      double y1 = out[i - 1], y2 = x[i];
      if (ISNAN(y2)) {
        out[i] = NA_REAL;
      } else {
        if (ISNAN(y1)) {
          out[i] = NA_REAL;
        } else {
          out[i] = y1 + y2;
        }
      }
    }
  }

  return out;
}

To test the functions, you can run the following benchmark code in the R console:

set.seed(123) # for reproducibility
x <- rpois(1e6, lambda = 2) # 1,000,000 elements

cumsum(c(1, NA, 2, 4))
cumsum2_cpp(c(1, NA, 2, 4))
cumsum2_cpp(c(1, NA, 2, 4), na_rm = TRUE)

mark(
  cumsum(x),
  cumsum2_cpp(x)
)

Cumulative product

The following function expands the previous cumprod_cpp() function to handle missing values.

[[cpp4r::register]]
doubles cumprod2_cpp(doubles x, bool na_rm = false) {
  int n = x.size();

  writable::doubles out(n);
  out[0] = x[0];

  if (na_rm == true) {
    for (int i = 1; i < n; ++i) {
      double y1 = out[i - 1], y2 = x[i];
      if (ISNAN(y2)) {
        out[i] = y1 * 1.0;
      } else {
        if (ISNAN(y1)) {
          out[i] = 1.0 * y2;
        } else {
          out[i] = y1 * y2;
        }
      }
    }
  } else {
    for (int i = 1; i < n; ++i) {
      double y1 = out[i - 1], y2 = x[i];
      if (ISNAN(y2)) {
        out[i] = NA_REAL;
      } else {
        if (ISNAN(y1)) {
          out[i] = NA_REAL;
        } else {
          out[i] = y1 * y2;
        }
      }
    }
  }

  return out;
}

To test the functions, you can run the following benchmark code in the R console:

cumprod(c(1, NA, 2, 4))
cumprod2_cpp(c(1, NA, 2, 4))
cumprod2_cpp(c(1, NA, 2, 4), na_rm = TRUE)

mark(
  cumprod(x),
  cumprod2_cpp(x)
)

Cumulative minimum

The next function calculates the cumulative minimum of the elements of a vector.

[[cpp4r::register]] doubles
cummin_cpp(doubles x, bool na_rm = false) {
  int n = x.size();

  writable::doubles out(n);
  out[0] = x[0];

  if (na_rm == true) {
    for (int i = 1; i < n; ++i) {
      double y1 = x[i - 1], y2 = x[i];
      if (ISNAN(y1)) {
        out[i] = y2;
      } else {
        out[i] = std::min(y1, y2);
      }
    }
  } else {
    for (int i = 1; i < n; ++i) {
      double y1 = out[i - 1], y2 = x[i];
      if (ISNAN(y2)) {
        out[i] = NA_REAL;
      } else {
        if (ISNAN(y1)) {
          out[i] = NA_REAL;
        } else {
          out[i] = std::min(y1, y2);
        }
      }
    }
  }

  return out;
}

To test the function, you can run the following benchmark code in the R console:

cummin(c(1, NA, 2, 4))
cummin_cpp(c(1, NA, 2, 4))
cummin_cpp(c(1, NA, 2, 4), na_rm = TRUE)

mark(
  cummin(x),
  cummin_cpp(x)
)

Cumulative maximum

The next function calculates the cumulative maximum of the elements of a vector.

[[cpp4r::register]]
doubles cummax_cpp(doubles x, bool na_rm = false) {
  int n = x.size();

  writable::doubles out(n);
  out[0] = x[0];

  if (na_rm == true) {
    for (int i = 1; i < n; ++i) {
      double y1 = out[i - 1], y2 = x[i];
      if (ISNAN(y1)) {
        out[i] = y2;
      } else {
        out[i] = std::max(y1, y2);
      }
    }
  } else {
    for (int i = 1; i < n; ++i) {
      double y1 = out[i - 1], y2 = x[i];
      if (ISNAN(y2)) {
        out[i] = NA_REAL;
      } else {
        if (ISNAN(y1)) {
          out[i] = NA_REAL;
        } else {
          out[i] = std::max(y1, y2);
        }
      }
    }
  }

  return out;
}

To test the functions, you can run the following benchmark code in the R console:

cummax(c(1, NA, 2, 4))
cummax_cpp(c(1, NA, 2, 4))
cummax_cpp(c(1, NA, 2, 4), na_rm = TRUE)

mark(
  cummax(x),
  cummax_cpp(x)
)