--- output: rmarkdown::html_vignette title: Loop-invariant Code Motion vignette: > %\VignetteIndexEntry{Loop-invariant Code Motion} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r echo=FALSE, message=FALSE} library("rco") library("microbenchmark") library("ggplot2") autoplot.microbenchmark <- function(obj) { levels(obj$expr) <- paste0("Expr_", seq_along(levels(obj$expr))) microbenchmark:::autoplot.microbenchmark(obj) } speed_up <- function(obj) { levels(obj$expr) <- paste0("Expr_", seq_along(levels(obj$expr))) obj <- as.data.frame(obj) summaries <- do.call(rbind, by(obj$time, obj$expr, summary)) res <- c() for (i in seq_len(nrow(summaries) - 1) + 1) { res <- rbind(res, summaries[1, ] / summaries[i, ]) } rownames(res) <- levels(obj$expr)[-1] return(res) } ``` # Loop-invariant Code Motion ## Idea Invariant code is a piece of code that could be run outside of a loop without affecting the actual outcome of a program. This may come in the form of variables that are not modified after going through the loop and as such are called "invariants" or non-changing. If these pieces of code were removed from the loops, this would speed up the script being run. ## Background Loop-invariant code consists of statements or expressions which can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion is a compiler optimization which performs this movement automatically. For example, consider the following code: ```{r eval=FALSE} i <- 0 n <- 1000 y <- rnorm(1) z <- rnorm(1) a <- c() while (i < n) { x <- y + z a[i] <- 6 * i + x * x i <- i + 1 } ``` Here, `x` is assigned a value `y + z` that is constant in each loop execution. In this example, the assignment would be executed `n` times. The same result, but much faster, would be obtained by doing the assign outside the loop. ## Example Consider the previous example to be automatically optimized. ```{r} code <- paste( "i <- 0", "n <- 1000", "y <- rnorm(1)", "z <- rnorm(1)", "a <- c()", "while (i < n) {", " x <- y + z", " a[i] <- 6 * i + x * x", " i <- i + 1", "}", sep = "\n" ) cat(code) ``` Then, the automatically optimized code would be: ```{r} opt_code <- opt_loop_invariant(list(code)) cat(opt_code$codes[[1]]) ``` And if we measure the execution time of each one, and the speed-up: ```{r message=FALSE} bmark_res <- microbenchmark({ eval(parse(text = code)) }, { eval(parse(text = opt_code)) }) autoplot(bmark_res) speed_up(bmark_res) ``` ## Implementation The `opt_loop_invariant` optimizer does: * Finds `for` and `while` loops in the code. * Discards loops that have function calls inside, as function calls can edit the environment. * Gets which variables are variant inside the loop. * Detects which expressions do not use any variant variable. * Gets these invariant variables, and moves them outside the loop, but inside an `if` statement (with the same conditional as the loop). ## To-Do * `if/else` code motion? Actually, the `opt_loop_invariant` does not consider `if/else` expressions to move. In this sense, the code: ```{r eval=FALSE} while (i < n) { if (n > 3) { something_without_i } } ``` Will not be optimized to its equivalent code: ```{r eval=FALSE} if (i < n) { if (n > 3) { something_without_i } } while (i < n) { } ``` * Invariant subexpressions motion? Actually, the `opt_loop_invariant` considers only full expressions, it is not working on subexpressions, for instance, the code: ```{r eval=FALSE} while (i < n) { i <- (x * y) + 1 } ``` as `x` and `y` are invariant, could be optimized to: ```{r eval=FALSE} is_1 <- (x * y) while (i < n) { i <- is_1 + 1 } ``` * Include `repeat` optimization? Since determining the conditional that causes a `repeat` loop to stop is not that simple, it is not easy to remove invariant variables and place them in an `if`. For example, the code: ```{r eval=FALSE} y <- 1 repeat{ if (y > 4) { break } x <- 8 * 8 y <- y + 1 } ``` Must be optimized to: ```{r eval=FALSE} y <- 1 if (y <= 4) { x <- 8 * 8 } repeat{ if (y > 4) { break } y <- y + 1 } ```