---
output: rmarkdown::html_vignette
title: Potential Optimizers
vignette: >
  %\VignetteIndexEntry{Potential Optimizers}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Potential optimizers

```{r setup, include=FALSE}
library("ggplot2")
library("microbenchmark")
knitr::opts_chunk$set(echo = TRUE)
```

## Inline Expansion

### Idea

Replacing a function call with the body of the called function is called "inline expansion". Inlining removes the overhead of making the call and of returning from it, and it also avoids pushing and popping variables on the stack for every call.

### Code Examples

#### Unoptimized Code

```{r inline_expansion_code_og, echo = TRUE, warning = TRUE}
# Helper that is called once per loop iteration.
cubed <- function(x) {
  x * x * x
}

# Sums the cubes of 1..n, calling cubed() for every element.
inline <- function(n) {
  to_cubes <- 0
  for (i in seq_len(n)) {
    to_cubes <- to_cubes + cubed(i)
  }
  to_cubes
}
```

#### Proposed Optimized Code

```{r inline_expansion_code_op, echo = TRUE, warning = TRUE}
# Same sum, with the body of cubed() written directly in the loop.
inline_opt <- function(n) {
  to_cubes <- 0
  for (i in seq_len(n)) {
    to_cubes <- to_cubes + (i * i * i) # function inlined
  }
  to_cubes
}
```

### Benchmark

```{r inline_expansion_benchmark, echo = TRUE, warning = FALSE, message = FALSE}
n <- 1000
autoplot(microbenchmark(inline(n), inline_opt(n)))
```

## Memory Pre-Allocation

### Idea

As a general rule of thumb, in any programming language, it pays to manage memory deliberately wherever possible. When a vector is grown inside a loop, R has to request additional memory while the program is running, and may have to copy the existing elements into the new, larger block before it can continue. This can be repeated many times over the course of the loop. Pre-allocating the memory the vector will need avoids these delays.

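
To make the pre-allocation step concrete, here is a minimal sketch of a few base R constructors that reserve a vector of known length before a loop. The variable names (`n_demo`, `as_logical`, and so on) are illustrative only. Note that `vector(length = n)` with no `mode` argument yields a logical vector, so a typed constructor such as `integer(n)` or `numeric(n)` avoids a coercion when numbers are stored later.

```{r pre_allocation_sketch}
n_demo <- 5

# Each call reserves all n_demo slots up front, filled with a type default.
as_logical <- vector(length = n_demo)   # default mode is "logical": all FALSE
as_integer <- integer(n_demo)           # all 0L
as_double  <- numeric(n_demo)           # all 0
as_list    <- vector("list", n_demo)    # n_demo NULL entries

# Fill the integer vector in place; its length never changes inside the loop.
for (i in seq_len(n_demo)) {
  as_integer[i] <- i * 2L
}

typeof(as_logical)
typeof(as_integer)
length(as_integer)
```

Choosing the constructor that matches the values to be stored keeps both the length and the type of the vector fixed for the whole loop.
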

### Code Examples

#### Unoptimized Code

```{r pre_allocation_code_og}
# Grows the vector by one element per iteration, starting from NULL.
mem_alloc <- function(n) {
  vec <- NULL
  for (i in seq_len(n)) {
    vec[i] <- i
  }
  vec
}
```

#### Proposed Optimized Code

```{r pre_allocation_code_op}
# Reserves all n integer slots once, then fills them in place.
mem_alloc_opt <- function(n) {
  vec <- integer(n)
  for (i in seq_len(n)) {
    vec[i] <- i
  }
  vec
}
```

### Benchmark

```{r pre_allocate_benchmark, echo = TRUE, warning = FALSE, message = FALSE}
n <- 100000
autoplot(microbenchmark(mem_alloc(n), mem_alloc_opt(n)))
```

## Vectorization

### Idea

A golden rule in R programming is to access the underlying C/Fortran routines as much as possible; the fewer R function calls required to achieve this, the better. Many R functions are therefore vectorized, that is, their inputs and/or outputs naturally work with whole vectors, which reduces the number of function calls required.

### Code Examples

#### Unoptimized Code

```{r vectorization_code_og}
# Adds two length-n vectors one element at a time.
non_vectorized <- function(n) {
  v1 <- seq_len(n)
  v2 <- seq.int(n + 2, by = 2, length.out = n) # n values: n + 2, n + 4, ...
  res <- numeric(n)
  for (i in seq_len(n)) {
    res[i] <- v1[i] + v2[i]
  }
  res
}
```

#### Proposed Optimized Code

```{r vectorization_code_op}
# Adds the same two vectors with a single vectorized `+` call.
vectorized <- function(n) {
  v1 <- seq_len(n)
  v2 <- seq.int(n + 2, by = 2, length.out = n)
  v1 + v2
}
```

### Benchmark

```{r vectorized_benchmark, echo = TRUE, warning = FALSE, message = FALSE}
n <- 10000
autoplot(microbenchmark(non_vectorized(n), vectorized(n)))
```

## Efficient Column Extraction

### Idea

The idea is to replace the various ways of extracting a single column with the much faster `.subset2` call.

### Benchmark

```{r col_ext_benchmark, warning = FALSE, message = FALSE}
autoplot(microbenchmark(
  mtcars[, 11],
  mtcars$carb,
  mtcars[[c(11)]],
  mtcars[[11]],
  .subset2(mtcars, 11)
))
```

### Drawbacks

1. For some R classes, the `[[ ]]` operator and `.subset2` work differently. For instance, they seem to be equivalent for the `data.frame` class but are not the same for the `matrix` class.
2. 
Moreover, both `[[ ]]` and `.subset2` are functions, and in R any function can be overwritten. Thus the above optimization can be made to fail just by redefining, say, the `.subset2` function.

## Efficient Value Extraction

### Idea

The idea is to replace the various ways of extracting a single value with the much faster `.subset2` call.

### Benchmark

```{r val_ext_benchmark, warning = FALSE, message = FALSE}
autoplot(microbenchmark(
  mtcars[32, 11],
  mtcars$carb[32],
  mtcars[[c(11, 32)]],
  mtcars[[11]][32],
  .subset2(mtcars, 11)[32],
  times = 1000L
))
```

### Drawbacks

1. For some R classes, the `[[ ]]` operator and `.subset2` work differently. For instance, they seem to be equivalent for the `data.frame` class but are not the same for the `matrix` class.
2. Moreover, both `[[ ]]` and `.subset2` are functions, and in R any function can be overwritten. Thus the above optimization can be made to fail just by redefining, say, the `.subset2` function.
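
The second drawback can be illustrated with a minimal sketch. It assumes a hypothetical redefinition of `.subset2` that returns a fixed string, confined to a `local()` scope so the rest of the session is not affected. The names `expected`, `masked` and `explicit` are made up for this example.

```{r subset2_masking_sketch}
expected <- mtcars[[11]]

# Baseline: for a data frame, .subset2() and [[ ]] return the same column.
baseline_ok <- identical(.subset2(mtcars, 11), expected)
baseline_ok

# Hypothetical redefinition that shadows base::.subset2 inside this scope.
masked <- local({
  .subset2 <- function(x, i) "not a column"
  .subset2(mtcars, 11)
})
identical(masked, expected)

# Qualifying the call with base:: bypasses the shadowing definition.
explicit <- local({
  .subset2 <- function(x, i) "not a column"
  base::.subset2(mtcars, 11)
})
identical(explicit, expected)
```

Writing `base::.subset2` is one possible way to guard an optimized call against this kind of masking, at the cost of an extra `::` lookup on each call.
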