---
output: rmarkdown::html_vignette
title: Potential Optimizers
vignette: >
  %\VignetteIndexEntry{Potential Optimizers}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Potential optimizers

```{r setup, include=FALSE}
library("ggplot2")
library("microbenchmark")

knitr::opts_chunk$set(echo = TRUE)
```

## Inline Expansion

### Idea

Replacing a function call with the body of the called function is called "inline expansion". It eliminates the overhead of making the call and of returning from it, and it avoids pushing and popping variables on the call stack for every call.

### Code Examples

#### Unoptimized Code

```{r inline_expansion_code_og, echo = TRUE, warning = TRUE}
cubed <- function(x) {
  x * x * x
}

inline <- function(n) {
  to_cubes <- 0
  for (i in seq_len(n)) {
    to_cubes <- to_cubes + cubed(i)
  }
}
```

#### Proposed Optimized Code

```{r inline_expansion_code_op, echo = TRUE, warning = TRUE}
inline_opt <- function(n) {
  to_cubes <- 0
  for (i in seq_len(n)) {
    to_cubes <- to_cubes + (i * i * i) # function inlined
  }
}
```

### Benchmark

```{r inline_expansion_benchmark, echo = TRUE, warning = FALSE, message= FALSE}
n <- 1000
autoplot(microbenchmark(inline(n), inline_opt(n)))
```
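
Inlining is only a valid optimization if it leaves the result unchanged. The sketch below checks this with two hypothetical helpers, `sum_cubes_call()` and `sum_cubes_inlined()`, which mirror `inline()` and `inline_opt()` but return the running total. The helper names, the nested `cube()` function, and the comparison against the closed form for the sum of cubes are assumptions made for illustration.

```{r inline_expansion_check}
# Hypothetical variant of inline(): calls a helper on every iteration
sum_cubes_call <- function(n) {
  cube <- function(x) x * x * x
  total <- 0
  for (i in seq_len(n)) {
    total <- total + cube(i)
  }
  total
}

# Hypothetical variant of inline_opt(): the helper body is written in place
sum_cubes_inlined <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + (i * i * i)
  }
  total
}

# Kept at 1000: i * i * i is integer arithmetic and overflows once i exceeds 1290
n_check <- 1000
closed_form <- (n_check * (n_check + 1) / 2)^2

all.equal(sum_cubes_call(n_check), sum_cubes_inlined(n_check))
all.equal(sum_cubes_inlined(n_check), closed_form)
```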

## Memory Pre-Allocation

### Idea

As a general rule of thumb, in any programming language, we should manage memory deliberately wherever we can. When a vector grows inside a loop, R has to request more memory whenever the current allocation is too small and copy the existing elements across before the loop can continue. This can recur many times over the course of the loop. Pre-allocating the vector at its final length avoids these delays.

### Code Examples

#### Unoptimized Code

```{r pre_allocation_code_og}
mem_alloc <- function(n) {
  vec <- NULL
  for (i in seq_len(n)) {
    vec[i] <- i
  }
}
```

#### Proposed Optimized Code

```{r pre_allocation_code_op}
mem_alloc_opt <- function(n) {
  vec <- integer(n) # typed pre-allocation; vector(length = n) would create a logical vector
  for (i in seq_len(n)) {
    vec[i] <- i
  }
}
```
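
The constructor used for pre-allocation also fixes the element type of the vector. The sketch below, which uses only base R, shows the types produced by a few common constructors and what happens when a value of another type is stored in a pre-allocated logical vector. The variable name `flags` is made up for this illustration.

```{r pre_allocation_types}
typeof(vector(length = 3))   # "logical": the default mode of vector()
typeof(integer(3))           # "integer"
typeof(numeric(3))           # "double"
typeof(vector("list", 3))    # "list"

# Storing an integer in a logical vector converts the whole vector
flags <- vector(length = 3)
flags[1] <- 1L
typeof(flags)                # "integer"
```

Choosing a constructor that matches the values to be stored avoids this conversion.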

### Benchmark

```{r pre_allocate_benchmark, echo = TRUE, warning = FALSE, message= FALSE}
n <- 100000
autoplot(microbenchmark(mem_alloc(n), mem_alloc_opt(n)))
```

## Vectorization

### Idea

A golden rule in R programming is to access the underlying C/Fortran routines as much as possible; the fewer R function calls required to achieve this, the better. Many R functions are therefore vectorized, that is, the function's inputs and/or outputs naturally work with vectors, reducing the number of function calls required.
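
As a small illustration of the difference in R-level call counts, the sketch below computes square roots twice: once with one call per element and once with a single call on the whole vector. The variable names are made up for this example.

```{r vectorization_idea_sketch}
x <- c(1, 4, 9, 16)

# One R-level call per element: vapply() invokes the function four times
roots_loop <- vapply(x, function(v) sqrt(v), numeric(1))

# One R-level call in total: sqrt() iterates over the vector in compiled code
roots_vec <- sqrt(x)

identical(roots_loop, roots_vec)
```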

### Code Examples

#### Unoptimized Code

```{r vectorization_code_og}
non_vectorized <- function(n) {
  v1 <- seq_len(n)
  # v2 needs n elements so that v2[i] is defined for every i
  v2 <- seq.int(n + 2, by = 2, length.out = n)
  res <- numeric(n)
  for (i in seq_len(n)) {
    res[i] <- v1[i] + v2[i]
  }
}
```

#### Proposed Optimized Code

```{r vectorization_code_op}
vectorized <- function(n) {
  v1 <- seq_len(n)
  # same n-element vector as in non_vectorized()
  v2 <- seq.int(n + 2, by = 2, length.out = n)
  res <- v1 + v2 # a single vectorized addition replaces the loop
}
```

### Benchmark

```{r vectorized_benchmark, echo = TRUE, warning = FALSE, message= FALSE}
n <- 10000
autoplot(microbenchmark(non_vectorized(n), vectorized(n)))
```
  
## Efficient Column Extraction

### Idea

The idea is to replace the various ways of extracting a single column with the much faster `.subset2` call.
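
Before comparing timings, it is worth confirming that the alternatives return the same column. A minimal check on `mtcars`, with variable names chosen only for this sketch:

```{r col_ext_equivalence}
by_matrix_style <- mtcars[, 11]
by_dollar <- mtcars$carb
by_double_bracket <- mtcars[[11]]
by_subset2 <- .subset2(mtcars, 11)

identical(by_subset2, by_matrix_style) &&
  identical(by_subset2, by_dollar) &&
  identical(by_subset2, by_double_bracket)
```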

### Benchmark

```{r col_ext_benchmark, warning = FALSE, message = FALSE}
autoplot(microbenchmark(
  mtcars[, 11],
  mtcars$carb,
  mtcars[[c(11)]],
  mtcars[[11]],
  .subset2(mtcars, 11)
))
```

### Drawbacks

1. For some R classes, the `[[ ]]` operator and `.subset2` work differently. For instance, they seem to be equivalent for the `data.frame` class but are not the same for the `matrix` class.

2. Moreover, both `[[ ]]` and `.subset2` are functions, and in R any function can be overwritten. The optimization above can therefore be made to fail just by redefining, say, the `.subset2` function.
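
One way the first drawback can arise is S3 dispatch: `[[ ]]` calls a class-specific method when one exists, whereas `.subset2` always reads the underlying object directly. The sketch below uses a hypothetical class, `shouty`, invented purely for this illustration.

```{r subset2_dispatch_sketch}
# Hypothetical S3 class whose `[[` method upper-cases what it extracts
shouty <- structure(list(greeting = "hello"), class = "shouty")
`[[.shouty` <- function(x, i) toupper(.subset2(x, i))

shouty[[1]]          # dispatches to `[[.shouty`
.subset2(shouty, 1)  # skips dispatch and reads the underlying list
```

For such a class, swapping `[[ ]]` for `.subset2` would change the result, not just the run time.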

## Efficient Value Extraction

### Idea

The idea is to replace the various ways of extracting a single value with the much faster `.subset2` call.
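
The alternatives are only interchangeable if they all return the same value. A short check on `mtcars`, where the list name `candidates` is made up for this sketch:

```{r val_ext_equivalence}
candidates <- list(
  mtcars[32, 11],
  mtcars$carb[32],
  mtcars[[c(11, 32)]],
  mtcars[[11]][32],
  .subset2(mtcars, 11)[32]
)

# Compare every alternative against the first one
all(vapply(candidates, identical, logical(1), candidates[[1]]))
```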

### Benchmark

```{r val_ext_benchmark, warning = FALSE, message = FALSE}
autoplot(microbenchmark(
  mtcars[32, 11],
  mtcars$carb[32],
  mtcars[[c(11, 32)]],
  mtcars[[11]][32],
  .subset2(mtcars, 11)[32],
  times = 1000L
))
```

### Drawbacks

1. For some R classes, the `[[ ]]` operator and `.subset2` work differently. For instance, they seem to be equivalent for the `data.frame` class but are not the same for the `matrix` class.

2. Moreover, both `[[ ]]` and `.subset2` are functions, and in R any function can be overwritten. The optimization above can therefore be made to fail just by redefining, say, the `.subset2` function.
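
The second drawback can be reproduced in a few lines. In the sketch below, a hypothetical redefinition of `.subset2` is kept inside `local()` so that it does not leak into the rest of the session; the replacement function and its return value are invented for this illustration.

```{r subset2_shadow_sketch}
shadowed <- local({
  # Hypothetical redefinition that masks the base function in this scope
  .subset2 <- function(x, i) "not a column"
  .subset2(mtcars, 11)
})
shadowed

# The base version is still reachable through an explicit namespace prefix
identical(base::.subset2(mtcars, 11), mtcars[[11]])
```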

<!-- ## External packages that can be explored for further speed-ups -->

<!-- ### For efficient programming -->
<!-- * **compiler** -->

<!-- * **memoise** -->

<!-- ### For Efficient Data Carpentry -->
<!-- * **tibble** -->

<!-- * **tidyr** -->

<!-- * **stringr** -->

<!-- * **readr** -->

<!-- * **dplyr** -->

<!-- * **data.table**   -->

<!-- ### For efficient input/output -->
<!-- * **rio** -->

<!-- * **readr** -->

<!-- * **data.table** -->

<!-- * **feather** -->

<!-- ### For efficient Optimization -->
<!-- * **profvis** -->

<!-- * **Rcpp** -->

<!-- ### For efficient set-up and hardware -->
<!-- * **benchmarkme** -->

<!-- ### For efficient collaboration -->
<!-- * **lubridate** -->

<!-- ### For efficient workflow -->
<!-- * **DiagrammeR** -->

<!-- ### Some additional comments -->

<!-- * `if` is faster than `ifelse`, but the speed boost of `if` isn't particularly interesting since it isn't something that can easily be harnessed through vectorization. That is to say, `if` is only advantageous over `ifelse` when the *cond/test argument is of length 1*. -->

<!-- * A factor is just a vector of integers with associated levels. Occasionally we want to convert a factor into its numerical equivalent. The most efficient way of doing this (especially for long factors) is: -->

<!--   `as.numeric(levels(f))[f]` -->

<!-- * The non-vectorised logical operators, `&&` and `||`, only evaluate the second operand if needed. This is efficient and leads to neater code. Care must be taken not to use `&&` or `||` on vectors: older versions of R use only the first element of the vector, giving the incorrect answer, and R 4.3.0 and later signal an error. -->