Efficiency means getting more output for less work put in.
Benchmarking and profiling are key to efficient programming, and this post covers how to do both. Several different pieces of code may all run correctly, but how do you compare them so that you can choose the most efficient one?
I also discovered that you can customise what R shows when it starts up (for example, by displaying a random quote each time). It feels like opening a fortune cookie.
Benchmarking is the process of timing specific functions or operations repeatedly, so that you can see how long each one takes. A good way to compare alternatives is to look at the median time taken.
# three ways of extracting the same element from a data frame
df <- data.frame(v = 1:4, name = letters[1:4])
microbenchmark::microbenchmark(df[3, 2],
                               df[3, "name"],
                               df$name[3])
Unit: microseconds
          expr    min      lq     mean  median      uq     max neval
      df[3, 2] 13.847 14.5375 27.24228 15.0980 21.0305 332.356   100
 df[3, "name"] 13.701 14.5110 21.01448 15.2035 19.0165 251.460   100
    df$name[3]  1.285  1.5595  2.24709  1.7410  2.3860   8.823   100
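df$name[3] is the quickest of the three because $ extraction avoids the overhead of the data frame [ method. By default each expression is run 100 times (the neval column); the times and unit arguments let you change this. A short sketch (the argument values here are my own choices):

microbenchmark::microbenchmark(df$name[3],
                               times = 1000,
                               unit = "us")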
Creating your own functions
x <- 1:100 # initialise the vector to cumulatively sum

# method 1: a for loop that grows the result with c()
# note: it iterates over the values of x and uses them as indices,
# which works because x is 1:100
cum_sum_for_loop <- function(x) {
  for (i in x) {
    if (i == 1) {
      xc <- x[i]
    } else {
      xc <- c(xc, sum(x[1:i]))
    }
  }
  xc
}
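# aside (my own sketch, not part of the original comparison):
# growing xc with c() copies the vector on every iteration;
# preallocating the result avoids that. cum_sum_prealloc is a
# hypothetical name.
cum_sum_prealloc <- function(x) {
  xc <- numeric(length(x))
  for (i in seq_along(x)) {
    xc[i] <- if (i == 1) x[1] else xc[i - 1] + x[i]
  }
  xc
}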
# method 2: sapply
# note: this sums 1:x[i] rather than x[1:i], so it only matches
# cumsum(x) because x happens to be 1:100
cum_sum_apply <- function(x) {
  sapply(x, function(x) sum(1:x))
}
# method 3: the built-in cumsum() function

# compare in nanoseconds
microbenchmark::microbenchmark(cum_sum_for_loop(x),
                               cum_sum_apply(x),
                               cumsum(x))
Unit: nanoseconds
                expr    min       lq      mean median       uq      max neval
 cum_sum_for_loop(x) 218225 254929.5 618847.13 279501 326327.5 31747985   100
    cum_sum_apply(x) 153106 193142.5 285312.73 218091 250483.0  4389066   100
           cumsum(x)    872   1054.5   2296.35   1446   2656.0    16054   100
# compare in seconds
microbenchmark::microbenchmark(cum_sum_for_loop(x),
                               cum_sum_apply(x),
                               cumsum(x),
                               unit = "s")

Unit: seconds
                expr         min           lq         mean       median           uq         max neval
 cum_sum_for_loop(x) 0.000166065 0.0001733355 2.419567e-04 0.0002050395 0.0002480690 0.000997432   100
    cum_sum_apply(x) 0.000129825 0.0001336490 1.624120e-04 0.0001419865 0.0001608905 0.000449203   100
           cumsum(x) 0.000000885 0.0000010935 1.636710e-06 0.0000012715 0.0000015650 0.000007260   100
The third method, the built-in cumsum(), is by far the fastest: over a hundred times faster than either hand-written version at the median.
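A benchmark only matters if the alternatives return the same answer, so it is worth verifying that before comparing speed. A quick check (my own addition):

# confirm the methods agree before trusting the timings
all.equal(cum_sum_for_loop(x), cumsum(x))
all.equal(cum_sum_apply(x), cumsum(x))

Both calls should return TRUE.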
Profiling involves running a larger piece of code and measuring where the time is spent, in order to find bottlenecks. This helps you understand why code takes so long to run.
library(profvis)

# Only run these examples in interactive R sessions
if (interactive()) {

  # Profile some code
  profvis({
    dat <- data.frame(
      x = rnorm(5e4),
      y = rnorm(5e4)
    )
    plot(x ~ y, data = dat)
    m <- lm(x ~ y, data = dat)
    abline(m, col = "red")
  })

  # Save a profile to an HTML file
  p <- profvis({
    dat <- data.frame(
      x = rnorm(5e4),
      y = rnorm(5e4)
    )
    plot(x ~ y, data = dat)
    m <- lm(x ~ y, data = dat)
    abline(m, col = "red")
  })
  htmlwidgets::saveWidget(p, "profile.html")

  # Can open in browser from R
  browseURL("profile.html")
}
To open your .Rprofile file for editing:
usethis::edit_r_profile()
To see a quote from the fortunes package each time R starts up:
paste(as.character(fortunes::fortune()), collapse = " ")
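For the quote to actually print at startup, the call can go in the .Rprofile wrapped in message(). A minimal sketch, assuming the fortunes package is installed (the interactive() and requireNamespace() guards are my own defensive additions):

# in ~/.Rprofile: show a random fortune at the start of each session
if (interactive() && requireNamespace("fortunes", quietly = TRUE)) {
  message(paste(as.character(fortunes::fortune()), collapse = " "))
}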
References
Gillespie, C. and Lovelace, R., Efficient R Programming.
"Pimp your R startup message": https://damien-datasci-blog.netlify.app/post/2020-12-31-pimp-your-r-startup-message/