Efficiency means getting more output for less work put in.
Benchmarking and profiling are key to efficient programming, and this post covers how to do both. Several different pieces of code may all run correctly, but how do you compare them so that you can choose the most efficient one?
I also discovered that you can customise what R shows when it starts up (for example, by displaying a random quote each time). It feels like opening a fortune cookie.
Benchmarking is the process of timing specific functions or operations repeatedly, so that you can see how long each one takes. A good way to compare alternatives is to look at the median time taken.
# three ways of extracting the same element from a data frame
df <- data.frame(v = 1:4, name = letters[1:4])
microbenchmark::microbenchmark(df[3, 2],
                               df[3, "name"],
                               df$name[3])
Unit: microseconds
          expr    min      lq     mean  median      uq     max neval
      df[3, 2] 13.847 14.5375 27.24228 15.0980 21.0305 332.356   100
 df[3, "name"] 13.701 14.5110 21.01448 15.2035 19.0165 251.460   100
    df$name[3]  1.285  1.5595  2.24709  1.7410  2.3860   8.823   100
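df$name[3] is the quickest of the three because $ extraction avoids the overhead of the data frame [ method. By default each expression is run 100 times (the neval column); the times and unit arguments let you change this. A short sketch (the argument values here are my own choices):

microbenchmark::microbenchmark(df$name[3],
                               times = 1000,
                               unit = "us")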
Creating your own functions
x <- 1:100 # initialise the vector to cumulatively sum

# method 1: a for loop that grows the result with c()
# note: it iterates over the values of x and uses them as indices,
# which works because x is 1:100
cum_sum_for_loop <- function(x) {
  for (i in x) {
    if (i == 1) {
      xc <- x[i]
    } else {
      xc <- c(xc, sum(x[1:i]))
    }
  }
  xc
}
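# aside (my own sketch, not part of the original comparison):
# growing xc with c() copies the vector on every iteration;
# preallocating the result avoids that. cum_sum_prealloc is a
# hypothetical name.
cum_sum_prealloc <- function(x) {
  xc <- numeric(length(x))
  for (i in seq_along(x)) {
    xc[i] <- if (i == 1) x[1] else xc[i - 1] + x[i]
  }
  xc
}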
# method 2: sapply
# note: this sums 1:x[i] rather than x[1:i], so it only matches
# cumsum(x) because x happens to be 1:100
cum_sum_apply <- function(x) {
  sapply(x, function(x) sum(1:x))
}
# method 3: the built-in cumsum() function

# compare in nanoseconds
microbenchmark::microbenchmark(cum_sum_for_loop(x),
                               cum_sum_apply(x),
                               cumsum(x))
Unit: nanoseconds
                expr    min       lq      mean median       uq      max neval
 cum_sum_for_loop(x) 218225 254929.5 618847.13 279501 326327.5 31747985   100
    cum_sum_apply(x) 153106 193142.5 285312.73 218091 250483.0  4389066   100
           cumsum(x)    872   1054.5   2296.35   1446   2656.0    16054   100
# compare in seconds
microbenchmark::microbenchmark(cum_sum_for_loop(x),
                               cum_sum_apply(x),
                               cumsum(x),
                               unit = "s")

Unit: seconds
                expr         min           lq         mean       median           uq         max neval
 cum_sum_for_loop(x) 0.000166065 0.0001733355 2.419567e-04 0.0002050395 0.0002480690 0.000997432   100
    cum_sum_apply(x) 0.000129825 0.0001336490 1.624120e-04 0.0001419865 0.0001608905 0.000449203   100
           cumsum(x) 0.000000885 0.0000010935 1.636710e-06 0.0000012715 0.0000015650 0.000007260   100
The third method, the built-in cumsum(), is by far the fastest: over a hundred times faster than either hand-written version at the median.
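A benchmark only matters if the alternatives return the same answer, so it is worth verifying that before comparing speed. A quick check (my own addition):

# confirm the methods agree before trusting the timings
all.equal(cum_sum_for_loop(x), cumsum(x))
all.equal(cum_sum_apply(x), cumsum(x))

Both calls should return TRUE.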
Profiling involves running a larger piece of code and measuring where the time is spent, in order to find bottlenecks. This helps you understand why code takes so long to run.
library(profvis)

# Only run these examples in interactive R sessions
if (interactive()) {

  # Profile some code
  profvis({
    dat <- data.frame(
      x = rnorm(5e4),
      y = rnorm(5e4)
    )
    plot(x ~ y, data = dat)
    m <- lm(x ~ y, data = dat)
    abline(m, col = "red")
  })

  # Save a profile to an HTML file
  p <- profvis({
    dat <- data.frame(
      x = rnorm(5e4),
      y = rnorm(5e4)
    )
    plot(x ~ y, data = dat)
    m <- lm(x ~ y, data = dat)
    abline(m, col = "red")
  })
  htmlwidgets::saveWidget(p, "profile.html")

  # Can open in browser from R
  browseURL("profile.html")
}
To open your .Rprofile file for editing:
usethis::edit_r_profile()
To see a quote from the fortunes package each time R starts up:
paste(as.character(fortunes::fortune()), collapse = " ")
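For the quote to actually print at startup, the call can go in the .Rprofile wrapped in message(). A minimal sketch, assuming the fortunes package is installed (the interactive() and requireNamespace() guards are my own defensive additions):

# in ~/.Rprofile: show a random fortune at the start of each session
if (interactive() && requireNamespace("fortunes", quietly = TRUE)) {
  message(paste(as.character(fortunes::fortune()), collapse = " "))
}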
References
Gillespie, C. and Lovelace, R., Efficient R Programming.
"Pimp your R startup message": https://damien-datasci-blog.netlify.app/post/2020-12-31-pimp-your-r-startup-message/