Programming tweaks

Hadley Wickham 2022-09-08 11:32:10 -05:00
parent d20eb8d22c
commit 5c5774f86d
2 changed files with 38 additions and 35 deletions

@@ -30,7 +30,9 @@ Here we present the style we use in our code, but the most important thing is to
### Prerequisites
The focus of this chapter is on writing functions in base R, so you won't need any extra packages.
```{r}
library(tidyverse)
```
## When should you write a function?
@@ -46,19 +48,21 @@ df <- tibble::tibble(
d = rnorm(10)
)
df$a <- (df$a - min(df$a, na.rm = TRUE)) /
(max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) /
(max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) /
(max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) /
(max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
df |> mutate(
a = (a - min(a, na.rm = TRUE)) /
(max(a, na.rm = TRUE) - min(a, na.rm = TRUE)),
b = (b - min(b, na.rm = TRUE)) /
(max(b, na.rm = TRUE) - min(a, na.rm = TRUE)),
c = (c - min(c, na.rm = TRUE)) /
(max(c, na.rm = TRUE) - min(c, na.rm = TRUE)),
d = (d - min(d, na.rm = TRUE)) /
(max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
)
```
You might be able to puzzle out that this rescales each column to have a range from 0 to 1.
But did you spot the mistake?
Hadley made an error when copying-and-pasting the code for `df$b`: he forgot to change an `a` to a `b`.
Hadley made an error when copying-and-pasting the code for `b`: he forgot to change an `a` to a `b`.
Extracting repeated code out into a function is a good idea because it prevents you from making this type of mistake.
To write a function you need to first analyse the code.
@@ -127,15 +131,22 @@ Unfortunately, it's beyond the scope of this book, but you can learn about it in
We can simplify the original example now that we have a function:
```{r}
df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
df$c <- rescale01(df$c)
df$d <- rescale01(df$d)
df |> mutate(
a = rescale01(a),
b = rescale01(b),
c = rescale01(c),
d = rescale01(d)
)
```
Compared to the original, this code is easier to understand and we've eliminated one class of copy-and-paste errors.
There is still quite a bit of duplication since we're doing the same thing to multiple columns.
We'll learn how to eliminate that duplication with iteration in [Chapter -@sec-iteration], once you've learned more about R's data structures in [Chapter -@sec-vectors].
We could reduce that duplication with `across()` which you'll learn more about in @sec-iteration:
```{r}
df |>
mutate(across(a:d, rescale01))
```
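As a rough check that the `across()` form does the same work as the four explicit calls, the sketch below compares the two. The body of `rescale01()` is not shown in this hunk, so the definition used here is an assumption (a range-based version), and the example data frame is made up for illustration.

```r
library(dplyr)

# Assumed definition: the hunk above calls rescale01() but does not show its body.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical data, seeded so the comparison is reproducible.
set.seed(1)
df <- tibble::tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# One call per column, as in the first version.
explicit <- df |> mutate(
  a = rescale01(a),
  b = rescale01(b),
  c = rescale01(c),
  d = rescale01(d)
)

# The same transformation applied to columns a through d in one step.
with_across <- df |> mutate(across(a:d, rescale01))

# Both spellings should produce the same columns.
all.equal(explicit, with_across)
```

Because the column names appear only once in the `across()` version, there is no opportunity to mistype one of them in the middle of a copied expression.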
Another advantage of functions is that if our requirements change, we only need to make the change in one place.
For example, we might discover that some of our variables include infinite values, and `rescale01()` fails:
@@ -653,7 +664,7 @@ It's good practice to check important preconditions, and throw an error (with `s
```{r}
wt_mean <- function(x, w) {
if (length(x) != length(w)) {
stop("`x` and `w` must be the same length", call. = FALSE)
stop("`x` and `w` must be the same length")
}
sum(w * x) / sum(w)
}
@@ -672,7 +683,7 @@ wt_mean <- function(x, w, na.rm = FALSE) {
stop("`na.rm` must be length 1")
}
if (length(x) != length(w)) {
stop("`x` and `w` must be the same length", call. = FALSE)
stop("`x` and `w` must be the same length")
}
if (na.rm) {

@@ -178,7 +178,7 @@ There are four variations on the basic theme of the `for` loop:
### Modifying an existing object
Sometimes you want to use a `for` loop to modify an existing object.
For example, remember our challenge from [Chapter -@sec-functions] on functions.
For example, remember our challenge from @sec-functions on functions.
We wanted to rescale every column in a data frame:
```{r}
@@ -580,15 +580,7 @@ The following toy example splits up the `mtcars` dataset into three pieces (one
```{r}
models <- mtcars |>
split(mtcars$cyl) |>
map(function(df) lm(mpg ~ wt, data = df))
```
The syntax for creating an anonymous function in R is quite verbose so purrr provides a convenient shortcut: a one-sided formula.
```{r}
models <- mtcars |>
split(mtcars$cyl) |>
map(~lm(mpg ~ wt, data = .x))
map(\(df) lm(mpg ~ wt, data = df))
```
Here we've used `.x` as a pronoun: it refers to the current list element (in the same way that `i` referred to the current index in the `for` loop).
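The spellings of an anonymous function that appear in this hunk can be compared side by side. This is a minimal sketch: it assumes R 4.1 or later for the `\(df)` shorthand and a purrr version that still accepts one-sided formulas. It compares fitted coefficients rather than whole model objects, because each `lm` object also stores the call that created it.

```r
library(purrr)

pieces <- split(mtcars, mtcars$cyl)

# Three ways to write the same anonymous function.
m_function <- map(pieces, function(df) lm(mpg ~ wt, data = df))
m_lambda   <- map(pieces, \(df) lm(mpg ~ wt, data = df))
m_formula  <- map(pieces, ~ lm(mpg ~ wt, data = .x))

# Helper (hypothetical name) that pulls the coefficients out of each model.
coefs <- function(models) map(models, coef)

# The fitted coefficients should agree across all three spellings.
all.equal(coefs(m_function), coefs(m_lambda))
all.equal(coefs(m_lambda), coefs(m_formula))
```

In the `\(df)` form the argument has an explicit name, whereas the formula form relies on the `.x` pronoun supplied by purrr.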
@@ -601,7 +593,7 @@ We could do that using the shorthand for anonymous functions:
```{r}
models |>
map(summary) |>
map_dbl(~ .x$r.squared)
map_dbl(\(x) x$r.squared)
```
But extracting named components is a common operation, so purrr provides an even shorter shortcut: you can use a string.
@@ -760,7 +752,7 @@ One way to do that would be to iterate over the indices and index into vectors o
```{r}
sigma <- list(1, 5, 10)
seq_along(mu) |>
map(~rnorm(5, mu[[.x]], sigma[[.x]])) |>
map(\(i) rnorm(5, mu[[i]], sigma[[i]])) |>
str()
```
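A self-contained version of the index-based approach is sketched below, next to purrr's `map2()`, which iterates over two vectors in parallel. The values of `mu` are an assumption, since `mu` is defined outside the hunk shown here, and the seed is only there to make the comparison reproducible.

```r
library(purrr)

# Hypothetical means; `mu` is not defined in the hunk above.
mu <- list(5, 10, -3)
sigma <- list(1, 5, 10)

# Iterate over the indices and index into both lists.
set.seed(42)
by_index <- seq_along(mu) |>
  map(\(i) rnorm(5, mu[[i]], sigma[[i]]))

# Iterate over both lists in parallel instead.
set.seed(42)
in_parallel <- map2(mu, sigma, \(m, s) rnorm(5, mean = m, sd = s))

# With the same seed, both approaches draw the same numbers.
identical(by_index, in_parallel)
```

The index-based version works, but the intent is easier to read when the two inputs are passed directly.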
@@ -870,7 +862,7 @@ library(tidyverse)
plots <- mtcars |>
split(mtcars$cyl) |>
map(~ggplot(.x, aes(mpg, wt)) + geom_point())
map(\(df) ggplot(df, aes(mpg, wt)) + geom_point())
paths <- str_c(names(plots), ".pdf")
walk2(paths, plots, ggsave, path = tempdir())
@@ -881,7 +873,7 @@ This makes them suitable for use in the middle of pipelines.
## Other patterns of for loops
Purrr provides a number of other functions that abstract over other types of `for` loops.
purrr provides a number of other functions that abstract over other types of `for` loops.
You'll use them less frequently than the map functions, but they're useful to know about.
The goal here is to briefly illustrate each function, so hopefully it will come to mind if you see a similar problem in the future.
Then you can go look up the documentation for more details.
@@ -921,20 +913,20 @@ x <- sample(10)
x
x |>
detect(~ .x > 5)
detect(\(x) x > 5)
x |>
detect_index(~ .x > 5)
detect_index(\(x) x > 5)
```
`head_while()` and `tail_while()` take elements from the start or end of a vector while a predicate is true:
```{r}
x |>
head_while(~ .x > 5)
head_while(\(x) x > 5)
x |>
tail_while(~ .x > 5)
tail_while(\(x) x > 5)
```
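Because `sample(10)` gives a different vector on every run, the behaviour of these predicate functions is easier to see on a fixed input. The sketch below uses a made-up vector and the same `\(x)` style of anonymous function; the argument is named `v` only to avoid confusion with the vector `x`.

```r
library(purrr)

# A fixed vector (instead of sample(10)) so the results are predictable.
x <- c(7, 9, 2, 8, 6)

detect(x, \(v) v < 5)        # first element below 5: 2
detect_index(x, \(v) v < 5)  # position of that element: 3

head_while(x, \(v) v > 5)    # leading run above 5: 7 9
tail_while(x, \(v) v > 5)    # trailing run above 5: 8 6
```

Both `head_while()` and `tail_while()` stop at the first element that fails the predicate, so the `2` in the middle separates the two runs.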
### Reduce and accumulate