Programming tweaks
parent d20eb8d22c
commit 5c5774f86d
@@ -30,7 +30,9 @@ Here we present the style we use in our code, but the most important thing is to
 
 ### Prerequisites
 
-The focus of this chapter is on writing functions in base R, so you won't need any extra packages.
+```{r}
+library(tidyverse)
+```
 
 ## When should you write a function?
 
@@ -46,19 +48,21 @@ df <- tibble::tibble(
   d = rnorm(10)
 )
 
-df$a <- (df$a - min(df$a, na.rm = TRUE)) /
-  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
-df$b <- (df$b - min(df$b, na.rm = TRUE)) /
-  (max(df$b, na.rm = TRUE) - min(df$a, na.rm = TRUE))
-df$c <- (df$c - min(df$c, na.rm = TRUE)) /
-  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
-df$d <- (df$d - min(df$d, na.rm = TRUE)) /
-  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))
+df |> mutate(
+  a = (a - min(a, na.rm = TRUE)) /
+    (max(a, na.rm = TRUE) - min(a, na.rm = TRUE)),
+  b = (b - min(b, na.rm = TRUE)) /
+    (max(b, na.rm = TRUE) - min(a, na.rm = TRUE)),
+  c = (c - min(c, na.rm = TRUE)) /
+    (max(c, na.rm = TRUE) - min(c, na.rm = TRUE)),
+  d = (d - min(d, na.rm = TRUE)) /
+    (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
+)
 ```
 
 You might be able to puzzle out that this rescales each column to have a range from 0 to 1.
 But did you spot the mistake?
-Hadley made an error when copying-and-pasting the code for `df$b`: he forgot to change an `a` to a `b`.
+Hadley made an error when copying-and-pasting the code for `b`: he forgot to change an `a` to a `b`.
 Extracting repeated code out into a function is a good idea because it prevents you from making this type of mistake.
 
 To write a function you need to first analyse the code.
@@ -127,15 +131,22 @@ Unfortunately, it's beyond the scope of this book, but you can learn about it in
 We can simplify the original example now that we have a function:
 
 ```{r}
-df$a <- rescale01(df$a)
-df$b <- rescale01(df$b)
-df$c <- rescale01(df$c)
-df$d <- rescale01(df$d)
+df |> mutate(
+  a = rescale01(a),
+  b = rescale01(b),
+  c = rescale01(c),
+  d = rescale01(d)
+)
 ```
 
 Compared to the original, this code is easier to understand and we've eliminated one class of copy-and-paste errors.
 There is still quite a bit of duplication since we're doing the same thing to multiple columns.
-We'll learn how to eliminate that duplication with iteration in [Chapter -@sec-iteration], once you've learned more about R's data structures in [Chapter -@sec-vectors].
+We could reduce that duplication with `across()` which you'll learn more about in @sec-iteration:
+
+```{r}
+df |>
+  mutate(across(a:d, rescale01))
+```
 
 Another advantage of functions is that if our requirements change, we only need to make the change in one place.
 For example, we might discover that some of our variables include infinite values, and `rescale01()` fails:
@@ -653,7 +664,7 @@ It's good practice to check important preconditions, and throw an error (with `s
 ```{r}
 wt_mean <- function(x, w) {
   if (length(x) != length(w)) {
-    stop("`x` and `w` must be the same length", call. = FALSE)
+    stop("`x` and `w` must be the same length")
   }
   sum(w * x) / sum(w)
 }
@@ -672,7 +683,7 @@ wt_mean <- function(x, w, na.rm = FALSE) {
     stop("`na.rm` must be length 1")
   }
   if (length(x) != length(w)) {
-    stop("`x` and `w` must be the same length", call. = FALSE)
+    stop("`x` and `w` must be the same length")
   }
 
   if (na.rm) {
@@ -178,7 +178,7 @@ There are four variations on the basic theme of the `for` loop:
 ### Modifying an existing object
 
 Sometimes you want to use a `for` loop to modify an existing object.
-For example, remember our challenge from [Chapter -@sec-functions] on functions.
+For example, remember our challenge from @sec-functions on functions.
 We wanted to rescale every column in a data frame:
 
 ```{r}
@@ -580,15 +580,7 @@ The following toy example splits up the `mtcars` dataset into three pieces (one
 ```{r}
 models <- mtcars |>
   split(mtcars$cyl) |>
-  map(function(df) lm(mpg ~ wt, data = df))
-```
-
-The syntax for creating an anonymous function in R is quite verbose so purrr provides a convenient shortcut: a one-sided formula.
-
-```{r}
-models <- mtcars |>
-  split(mtcars$cyl) |>
-  map(~lm(mpg ~ wt, data = .x))
+  map(\(df) lm(mpg ~ wt, data = df))
 ```
 
 Here we've used `.x` as a pronoun: it refers to the current list element (in the same way that `i` referred to the current index in the `for` loop).
@@ -601,7 +593,7 @@ We could do that using the shorthand for anonymous functions:
 ```{r}
 models |>
   map(summary) |>
-  map_dbl(~ .x$r.squared)
+  map_dbl(\(x) x$r.squared)
 ```
 
 But extracting named components is a common operation, so purrr provides an even shorter shortcut: you can use a string.
@@ -760,7 +752,7 @@ One way to do that would be to iterate over the indices and index into vectors o
 ```{r}
 sigma <- list(1, 5, 10)
 seq_along(mu) |>
-  map(~rnorm(5, mu[[.x]], sigma[[.x]])) |>
+  map(\(i) rnorm(5, mu[[i]], sigma[[i]])) |>
   str()
 ```
 
@@ -870,7 +862,7 @@ library(tidyverse)
 
 plots <- mtcars |>
   split(mtcars$cyl) |>
-  map(~ggplot(.x, aes(mpg, wt)) + geom_point())
+  map(\(df) ggplot(df, aes(mpg, wt)) + geom_point())
 paths <- str_c(names(plots), ".pdf")
 
 walk2(paths, plots, ggsave, path = tempdir())
@@ -881,7 +873,7 @@ This makes them suitable for use in the middle of pipelines.
 
 ## Other patterns of for loops
 
-Purrr provides a number of other functions that abstract over other types of `for` loops.
+purrr provides a number of other functions that abstract over other types of `for` loops.
 You'll use them less frequently than the map functions, but they're useful to know about.
 The goal here is to briefly illustrate each function, so hopefully it will come to mind if you see a similar problem in the future.
 Then you can go look up the documentation for more details.
@@ -921,20 +913,20 @@ x <- sample(10)
 x
 
 x |>
-  detect(~ .x > 5)
+  detect(\(x) x > 5)
 
 x |>
-  detect_index(~ .x > 5)
+  detect_index(\(x) x > 5)
 ```
 
 `head_while()` and `tail_while()` take elements from the start or end of a vector while a predicate is true:
 
 ```{r}
 x |>
-  head_while(~ .x > 5)
+  head_while(\(x) x > 5)
 
 x |>
-  tail_while(~ .x > 5)
+  tail_while(\(x) x > 5)
 ```
 
 ### Reduce and accumulate