More polishing + some exercises
parent
8078a9c0f7
commit
f004297d8c
|
@ -20,17 +20,15 @@ Writing a function has three big advantages over using copy-and-paste:
|
|||
3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
|
||||
|
||||
A good rule of thumb is to consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).
|
||||
The goal of this chapter is to get you started on your journey with three useful types of functions:
|
||||
In this chapter, you'll learn about three useful types of functions:
|
||||
|
||||
- Vector functions take one or more vectors as input and return a vector as output.
|
||||
- Data frame functions take a data frame as input and return a data frame as output.
|
||||
- Plot functions take a data frame as input and return a plot as output.
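
As a rough sketch of the three shapes listed above, here is one small function of each kind. The names `rescale01()`, `grouped_mean()`, and `histogram()` are illustrative choices made for this sketch, and `mtcars` is used only because it ships with R; the sketch assumes dplyr and ggplot2 are installed.

```r
library(dplyr)
library(ggplot2)

# Vector function: takes a vector, returns a vector of the same length.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Data frame function: takes a data frame, returns a data frame.
# Embracing with {{ }} lets the caller name columns without quoting them.
grouped_mean <- function(df, group_var, mean_var) {
  df |>
    group_by({{ group_var }}) |>
    summarize(mean = mean({{ mean_var }}, na.rm = TRUE), .groups = "drop")
}

# Plot function: takes a data frame, returns a ggplot object.
histogram <- function(df, var, binwidth = NULL) {
  df |>
    ggplot(aes(x = {{ var }})) +
    geom_histogram(binwidth = binwidth)
}

rescale01(c(0, 5, 10))
mtcars |> grouped_mean(cyl, mpg)
mtcars |> histogram(mpg, binwidth = 2)
```
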
|
||||
|
||||
The chapter concludes with some advice on function style.
|
||||
|
||||
This chapter includes many examples to help you generalize the patterns that you see.
|
||||
Many of the examples were inspired by real data analysis code supplied by folks on twitter; follow the links in the comment to see original inspiration.
|
||||
And if you want to see even more examples, check out the motivating tweets for [general functions](https://twitter.com/hadleywickham/status/1571603361350164486) and [plotting functions](https://twitter.com/hadleywickham/status/1574373127349575680).
|
||||
Each of these sections includes many examples to help you generalize the patterns that you see.
|
||||
These examples wouldn't be possible without the help of folks on Twitter, and we encourage you to follow the links in the comments to see the original inspirations.
|
||||
You might also want to read the original motivating tweets for [general functions](https://twitter.com/hadleywickham/status/1571603361350164486) and [plotting functions](https://twitter.com/hadleywickham/status/1574373127349575680) to see even more functions.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
|
@ -549,7 +547,7 @@ flights_sub(dest == "IAH", contains("time"))
|
|||
### Data-masking vs tidy-selection
|
||||
|
||||
Sometimes you want to select variables inside a function that uses data-masking.
|
||||
For example, imagine you want to write `count_missing()` that counts the number of missing observations in rows.
|
||||
For example, imagine you want to write a `count_missing()` that counts the number of missing observations in rows.
|
||||
You might try writing something like:
|
||||
|
||||
```{r}
|
||||
|
@ -577,7 +575,7 @@ flights |>
|
|||
```
|
||||
|
||||
Another convenient use of `pick()` is to make a 2d table of counts.
|
||||
Here we count using all the variables in the `rows` and `columns`, then use `pivot_wider()` to rearrange into a grid:
|
||||
Here we count using all the variables in the `rows` and `columns`, then use `pivot_wider()` to rearrange the counts into a grid:
|
||||
|
||||
```{r}
|
||||
# https://twitter.com/pollicipes/status/1571606508944719876
|
||||
|
@ -595,10 +593,58 @@ diamonds |> count_wide(clarity, cut)
|
|||
diamonds |> count_wide(c(clarity, color), cut)
|
||||
```
|
||||
|
||||
While our examples have mostly focused on dplyr, the tidy evaluation also underpins tidyr, and if you look at the `pivot_wider()` docs you can see that `names_from` uses tidy-selection.
|
||||
While our examples have mostly focused on dplyr, tidy evaluation also underpins tidyr, and if you look at the `pivot_wider()` docs you can see that `names_from` uses tidy-selection.
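
A minimal sketch of what tidy-selection in `names_from` allows, assuming tidyr is installed. The data frame `df` and its columns are made up for illustration; because `names_from` is a tidy-select argument, you can pass several columns with `c()` rather than a single name.

```r
library(tidyr)

# Hypothetical toy data: one value per id/grp/yr combination.
df <- data.frame(
  id  = c(1, 1, 2, 2),
  grp = c("a", "b", "a", "b"),
  yr  = c(1, 1, 1, 1),
  val = 1:4
)

# names_from uses tidy-selection, so c(grp, yr) selects two columns;
# their values are pasted together (with "_") to form the new column names.
wide <- df |>
  pivot_wider(names_from = c(grp, yr), values_from = val)

names(wide)
```
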
|
||||
|
||||
### Exercises
|
||||
|
||||
1. Using the datasets from nycflights13, write functions that:
|
||||
|
||||
1. Finds all flights that were cancelled (i.e. `is.na(arr_time)`) or delayed by more than an hour.
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
flights |> filter_severe()
|
||||
```
|
||||
|
||||
2. Counts the number of cancelled flights and the number of flights delayed by more than an hour.
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
flights |> group_by(dest) |> summarise_severe()
|
||||
```
|
||||
|
||||
3. Finds all flights that were cancelled or delayed by more than a user-supplied number of hours:
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
flights |> filter_severe(hours = 2)
|
||||
```
|
||||
|
||||
4. Summarizes the weather to compute the minimum, mean, and maximum of a user-supplied variable:
|
||||
|
||||
```{r}
|
||||
#| eval: false
|
||||
weather |> summarise_weather(temp)
|
||||
```
|
||||
|
||||
5. Converts a user-supplied variable that uses clock time (e.g. `dep_time`, `arr_time`, etc.) into decimal time (i.e. hours + minutes / 60).
|
||||
|
||||
```{r}
|
||||
#| eval: false
flights |> standardise_time(sched_dep_time)
|
||||
```
|
||||
|
||||
2. For each of the following functions, list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-selection: `distinct()`, `count()`, `group_by()`, `rename_with()`, `slice_min()`, `slice_sample()`.
|
||||
|
||||
3. Generalize the following function so that you can supply any number of variables to count.
|
||||
|
||||
```{r}
|
||||
count_prop <- function(df, var, sort = FALSE) {
|
||||
df |>
|
||||
count({{ var }}, sort = sort) |>
|
||||
mutate(prop = n / sum(n))
|
||||
}
|
||||
```
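
Before generalizing, it may help to see what the single-variable version returns. The sketch below repeats `count_prop()` as given in the exercise and calls it on `mtcars`, a built-in dataset chosen here only for illustration; it assumes dplyr is installed.

```r
library(dplyr)

# count_prop() exactly as given in the exercise: one variable only.
count_prop <- function(df, var, sort = FALSE) {
  df |>
    count({{ var }}, sort = sort) |>
    mutate(prop = n / sum(n))
}

# One row per distinct value of cyl, with its count and share of all rows.
by_cyl <- mtcars |> count_prop(cyl)
by_cyl
```
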
|
||||
|
||||
## Plot functions
|
||||
|
||||
Instead of returning a data frame, you might want to return a plot.
|
||||
|
@ -812,6 +858,13 @@ You can use the same approach any other place that you might supply a string in
|
|||
|
||||
### Exercises
|
||||
|
||||
1. Build up a rich plotting function by incrementally implementing each of the steps below.
|
||||
1. Draw a scatterplot given a dataset and `x` and `y` variables.
|
||||
|
||||
2. Add a line of best fit (i.e. a linear model with no standard errors).
|
||||
|
||||
3. Add a title.
|
||||
|
||||
## Style
|
||||
|
||||
R doesn't care what your function or arguments are called but the names make a big difference for humans.
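
As a small illustration of that point, the two functions below have identical bodies and differ only in their names. Both `f()` and `exceeds_missing_prop()` are hypothetical names invented for this sketch.

```r
# Hard to read: neither the function name nor the arguments say what they mean.
f <- function(x, y) {
  mean(is.na(x)) > y
}

# Same body, but the names tell a human reader what is being checked.
exceeds_missing_prop <- function(values, max_prop) {
  mean(is.na(values)) > max_prop
}

scores <- c(1, NA, NA, 4)
f(scores, 0.25)
exceeds_missing_prop(scores, max_prop = 0.25)
```

R treats the two calls identically; only the second tells a reader what the function does without opening its definition.
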
|
||||
|
@ -890,9 +943,9 @@ Along the way you saw many examples, which hopefully started to get your creati
|
|||
We have only shown you the bare minimum to get started with functions and there's much more to learn.
|
||||
A few places to learn more are:
|
||||
|
||||
- To learn more about programming with tidy evaluation, see useful recipes in `vignette("programming", package = "dplyr")` and `vignette("programming", package = "tidyr")` and learn more about the theory in <https://rlang.r-lib.org/reference/topic-data-mask.html>.
|
||||
- To learn more about programming with tidy evaluation, see useful recipes in [programming with dplyr](https://dplyr.tidyverse.org/articles/programming.html) and [programming with tidyr](https://tidyr.tidyverse.org/articles/programming.html) and learn more about the theory in [What is data-masking and why do I need {{?](https://rlang.r-lib.org/reference/topic-data-mask.html).
|
||||
- To learn more about reducing duplication in your ggplot2 code, read the [Programming with ggplot2](https://ggplot2-book.org/programming.html){.uri} chapter of the ggplot2 book.
|
||||
- To learn more about good function style, read <https://style.tidyverse.org/functions.html>.
|
||||
- For more advice on function style, see the [tidyverse style guide](https://style.tidyverse.org/functions.html){.uri}.
|
||||
|
||||
In the next chapter, we'll dive into some of the details of R's vector data structures that we've omitted so far.
|
||||
These are not immediately useful by themselves, but are a necessary foundation for the following chapter on iteration, which gives you further tools for reducing code duplication.
|
||||
|
|