@@ -32,9 +32,9 @@ The goal of this chapter is to get you started on your journey with functions wi
The chapter concludes with some advice on function style.

Many of the examples in this chapter were inspired by real data analysis code supplied by folks on Twitter.
-We've often simplified the code from the original so you might want to look at the original tweets which I list in the comments.
-If you want just to see a huge variety of funcitons, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680, https://twitter.com/hadleywickham/status/1571603361350164486 A big thanks to everyone who contributed!
-WI won't fully explain all of the functions that I use here, so you might need to do some reading of the documentation.
+We've often simplified the code from the original, so you might want to look at the original tweets, which we list in the comments.
+If you just want to see a huge variety of functions, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680 and https://twitter.com/hadleywickham/status/1571603361350164486. A big thanks to everyone who contributed!
+We won't fully explain all of the functions that we use here, so you might need to do some reading of the documentation.

### Prerequisites

@@ -101,14 +101,14 @@ If we take the code above and pull it outside of `mutate()` it's a little easier
(d - min(d, na.rm = TRUE)) / (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
```

-To make this a bit clearer I can replace the bit that varies with `█`:
+To make this a bit clearer, we can replace the bit that varies with `█`:

```{r}
#| eval: false
(█ - min(█, na.rm = TRUE)) / (max(█, na.rm = TRUE) - min(█, na.rm = TRUE))
```

-There's only one thing that varies which implies I'm going to need a function with one argument.
+There's only one thing that varies, which implies we're going to need a function with one argument.
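Here is a minimal sketch of where this leads. It assumes a function named `rescale01()` with a single argument `x`; both names are our choice for illustration. The template becomes a one-argument function once `x` goes into every `█` slot:

```{r}
# Hypothetical one-argument function: `x` fills each `█` slot in the template
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

rescale01(c(-10, 0, 10, NA))
# returns 0, 0.5, 1, and NA (the missing value is passed through)
```

This version calls `min()` twice, just as the template does. Computing the range once would be a natural refinement, but keeping the template's shape makes the substitution easy to see.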

To turn this into an actual function you need three things:

@@ -473,7 +473,7 @@ summary6 <- function(data, var) {
diamonds |> summary6(carat)
```

-(Whenever you wrap `summarise()` in a helper, I think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
+(Whenever you wrap `summarise()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
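Here is a minimal sketch of that advice. It assumes dplyr is installed and uses a hypothetical helper, `summary_mean()`, on the built-in `mtcars` data. The input is grouped by two variables. Because of `.groups = "drop"`, the summary comes back fully ungrouped and `summarise()` prints no regrouping message:

```{r}
library(dplyr)

# Hypothetical helper: summarise one column and always return ungrouped data
summary_mean <- function(data, var) {
  data |>
    summarise(
      mean = mean({{ var }}, na.rm = TRUE),
      n = n(),
      .groups = "drop"
    )
}

by_cyl_gear <- mtcars |>
  group_by(cyl, gear) |>
  summary_mean(mpg)

# FALSE: one row per cyl/gear combination, with no grouping left over
is_grouped_df(by_cyl_gear)
```

Without `.groups = "drop"`, `summarise()` would peel off only the last grouping variable, so the result would still be grouped by `cyl`.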

The nice thing about this function is that, because it wraps `summarise()`, you can use it on grouped data:

@@ -563,7 +563,7 @@ We didn't discuss `pivot_wider()` above, but you can read the docs to discover t
### Selecting rows and columns

Or maybe you want to find the sorted unique values of a variable for a subset of the data.
-Rather than supplying a variable and a value to do the filtering, I'll allow the user to supply an condition:
+Rather than supplying a variable and a value to do the filtering, we'll allow the user to supply a condition:

```{r}
unique_where <- function(df, condition, var) {
@@ -582,7 +582,7 @@ flights |> unique_where(tailnum == "N14228", month)

Here we embrace `condition` because it's passed to `filter()` and `var` because it's passed to `distinct()`, `arrange()`, and `pull()`.

-I've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data frame, it can make sense to hard code it.
+We've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data frame, it can make sense to hardcode it.
For example, this function always works with the flights dataset, making it easy to grab the subset that you want to work with.
It always includes `time_hour`, `carrier`, and `flight` because together they form the primary key that identifies each row.

@@ -682,7 +682,7 @@ diamonds |> hex_plot(carat, price, depth)

Some of the most useful helpers combine a dash of dplyr with ggplot2.
For example, you might want to make a bar chart where the bars are automatically sorted in frequency order using `fct_infreq()`.
-And I'm drawing the vertical bars, so you need to reverse the usual order to get the highest values at the top:
+Since we're drawing vertical bars, we also need to reverse the usual order to get the highest values at the top:

```{r}
sorted_bars <- function(df, var) {
@@ -748,7 +748,7 @@ foo <- function(x) {
}
```

-I've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame, if you're using it repeatedly:
+We've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame if you're using it repeatedly:

```{r}
# https://twitter.com/yutannihilat_en/status/1574387230025875457
@@ -764,7 +764,7 @@ density(species)
density(island, sex)
```

-Also note that I hardcoded the `x` variable but allowed the fill to vary.
+Also note that we hardcoded the `x` variable but allowed the fill to vary.

```{r}
bars <- function(df, condition, var) {