Update functions.qmd (#1178)
functions.qmd
							@@ -75,7 +75,7 @@ Preventing this type of mistake of is one very good reason to learn how to write
 | 
			
		||||
### Writing a function
 | 
			
		||||
 | 
			
		||||
To write a function you need to first analyse your repeated code to figure out which parts are constant and which parts vary.
 | 
			
		||||
If we take the code above and pull it outside of `mutate()` it's a little easier to see the pattern because each repetition is now one line:
 | 
			
		||||
If we take the code above and pull it outside of `mutate()`, it's a little easier to see the pattern because each repetition is now one line:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
#| eval: false
 | 
			
		||||
@@ -99,11 +99,11 @@ To turn this into a function you need three things:
 | 
			
		||||
    Here we'll use `rescale01` because this function rescales a vector to lie between 0 and 1.
 | 
			
		||||
 | 
			
		||||
2.  The **arguments**.
 | 
			
		||||
    The arguments are things that vary across calls and our analysis above tells us that have just one.
 | 
			
		||||
    The arguments are things that vary across calls and our analysis above tells us that we have just one.
 | 
			
		||||
    We'll call it `x` because this is the conventional name for a numeric vector.
 | 
			
		||||
 | 
			
		||||
3.  The **body**.
 | 
			
		||||
    The body is the code that repeated across all the calls.
 | 
			
		||||
    The body is the code that's repeated across all the calls.
 | 
			
		||||
 | 
			
		||||
Then you create a function by following the template:
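As a rough sketch of how the three pieces fit together, the template has the shape `name <- function(arguments) { body }`. Filling it in for this example might look like the following. The body is an assumption, reconstructed from the later remark that the function computes `min()` twice and `max()` once, and the sample vectors are purely illustrative.

```r
# Template: name <- function(arguments) { body }
rescale01 <- function(x) {
  # Shift by the minimum, then divide by the spread, ignoring missing values.
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

rescale01(c(-10, 0, 10))
rescale01(c(1, 2, 3, NA, 5))
```

Missing values in the input stay missing in the output, while the remaining values are rescaled relative to the observed minimum and maximum.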
 | 
			
		||||
 | 
			
		||||
@@ -143,7 +143,7 @@ df |> mutate(
 | 
			
		||||
 | 
			
		||||
### Improving our function
 | 
			
		||||
 | 
			
		||||
You might notice `rescale01()` function does some unnecessary work --- instead of computing `min()` twice and `max()` once we could instead compute both the minimum and maximum in one step with `range()`:
 | 
			
		||||
You might notice that the `rescale01()` function does some unnecessary work --- instead of computing `min()` twice and `max()` once, we could compute both the minimum and maximum in one step with `range()`:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
rescale01 <- function(x) {
 | 
			
		||||
@@ -166,6 +166,7 @@ rescale01 <- function(x) {
 | 
			
		||||
  rng <- range(x, na.rm = TRUE, finite = TRUE)
 | 
			
		||||
  (x - rng[1]) / (rng[2] - rng[1])
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
rescale01(x)
 | 
			
		||||
```
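To see what `finite = TRUE` contributes, here is a small illustrative call; the input vector is an assumption made up for this sketch. Because `range()` then ignores infinite values when computing the limits, the finite values are still rescaled to lie between 0 and 1, and the infinite value stays infinite.

```r
rescale01 <- function(x) {
  # finite = TRUE drops Inf/-Inf (and NA) when finding the limits.
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

x <- c(1:10, Inf)
rescale01(x)
```

Without `finite = TRUE`, the maximum would be `Inf` and every finite value would collapse to 0.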
 | 
			
		||||
 | 
			
		||||
@@ -173,11 +174,11 @@ These changes illustrate an important benefit of functions: because we've moved
 | 
			
		||||
 | 
			
		||||
### Mutate functions
 | 
			
		||||
 | 
			
		||||
Now you've got the basic idea of functions, lets take a look a whole bunch of examples.
 | 
			
		||||
We'll start by looking at "mutate" functions, functions that work well like `mutate()` and `filter()` because they return an output the same length as the input.
 | 
			
		||||
Now that you've got the basic idea of functions, let's take a look at a whole bunch of examples.
 | 
			
		||||
We'll start by looking at "mutate" functions, i.e. functions that work well inside of `mutate()` and `filter()` because they return an output of the same length as the input.
 | 
			
		||||
 | 
			
		||||
Lets start with a simple variation of `rescale01()`.
 | 
			
		||||
Maybe you want compute the Z-score, rescaling a vector to have to a mean of zero and a standard deviation of one:
 | 
			
		||||
Let's start with a simple variation of `rescale01()`.
 | 
			
		||||
Maybe you want to compute the Z-score, rescaling a vector to have a mean of zero and a standard deviation of one:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
z_score <- function(x) {
 | 
			
		||||
@@ -185,7 +186,7 @@ z_score <- function(x) {
 | 
			
		||||
}
 | 
			
		||||
```
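A quick way to convince yourself that a Z-score function behaves as described is to check the mean and standard deviation of its output. The definition is repeated here so the sketch runs on its own; its body is an assumption (subtract the mean, divide by the standard deviation), since the diff does not show it, and the input vector is illustrative.

```r
z_score <- function(x) {
  # Centre on the mean, then scale by the standard deviation.
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

z <- z_score(c(2, 4, 6, 8, NA))
mean(z, na.rm = TRUE)  # approximately 0
sd(z, na.rm = TRUE)    # approximately 1
```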
 | 
			
		||||
 | 
			
		||||
Or maybe you want to wrap up a straightforward `case_when()` in order to give it a useful name.
 | 
			
		||||
Or maybe you want to wrap up a straightforward `case_when()` and give it a useful name.
 | 
			
		||||
For example, this `clamp()` function ensures all values of a vector lie between a minimum and a maximum:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
@@ -196,6 +197,7 @@ clamp <- function(x, min, max) {
 | 
			
		||||
    .default = x
 | 
			
		||||
  )
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
clamp(1:10, min = 3, max = 7)
 | 
			
		||||
```
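For comparison, the same clamping behaviour can be sketched in base R with `pmin()` and `pmax()`. This is an alternative technique, not the `case_when()` approach used above, and the name `clamp_base()` is hypothetical.

```r
# Hypothetical base R variant: cap at `max` first, then raise to `min`.
clamp_base <- function(x, min, max) {
  pmax(pmin(x, max), min)
}

clamp_base(1:10, min = 3, max = 7)
```

The `case_when()` version arguably reads more like the sentence describing it, which is part of the point of giving the operation a name.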
 | 
			
		||||
 | 
			
		||||
@@ -209,11 +211,12 @@ na_outside <- function(x, min, max) {
 | 
			
		||||
    .default = x
 | 
			
		||||
  )
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
na_outside(1:10, min = 3, max = 7)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Of course functions don't just need to work with numeric variables.
 | 
			
		||||
You might want to extract out some repeated string manipulation.
 | 
			
		||||
You might want to do some repeated string manipulation.
 | 
			
		||||
Maybe you need to make the first character upper case:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
@@ -221,6 +224,7 @@ first_upper <- function(x) {
 | 
			
		||||
  str_sub(x, 1, 1) <- str_to_upper(str_sub(x, 1, 1))
 | 
			
		||||
  x
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
first_upper("hello")
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
@@ -237,11 +241,12 @@ clean_number <- function(x) {
 | 
			
		||||
    as.numeric(x)
 | 
			
		||||
  if_else(is_pct, num / 100, num)
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
clean_number("$12,300")
 | 
			
		||||
clean_number("45%")
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Sometimes your functions will be highly specialized for one data analysis.
 | 
			
		||||
Sometimes your functions will be highly specialized for one data analysis step.
 | 
			
		||||
For example, if you have a bunch of variables that record missing values as 997, 998, or 999, you might want to write a function to replace them with `NA`:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
@@ -282,15 +287,17 @@ Sometimes this can just be a matter of setting a default argument or two:
 | 
			
		||||
commas <- function(x) {
 | 
			
		||||
  str_flatten(x, collapse = ", ", last = " and ")
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
commas(c("cat", "dog", "pigeon"))
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Or you might wrap up a simple computation, like for the coefficient of variation, which divides standard deviation by the mean:
 | 
			
		||||
Or you might wrap up a simple computation, like the coefficient of variation, which divides the standard deviation by the mean:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
cv <- function(x, na.rm = FALSE) {
 | 
			
		||||
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
cv(runif(100, min = 0, max = 50))
 | 
			
		||||
cv(runif(100, min = 0, max = 500))
 | 
			
		||||
```
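The `na.rm` argument is simply forwarded to both `sd()` and `mean()`, so the two always treat missing values the same way. A small illustrative call (the input vector is made up for this sketch):

```r
cv <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

x <- c(1, 2, 3, NA)
cv(x)                # NA, because the default keeps missing values
cv(x, na.rm = TRUE)  # sd(1:3) / mean(1:3), i.e. 1 / 2
```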
 | 
			
		||||
@@ -402,7 +409,7 @@ If we try and use it, we get an error:
 | 
			
		||||
diamonds |> grouped_mean(cut, carat)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
To make the problem a bit more clear we can use a made up data frame:
 | 
			
		||||
To make the problem a bit clearer, we can use a made-up data frame:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
df <- tibble(
 | 
			
		||||
@@ -412,6 +419,7 @@ df <- tibble(
 | 
			
		||||
  x = 10,
 | 
			
		||||
  y = 100
 | 
			
		||||
)
 | 
			
		||||
 | 
			
		||||
df |> grouped_mean(group, x)
 | 
			
		||||
df |> grouped_mean(group, y)
 | 
			
		||||
```
 | 
			
		||||
@@ -428,7 +436,7 @@ Embracing a variable means to wrap it in braces so (e.g.) `var` becomes `{{ var
 | 
			
		||||
Embracing a variable tells dplyr to use the value stored inside the argument, not the argument as the literal variable name.
 | 
			
		||||
One way to remember what's happening is to think of `{{ }}` as looking down a tunnel --- `{{ var }}` will make a dplyr function look inside of `var` rather than looking for a variable called `var`.
 | 
			
		||||
 | 
			
		||||
So to make grouped_mean`()` work we need to replace surround `group_var` and `mean_var()` with `{{ }}`:
 | 
			
		||||
So to make `grouped_mean()` work, we need to surround `group_var` and `mean_var` with `{{ }}`:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
grouped_mean <- function(df, group_var, mean_var) {
 | 
			
		||||
@@ -445,16 +453,16 @@ Success!
 | 
			
		||||
### When to embrace? {#sec-embracing}
 | 
			
		||||
 | 
			
		||||
So the key challenge in writing data frame functions is figuring out which arguments need to be embraced.
 | 
			
		||||
Fortunately this is easy because you can look it up from the documentation 😄.
 | 
			
		||||
There are two terms to look for in the docs which corresponding to the two most common sub-types of tidy evaluation:
 | 
			
		||||
Fortunately, this is easy because you can look it up from the documentation 😄.
 | 
			
		||||
There are two terms to look for in the docs which correspond to the two most common sub-types of tidy evaluation:
 | 
			
		||||
 | 
			
		||||
-   **Data-masking**: this is used in functions like `arrange()`, `filter()`, and `summarize()` that compute with variables.
 | 
			
		||||
 | 
			
		||||
-   **Tidy-selection**: this is used for for functions like `select()`, `relocate()`, and `rename()` that select variables.
 | 
			
		||||
-   **Tidy-selection**: this is used for functions like `select()`, `relocate()`, and `rename()` that select variables.
 | 
			
		||||
 | 
			
		||||
Your intuition about which arguments use tidy evaluation should be good for many common functions --- just think about whether you can compute (e.g. `x + 1`) or select (e.g. `a:x`).
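The distinction can be sketched with two tiny wrappers, assuming dplyr is installed. The helper names `mean_of()` and `keep()` and the toy data frame are hypothetical and exist only for this illustration.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30))

# Data-masking: `var` is forwarded to summarize(), so callers can compute with it.
mean_of <- function(df, var) {
  df |> summarize(mean = mean({{ var }}))
}

# Tidy-selection: `cols` is forwarded to select(), so callers can select with it.
keep <- function(df, cols) {
  df |> select({{ cols }})
}

df |> mean_of(x + 1)  # a computation on a column
df |> keep(c(g, y))   # a selection of columns
df |> keep(x:y)       # a range of columns
```

In both cases the argument is embraced; what differs is the kind of expression the caller is allowed to supply, which is determined by the function the argument is passed on to.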
 | 
			
		||||
 | 
			
		||||
In the following sections we'll explore the sorts of handy functions you might write once you understand embracing.
 | 
			
		||||
In the following sections, we'll explore the sorts of handy functions you might write once you understand embracing.
 | 
			
		||||
 | 
			
		||||
### Common use cases
 | 
			
		||||
 | 
			
		||||
@@ -472,12 +480,13 @@ summary6 <- function(data, var) {
 | 
			
		||||
    .groups = "drop"
 | 
			
		||||
  )
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
diamonds |> summary6(carat)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
(Whenever you wrap `summarize()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
 | 
			
		||||
 | 
			
		||||
The nice thing about this function is because it wraps `summarize()` you can used it on grouped data:
 | 
			
		||||
The nice thing about this function is that, because it wraps `summarize()`, you can use it on grouped data:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
diamonds |> 
 | 
			
		||||
@@ -485,7 +494,7 @@ diamonds |>
 | 
			
		||||
  summary6(carat)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Because the arguments to summarize are data-masking that also means that the `var` argument to `summary6()` is data-masking.
 | 
			
		||||
Furthermore, since the arguments to `summarize()` are data-masking, the `var` argument to `summary6()` is also data-masking.
 | 
			
		||||
That means you can also summarize computed variables:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
@@ -494,7 +503,7 @@ diamonds |>
 | 
			
		||||
  summary6(log10(carat))
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
To summarize multiple variables you'll need to wait until @sec-across, where you'll learn how to use `across()`.
 | 
			
		||||
To summarize multiple variables, you'll need to wait until @sec-across, where you'll learn how to use `across()`.
 | 
			
		||||
 | 
			
		||||
Another popular `summarize()` helper function is a version of `count()` that also computes proportions:
 | 
			
		||||
 | 
			
		||||
@@ -505,6 +514,7 @@ count_prop <- function(df, var, sort = FALSE) {
 | 
			
		||||
    count({{ var }}, sort = sort) |>
 | 
			
		||||
    mutate(prop = n / sum(n))
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
diamonds |> count_prop(clarity)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
@@ -527,9 +537,9 @@ flights |> unique_where(month == 12, dest)
 | 
			
		||||
flights |> unique_where(tailnum == "N14228", month)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Here we embrace `condition` because it's passed to `filter()` and `var` because its passed to `distinct()` and `arrange()`.
 | 
			
		||||
Here we embrace `condition` because it's passed to `filter()` and `var` because it's passed to `distinct()` and `arrange()`.
 | 
			
		||||
 | 
			
		||||
We've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data, it can make sense to hardcode it.
 | 
			
		||||
We've written all these examples to take a data frame as the first argument, but if you're working repeatedly with the same data, it can make sense to hardcode it.
 | 
			
		||||
For example, the following function always works with the flights dataset and always selects `time_hour`, `carrier`, and `flight` since they form the compound primary key that allows you to identify a row.
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
@@ -555,12 +565,13 @@ count_missing <- function(df, group_vars, x_var) {
 | 
			
		||||
    group_by({{ group_vars }}) |> 
 | 
			
		||||
    summarize(n_miss = sum(is.na({{ x_var }})))
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
flights |> 
 | 
			
		||||
  count_missing(c(year, month, day), dep_time)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
This doesn't work because `group_by()` uses data-masking, not tidy-selection.
 | 
			
		||||
We can work around that problem by using the handy `pick()` which allows you to use use tidy-selection inside data-masking functions:
 | 
			
		||||
We can work around that problem by using the handy `pick()` function, which allows you to use tidy-selection inside data-masking functions:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
count_missing <- function(df, group_vars, x_var) {
 | 
			
		||||
@@ -568,6 +579,7 @@ count_missing <- function(df, group_vars, x_var) {
 | 
			
		||||
    group_by(pick({{ group_vars }})) |> 
 | 
			
		||||
    summarize(n_miss = sum(is.na({{ x_var }})))
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
flights |> 
 | 
			
		||||
  count_missing(c(year, month, day), dep_time)
 | 
			
		||||
```
 | 
			
		||||
@@ -587,6 +599,7 @@ count_wide <- function(data, rows, cols) {
 | 
			
		||||
      values_fill = 0
 | 
			
		||||
    )
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
diamonds |> count_wide(clarity, cut)
 | 
			
		||||
diamonds |> count_wide(c(clarity, color), cut)
 | 
			
		||||
```
 | 
			
		||||
@@ -595,9 +608,9 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
 | 
			
		||||
 | 
			
		||||
### Exercises
 | 
			
		||||
 | 
			
		||||
1.  Using the datasets from nycflights13, write functions that:
 | 
			
		||||
1.  Using the datasets from nycflights13, write a function that:
 | 
			
		||||
 | 
			
		||||
    1.  Find all flights that were cancelled (i.e. `is.na(arr_time)`) or delayed by more than an hour.
 | 
			
		||||
    1.  Finds all flights that were cancelled (i.e. `is.na(arr_time)`) or delayed by more than an hour.
 | 
			
		||||
 | 
			
		||||
        ```{r}
 | 
			
		||||
        #| eval: false
 | 
			
		||||
@@ -632,7 +645,7 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
 | 
			
		||||
        weather |> standardise_time(sched_dep_time)
 | 
			
		||||
        ```
 | 
			
		||||
 | 
			
		||||
2.  For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-select: `distinct()`, `count()`, `group_by()`, `rename_with()`, `slice_min()`, `slice_sample()`.
 | 
			
		||||
2.  For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-selection: `distinct()`, `count()`, `group_by()`, `rename_with()`, `slice_min()`, `slice_sample()`.
 | 
			
		||||
 | 
			
		||||
3.  Generalize the following function so that you can supply any number of variables to count.
 | 
			
		||||
 | 
			
		||||
@@ -647,7 +660,7 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
 | 
			
		||||
## Plot functions
 | 
			
		||||
 | 
			
		||||
Instead of returning a data frame, you might want to return a plot.
 | 
			
		||||
Fortunately you can use the same techniques with ggplot2, because `aes()` is a data-masking function.
 | 
			
		||||
Fortunately, you can use the same techniques with ggplot2, because `aes()` is a data-masking function.
 | 
			
		||||
For example, imagine that you're making a lot of histograms:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
@@ -662,7 +675,7 @@ diamonds |>
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Wouldn't it be nice if you could wrap this up into a histogram function?
 | 
			
		||||
This is easy as once you know that `aes()` is a data-masking function so that you need to embrace:
 | 
			
		||||
This is easy as pie once you know that `aes()` is a data-masking function and that you need to embrace:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
histogram <- function(df, var, binwidth = NULL) {
 | 
			
		||||
@@ -674,7 +687,7 @@ histogram <- function(df, var, binwidth = NULL) {
 | 
			
		||||
diamonds |> histogram(carat, 0.1)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Note that `histogram()` returns a ggplot2 plot, so that you can still add on additional components if you want.
 | 
			
		||||
Note that `histogram()` returns a ggplot2 plot, meaning you can still add on additional components if you want.
 | 
			
		||||
Just remember to switch from `|>` to `+`:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
@@ -690,7 +703,6 @@ For example, maybe you want an easy way to eyeball whether or not a data set is
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
# https://twitter.com/tyler_js_smith/status/1574377116988104704
 | 
			
		||||
 | 
			
		||||
linearity_check <- function(df, x, y) {
 | 
			
		||||
  df |>
 | 
			
		||||
    ggplot(aes({{ x }}, {{ y }})) +
 | 
			
		||||
@@ -717,6 +729,7 @@ hex_plot <- function(df, x, y, z, bins = 20, fun = "mean") {
 | 
			
		||||
      fun = fun,
 | 
			
		||||
    )
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
diamonds |> hex_plot(carat, price, depth)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
@@ -724,7 +737,7 @@ diamonds |> hex_plot(carat, price, depth)
 | 
			
		||||
 | 
			
		||||
Some of the most useful helpers combine a dash of dplyr with ggplot2.
 | 
			
		||||
For example, you might want to draw a vertical bar chart in which the bars are automatically sorted in frequency order using `fct_infreq()`.
 | 
			
		||||
Since the bar chart is vertical, we also need to reverse the usual order to get the highest values at the top:
 | 
			
		||||
Since the bar chart is vertical, we also need to reverse the usual order to get the highest values at the top (also note the `:=` operator, which lets you inject names with glue syntax on its left-hand side; type ?\`:=\` for more details):
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
sorted_bars <- function(df, var) {
 | 
			
		||||
@@ -733,10 +746,11 @@ sorted_bars <- function(df, var) {
 | 
			
		||||
    ggplot(aes(y = {{ var }})) + 
 | 
			
		||||
    geom_bar()
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
diamonds |> sorted_bars(cut)
 | 
			
		||||
```
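The `:=` operator is needed because R does not allow anything other than a bare name on the left-hand side of `=` in a function call. The two uses can be sketched as follows, assuming dplyr is installed; the helper names and the toy data frame are hypothetical.

```r
library(dplyr)

df <- tibble(x = c(1, 2, 3))

# Overwrite the column that `var` refers to.
double_in_place <- function(df, var) {
  df |> mutate({{ var }} := {{ var }} * 2)
}

# Build a new column name from `var` using glue syntax inside a string.
double_as_new <- function(df, var) {
  df |> mutate("{{var}}_doubled" := {{ var }} * 2)
}

df |> double_in_place(x)
df |> double_as_new(x)
```

The first form mirrors what `sorted_bars()` does when it replaces a column with a reordered version of itself; the second shows the string form for deriving a new name.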
 | 
			
		||||
 | 
			
		||||
Or you could maybe you want to make it easy to draw a bar plot just for a subset of the data:
 | 
			
		||||
Or maybe you want to make it easy to draw a bar plot just for a subset of the data:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
conditional_bars <- function(df, condition, var) {
 | 
			
		||||
@@ -749,20 +763,19 @@ conditional_bars <- function(df, condition, var) {
 | 
			
		||||
diamonds |> conditional_bars(cut == "Good", clarity)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
You can also get creative and display data summaries in other way.
 | 
			
		||||
You can also get creative and display data summaries in other ways.
 | 
			
		||||
For example, this code uses the axis labels to display the highest value.
 | 
			
		||||
As you learn more about ggplot2, the power of your functions will continue to increase.
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
# https://gist.github.com/GShotwell/b19ef520b6d56f61a830fabb3454965b
 | 
			
		||||
 | 
			
		||||
fancy_ts <- function(df, val, group) {
 | 
			
		||||
  labs <- df |> 
 | 
			
		||||
    group_by({{group}}) |> 
 | 
			
		||||
    summarize(breaks = max({{val}}))
 | 
			
		||||
    group_by({{ group }}) |> 
 | 
			
		||||
    summarize(breaks = max({{ val }}))
 | 
			
		||||
  
 | 
			
		||||
  df |> 
 | 
			
		||||
    ggplot(aes(date, {{val}}, group = {{group}}, color = {{group}})) +
 | 
			
		||||
    ggplot(aes(date, {{ val }}, group = {{ group }}, color = {{ group }})) +
 | 
			
		||||
    geom_path() +
 | 
			
		||||
    scale_y_continuous(
 | 
			
		||||
      breaks = labs$breaks, 
 | 
			
		||||
@@ -778,6 +791,7 @@ df <- tibble(
 | 
			
		||||
  dist4 = sort(rnorm(50, 15, 1)),
 | 
			
		||||
  date = seq.Date(as.Date("2022-01-01"), as.Date("2022-04-10"), by = "2 days")
 | 
			
		||||
)
 | 
			
		||||
 | 
			
		||||
df <- pivot_longer(df, cols = -date, names_to = "dist_name", values_to = "value")
 | 
			
		||||
 | 
			
		||||
fancy_ts(df, value, dist_name)
 | 
			
		||||
@@ -787,19 +801,19 @@ Next we'll discuss two more complicated cases: faceting and automatic labeling.
 | 
			
		||||
 | 
			
		||||
### Faceting
 | 
			
		||||
 | 
			
		||||
Unfortunately programming with faceting is a special challenge, because faceting was implemented before we understood what tidy evaluation was and how it should work.
 | 
			
		||||
so you have to learn a new syntax.
 | 
			
		||||
Unfortunately, programming with faceting is a special challenge, because faceting was implemented before we understood what tidy evaluation was and how it should work.
 | 
			
		||||
So you have to learn a new syntax.
 | 
			
		||||
When programming with facets, instead of writing `~ x`, you need to write `vars(x)` and instead of `~ x + y` you need to write `vars(x, y)`.
 | 
			
		||||
The only advantage of this syntax is that `vars()` uses tidy evaluation so you can embrace within it:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
# https://twitter.com/sharoz/status/1574376332821204999
 | 
			
		||||
 | 
			
		||||
foo <- function(x) {
 | 
			
		||||
  ggplot(mtcars, aes(mpg, disp)) +
 | 
			
		||||
    geom_point() +
 | 
			
		||||
    facet_wrap(vars({{ x }}))
 | 
			
		||||
}
 | 
			
		||||
 | 
			
		||||
foo(cyl)
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
@@ -833,12 +847,12 @@ histogram <- function(df, var, binwidth = NULL) {
 | 
			
		||||
```
 | 
			
		||||
 | 
			
		||||
Wouldn't it be nice if we could label the output with the variable and the bin width that was used?
 | 
			
		||||
To do so, we're going to have to go under the covers of tidy evaluation and use a function from package we haven't talked about before: rlang.
 | 
			
		||||
To do so, we're going to have to go under the covers of tidy evaluation and use a function from a package we haven't talked about yet: rlang.
 | 
			
		||||
rlang is a low-level package that's used by just about every other package in the tidyverse because it implements tidy evaluation (as well as many other useful tools).
 | 
			
		||||
 | 
			
		||||
To solve the labeling problem we can use `rlang::englue()`.
 | 
			
		||||
This works similarly to `str_glue()`, so any value wrapped in `{ }` will be inserted into the string.
 | 
			
		||||
But it also understands `{{ }}`, which automatically insert the appropriate variable name:
 | 
			
		||||
But it also understands `{{ }}`, which automatically inserts the appropriate variable name:
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
histogram <- function(df, var, binwidth) {
 | 
			
		||||
@@ -853,16 +867,17 @@ histogram <- function(df, var, binwidth) {
 | 
			
		||||
diamonds |> histogram(carat, 0.1)
 | 
			
		||||
```
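To see the two kinds of interpolation in isolation, here is a minimal sketch that builds only the label, assuming rlang 1.0.0 or later is installed. The helper name `hist_label()` is hypothetical.

```r
library(rlang)

# {{var}} inserts the name of the variable the caller supplied;
# {binwidth} inserts the value of the argument, as str_glue() would.
hist_label <- function(var, binwidth) {
  englue("A histogram of {{var}} with binwidth {binwidth}")
}

hist_label(carat, 0.1)
```

Note that `englue()` has to be called from inside a function, because `{{ }}` only has meaning for function arguments.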
 | 
			
		||||
 | 
			
		||||
You can use the same approach any other place that you might supply a string in a ggplot2 plot.
 | 
			
		||||
You can use the same approach in any other place where you want to supply a string in a ggplot2 plot.
 | 
			
		||||
 | 
			
		||||
### Exercises
 | 
			
		||||
 | 
			
		||||
1.  Build up a rich plotting function by incrementally implementing each of the steps below.
 | 
			
		||||
    1.  Draw a scatterplot given dataset and `x` and `y` variables.
 | 
			
		||||
Build up a rich plotting function by incrementally implementing each of the steps below:
 | 
			
		||||
 | 
			
		||||
    2.  Add a line of best fit (i.e. a linear model with no standard errors).
 | 
			
		||||
1.  Draw a scatterplot given a dataset and `x` and `y` variables.
 | 
			
		||||
 | 
			
		||||
    3.  Add a title.
 | 
			
		||||
2.  Add a line of best fit (i.e. a linear model with no standard errors).
 | 
			
		||||
 | 
			
		||||
3.  Add a title.
 | 
			
		||||
 | 
			
		||||
## Style
 | 
			
		||||
 | 
			
		||||
@@ -923,6 +938,7 @@ This makes it very obvious that something unusual is happening.
 | 
			
		||||
    f1 <- function(string, prefix) {
 | 
			
		||||
      substr(string, 1, nchar(prefix)) == prefix
 | 
			
		||||
    }
 | 
			
		||||
    
 | 
			
		||||
    f3 <- function(x, y) {
 | 
			
		||||
      rep(y, length.out = length(x))
 | 
			
		||||
    }
 | 
			
		||||
@@ -935,8 +951,8 @@ This makes it very obvious that something unusual is happening.
 | 
			
		||||
 | 
			
		||||
## Summary
 | 
			
		||||
 | 
			
		||||
In this chapter you learned how to write functions for three useful scenarios: creating a vector, creating a data frames, or creating a plot.
 | 
			
		||||
Along the way your saw many examples, which hopefully started to get your creative juices flowing, and gave you some ideas for where functions might help your analysis code.
 | 
			
		||||
In this chapter, you learned how to write functions for three useful scenarios: creating a vector, creating a data frame, or creating a plot.
 | 
			
		||||
Along the way you saw many examples, which hopefully started to get your creative juices flowing, and gave you some ideas for where functions might help your analysis code.
 | 
			
		||||
 | 
			
		||||
We have only shown you the bare minimum to get started with functions and there's much more to learn.
 | 
			
		||||
A few places to learn more are:
 | 
			
		||||
 
 | 
			
		||||