Hadley Wickham 2022-09-29 10:50:47 -05:00
parent faeeb564a4
commit be5905a09c
4 changed files with 17 additions and 17 deletions

@@ -32,9 +32,9 @@ The goal of this chapter is to get you started on your journey with functions wi
The chapter concludes with some advice on function style.
Many of the examples in this chapter were inspired by real data analysis code supplied by folks on twitter.
-We've often simplified the code from the original so you might want to look at the original tweets which I list in the comments.
-If you want just to see a huge variety of funcitons, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680, https://twitter.com/hadleywickham/status/1571603361350164486 A big thanks to everyone who contributed!
-WI won't fully explain all of the functions that I use here, so you might need to do some reading of the documentation.
+We've often simplified the code from the original so you might want to look at the original tweets which we list in the comments.
+If you want just to see a huge variety of functions, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680, https://twitter.com/hadleywickham/status/1571603361350164486 A big thanks to everyone who contributed!
+WI won't fully explain all of the functions that we use here, so you might need to do some reading of the documentation.
### Prerequisites
@@ -101,14 +101,14 @@ If we take the code above and pull it outside of `mutate()` it's a little easier
(d - min(d, na.rm = TRUE)) / (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
```
-To make this a bit clearer I can replace the bit that varies with `█`:
+To make this a bit clearer we can replace the bit that varies with `█`:
```{r}
#| eval: false
(█ - min(█, na.rm = TRUE)) / (max(█, na.rm = TRUE) - min(█, na.rm = TRUE))
```
-There's only one thing that varies which implies I'm going to need a function with one argument.
+There's only one thing that varies which implies we're going to need a function with one argument.
To turn this into an actual function you need three things:
@@ -473,7 +473,7 @@ summary6 <- function(data, var) {
diamonds |> summary6(carat)
```
-(Whenever you wrap `summarise()` in a helper, I think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
+(Whenever you wrap `summarise()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
The nice thing about this function is because it wraps `summarise()` you can used it on grouped data:
@@ -563,7 +563,7 @@ We didn't discuss `pivot_wider()` above, but you can read the docs to discover t
### Selecting rows and columns
Or maybe you want to find the sorted unique values of a variable for a subset of the data.
-Rather than supplying a variable and a value to do the filtering, I'll allow the user to supply an condition:
+Rather than supplying a variable and a value to do the filtering, we'll allow the user to supply an condition:
```{r}
unique_where <- function(df, condition, var) {
@@ -582,7 +582,7 @@ flights |> unique_where(tailnum == "N14228", month)
Here we embrace `condition` because it's passed to `filter()` and `var` because its passed to `distinct()`, `arrange()`, and `pull()`.
-I've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data frame, it can make sense to hard code it.
+We've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data frame, it can make sense to hard code it.
For example, this function always works with the flights dataset, make it easy to grab the subset that you want to work with.
It always includes `time_hour`, `carrier`, and `flight` since these are the primary key that allows you to identify a row.
@@ -682,7 +682,7 @@ diamonds |> hex_plot(carat, price, depth)
Some of the most useful helpers combine a dash of dplyr with ggplot2.
For example, if you might want to do a bar chart where you automatically sort the bars in frequency order using `fct_infreq()`.
-And I'm drawing the vertical bars, so you need to reverse the usual order to get the highest values at the top:
+And we're drawing the vertical bars, so you need to reverse the usual order to get the highest values at the top:
```{r}
sorted_bars <- function(df, var) {
@@ -748,7 +748,7 @@ foo <- function(x) {
}
```
-I've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame, if you're using it repeatedly:
+We've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame, if you're using it repeatedly:
```{r}
# https://twitter.com/yutannihilat_en/status/1574387230025875457
@@ -764,7 +764,7 @@ density(species)
density(island, sex)
```
-Also note that I hardcoded the `x` variable but allowed the fill to vary.
+Also note that we hardcoded the `x` variable but allowed the fill to vary.
```{r}
bars <- function(df, condition, var) {

@@ -595,7 +595,7 @@ write_csv(gapminder, "gapminder.csv")
unlink("gapminder.csv")
```
-If you're working in a project, I'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R.` The `0` in the file name suggests that this should be run before anything else.
+If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R.` The `0` in the file name suggests that this should be run before anything else.
If your input data files change of over time, you might consider learning a tool like [targets](https://docs.ropensci.org/targets/) to set up your data cleaning code to automatically re-run when ever one of the input files is modified.

@@ -921,7 +921,7 @@ parties <- tibble(
```
Now we can match each employee to their party.
-This is a good place to use `unmatched = "error"` because I want to quickly find out if any employees didn't get assigned a party.
+This is a good place to use `unmatched = "error"` because we want to quickly find out if any employees didn't get assigned a party.
```{r}
employees |>
@@ -939,7 +939,7 @@ employees |>
x |> full_join(y, by = "key", keep = TRUE)
```
-2. When finding if any party period overlapped with another party period I used `q < q` in the `join_by()`?
+2. When finding if any party period overlapped with another party period we used `q < q` in the `join_by()`?
Why?
What happens if you remove this inequality?

@@ -58,12 +58,12 @@ not_cancelled |>
Instead of running your code expression-by-expression, you can also execute the complete script in one step with Cmd/Ctrl + Shift + S.
Doing this regularly is a great way to ensure that you've captured all the important parts of your code in the script.
-I recommend that you always start your script with the packages that you need.
+We recommend that you always start your script with the packages that you need.
That way, if you share your code with others, they can easily see which packages they need to install.
Note, however, that you should never include `install.packages()` in a script that you share.
It's very antisocial to change settings on someone else's computer!
-When working through future chapters, I highly recommend starting in the script editor and practicing your keyboard shortcuts.
+When working through future chapters, we highly recommend starting in the script editor and practicing your keyboard shortcuts.
Over time, sending code to the console in this way will become so natural that you won't even think about it.
### RStudio diagnostics
@@ -333,7 +333,7 @@ You should **never** use absolute paths in your scripts, because they hinder sha
There's another important difference between operating systems: how you separate the components of the path.
Mac and Linux uses slashes (e.g. `plots/diamonds.pdf`) and Windows uses backslashes (e.g. `plots\diamonds.pdf`).
R can work with either type (no matter what platform you're currently using), but unfortunately, backslashes mean something special to R, and to get a single backslash in the path, you need to type two backslashes!
-That makes life frustrating, so I recommend always using the Linux/Mac style with forward slashes.
+That makes life frustrating, so we recommend always using the Linux/Mac style with forward slashes.
## Summary