I -> we
parent faeeb564a4
commit be5905a09c
@ -32,9 +32,9 @@ The goal of this chapter is to get you started on your journey with functions wi
 The chapter concludes with some advice on function style.

 Many of the examples in this chapter were inspired by real data analysis code supplied by folks on twitter.
-We've often simplified the code from the original so you might want to look at the original tweets which I list in the comments.
-If you want just to see a huge variety of funcitons, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680, https://twitter.com/hadleywickham/status/1571603361350164486 A big thanks to everyone who contributed!
-WI won't fully explain all of the functions that I use here, so you might need to do some reading of the documentation.
+We've often simplified the code from the original so you might want to look at the original tweets which we list in the comments.
+If you want just to see a huge variety of functions, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680, https://twitter.com/hadleywickham/status/1571603361350164486 A big thanks to everyone who contributed!
+WI won't fully explain all of the functions that we use here, so you might need to do some reading of the documentation.

 ### Prerequisites

@ -101,14 +101,14 @@ If we take the code above and pull it outside of `mutate()` it's a little easier
 (d - min(d, na.rm = TRUE)) / (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
 ```

-To make this a bit clearer I can replace the bit that varies with `█`:
+To make this a bit clearer we can replace the bit that varies with `█`:

 ```{r}
 #| eval: false
 (█ - min(█, na.rm = TRUE)) / (max(█, na.rm = TRUE) - min(█, na.rm = TRUE))
 ```

-There's only one thing that varies which implies I'm going to need a function with one argument.
+There's only one thing that varies which implies we're going to need a function with one argument.

 To turn this into an actual function you need three things:

@ -473,7 +473,7 @@ summary6 <- function(data, var) {
 diamonds |> summary6(carat)
 ```

-(Whenever you wrap `summarise()` in a helper, I think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
+(Whenever you wrap `summarise()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)

 The nice thing about this function is because it wraps `summarise()` you can used it on grouped data:

@ -563,7 +563,7 @@ We didn't discuss `pivot_wider()` above, but you can read the docs to discover t
 ### Selecting rows and columns

 Or maybe you want to find the sorted unique values of a variable for a subset of the data.
-Rather than supplying a variable and a value to do the filtering, I'll allow the user to supply an condition:
+Rather than supplying a variable and a value to do the filtering, we'll allow the user to supply an condition:

 ```{r}
 unique_where <- function(df, condition, var) {
@ -582,7 +582,7 @@ flights |> unique_where(tailnum == "N14228", month)

 Here we embrace `condition` because it's passed to `filter()` and `var` because its passed to `distinct()`, `arrange()`, and `pull()`.

-I've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data frame, it can make sense to hard code it.
+We've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data frame, it can make sense to hard code it.
 For example, this function always works with the flights dataset, make it easy to grab the subset that you want to work with.
 It always includes `time_hour`, `carrier`, and `flight` since these are the primary key that allows you to identify a row.

@ -682,7 +682,7 @@ diamonds |> hex_plot(carat, price, depth)

 Some of the most useful helpers combine a dash of dplyr with ggplot2.
 For example, if you might want to do a bar chart where you automatically sort the bars in frequency order using `fct_infreq()`.
-And I'm drawing the vertical bars, so you need to reverse the usual order to get the highest values at the top:
+And we're drawing the vertical bars, so you need to reverse the usual order to get the highest values at the top:

 ```{r}
 sorted_bars <- function(df, var) {
@ -748,7 +748,7 @@ foo <- function(x) {
 }
 ```

-I've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame, if you're using it repeatedly:
+We've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame, if you're using it repeatedly:

 ```{r}
 # https://twitter.com/yutannihilat_en/status/1574387230025875457
@ -764,7 +764,7 @@ density(species)
 density(island, sex)
 ```

-Also note that I hardcoded the `x` variable but allowed the fill to vary.
+Also note that we hardcoded the `x` variable but allowed the fill to vary.

 ```{r}
 bars <- function(df, condition, var) {

@ -595,7 +595,7 @@ write_csv(gapminder, "gapminder.csv")
 unlink("gapminder.csv")
 ```

-If you're working in a project, I'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R.` The `0` in the file name suggests that this should be run before anything else.
+If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R.` The `0` in the file name suggests that this should be run before anything else.

 If your input data files change of over time, you might consider learning a tool like [targets](https://docs.ropensci.org/targets/) to set up your data cleaning code to automatically re-run when ever one of the input files is modified.


@ -921,7 +921,7 @@ parties <- tibble(
 ```

 Now we can match each employee to their party.
-This is a good place to use `unmatched = "error"` because I want to quickly find out if any employees didn't get assigned a party.
+This is a good place to use `unmatched = "error"` because we want to quickly find out if any employees didn't get assigned a party.

 ```{r}
 employees |>
@ -939,7 +939,7 @@ employees |>
 x |> full_join(y, by = "key", keep = TRUE)
 ```

-2. When finding if any party period overlapped with another party period I used `q < q` in the `join_by()`?
+2. When finding if any party period overlapped with another party period we used `q < q` in the `join_by()`?
 Why?
 What happens if you remove this inequality?


@ -58,12 +58,12 @@ not_cancelled |>
 Instead of running your code expression-by-expression, you can also execute the complete script in one step with Cmd/Ctrl + Shift + S.
 Doing this regularly is a great way to ensure that you've captured all the important parts of your code in the script.

-I recommend that you always start your script with the packages that you need.
+We recommend that you always start your script with the packages that you need.
 That way, if you share your code with others, they can easily see which packages they need to install.
 Note, however, that you should never include `install.packages()` in a script that you share.
 It's very antisocial to change settings on someone else's computer!

-When working through future chapters, I highly recommend starting in the script editor and practicing your keyboard shortcuts.
+When working through future chapters, we highly recommend starting in the script editor and practicing your keyboard shortcuts.
 Over time, sending code to the console in this way will become so natural that you won't even think about it.

 ### RStudio diagnostics
@ -333,7 +333,7 @@ You should **never** use absolute paths in your scripts, because they hinder sha
 There's another important difference between operating systems: how you separate the components of the path.
 Mac and Linux uses slashes (e.g. `plots/diamonds.pdf`) and Windows uses backslashes (e.g. `plots\diamonds.pdf`).
 R can work with either type (no matter what platform you're currently using), but unfortunately, backslashes mean something special to R, and to get a single backslash in the path, you need to type two backslashes!
-That makes life frustrating, so I recommend always using the Linux/Mac style with forward slashes.
+That makes life frustrating, so we recommend always using the Linux/Mac style with forward slashes.

 ## Summary
