Reduce size (#1349)

* Various fixes & changes to reduce size

* Fix arrow issue

* Hopefully eliminate 3 paragraphs from regexps

* Make single diagram for new projects
Hadley Wickham 2023-03-08 21:02:03 -06:00 committed by GitHub
parent 33ad991f5d
commit 3530ce04d4
GPG Key ID: 4AEE18F83AFDEB23
13 changed files with 33 additions and 58 deletions

@ -107,7 +107,8 @@ For example, this code tells us the total number of checkouts per year:
```{r} ```{r}
#| cache: true #| cache: true
seattle_csv |> seattle_csv |>
count(CheckoutYear, wt = Checkouts) |> group_by(CheckoutYear) |>
summarise(Checkouts = sum(Checkouts)) |>
arrange(CheckoutYear) |> arrange(CheckoutYear) |>
collect() collect()
``` ```
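
A minimal sketch of why the two forms in this hunk give the same totals, assuming dplyr is installed. The `checkouts` tibble is a hypothetical in-memory stand-in for `seattle_csv`, so this says nothing about how either form behaves on an Arrow dataset:

```r
library(dplyr)

# Hypothetical stand-in for seattle_csv: a small in-memory tibble
checkouts <- tibble(
  CheckoutYear = c(2020, 2020, 2021, 2021, 2021),
  Checkouts = c(3, 5, 2, 4, 1)
)

# Old form: weighted count, result lands in a column called `n`
by_count <- checkouts |>
  count(CheckoutYear, wt = Checkouts) |>
  arrange(CheckoutYear)

# New form: explicit grouping and summing, result keeps the `Checkouts` name
by_summarise <- checkouts |>
  group_by(CheckoutYear) |>
  summarise(Checkouts = sum(Checkouts)) |>
  arrange(CheckoutYear)

# Same per-year totals, only the column name differs
identical(by_count$n, by_summarise$Checkouts)
```

The only visible difference on a local tibble is the name of the total column (`n` versus `Checkouts`).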

@ -518,9 +518,9 @@ Here's a quick example from the diamonds dataset:
```{r} ```{r}
#| dev: png #| dev: png
#| layout-ncol: 2
hist(diamonds$carat) hist(diamonds$carat)
plot(diamonds$carat, diamonds$price) plot(diamonds$carat, diamonds$price)
``` ```

@ -32,7 +32,5 @@ Communication is the theme of the following three chapters:
- In @sec-quarto-formats, you'll learn a little about the many other varieties of outputs you can produce using Quarto, including dashboards, websites, and books. - In @sec-quarto-formats, you'll learn a little about the many other varieties of outputs you can produce using Quarto, including dashboards, websites, and books.
- We'll finish up with @sec-quarto-workflow, where you'll learn about the "analysis notebook" and how to systematically record your successes and failures so that you can learn from them.
These chapters focus mostly on the technical mechanics of communication, not the really hard problems of communicating your thoughts to other humans. These chapters focus mostly on the technical mechanics of communication, not the really hard problems of communicating your thoughts to other humans.
However, there are a lot of other great books about communication, which we'll point you to at the end of each chapter. However, there are a lot of other great books about communication, which we'll point you to at the end of each chapter.

Binary file added (not shown): diagrams/new-project.png, 682 KiB

@ -345,6 +345,7 @@ gss_cat |>
To combine groups, you can assign multiple old levels to the same new level: To combine groups, you can assign multiple old levels to the same new level:
```{r} ```{r}
#| results: false
gss_cat |> gss_cat |>
mutate( mutate(
partyid = fct_recode(partyid, partyid = fct_recode(partyid,
@ -358,8 +359,7 @@ gss_cat |>
"Other" = "Don't know", "Other" = "Don't know",
"Other" = "Other party" "Other" = "Other party"
) )
) |> )
count(partyid)
``` ```
Use this technique with care: if you group together categories that are truly different you will end up with misleading results. Use this technique with care: if you group together categories that are truly different you will end up with misleading results.
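
A minimal sketch of collapsing several old levels into one new level with `fct_recode()`, assuming forcats is installed. The `party` factor and its labels are hypothetical stand-ins for `gss_cat$partyid`:

```r
library(forcats)

# Hypothetical party labels, standing in for gss_cat$partyid
party <- factor(c(
  "Strong republican", "No answer", "Don't know",
  "Other party", "Strong democrat"
))

# Repeating the same new level on the left merges the old levels on the right
combined <- fct_recode(party,
  "Other" = "No answer",
  "Other" = "Don't know",
  "Other" = "Other party"
)

levels(combined)
table(combined)
```

Three of the five observations end up in the merged `"Other"` level, which is why the warning above matters: the merge is silent, so it is worth checking the counts afterwards.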
@ -396,8 +396,7 @@ Instead, we can use the `fct_lump_n()` to specify that we want exactly 10 groups
```{r} ```{r}
gss_cat |> gss_cat |>
mutate(relig = fct_lump_n(relig, n = 10)) |> mutate(relig = fct_lump_n(relig, n = 10)) |>
count(relig, sort = TRUE) |> count(relig, sort = TRUE)
print(n = Inf)
``` ```
Read the documentation to learn about `fct_lump_min()` and `fct_lump_prop()` which are useful in other cases. Read the documentation to learn about `fct_lump_min()` and `fct_lump_prop()` which are useful in other cases.
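
A minimal sketch of the other two lumping helpers, assuming forcats 0.5.0 or later is installed. The factor `f` is made up for illustration: `fct_lump_min()` keeps levels that appear at least `min` times, and `fct_lump_prop()` keeps levels that make up more than `prop` of the observations:

```r
library(forcats)

# Hypothetical factor: "a" appears 5 times, "b" 3 times, "c" and "d" once each
f <- factor(c(rep("a", 5), rep("b", 3), "c", "d"))

# Lump every level with fewer than 2 observations
by_min <- fct_lump_min(f, min = 2)

# Lump every level that makes up no more than 20% of the observations
by_prop <- fct_lump_prop(f, prop = 0.2)

levels(by_min)
levels(by_prop)
```

With this input both calls keep `"a"` and `"b"` and fold `"c"` and `"d"` into `"Other"`.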

@ -329,9 +329,8 @@ It will continue to evolve in between reprints of the physical book.
The source of the book is available at <https://github.com/hadley/r4ds>. The source of the book is available at <https://github.com/hadley/r4ds>.
The book is powered by [Quarto](https://quarto.org), which makes it easy to write books that combine text and executable code. The book is powered by [Quarto](https://quarto.org), which makes it easy to write books that combine text and executable code.
This book was built with:
```{r} ```{r}
#| eval: false
#| echo: false #| echo: false
#| results: asis #| results: asis

@ -51,7 +51,7 @@ str_view(fruit, "berry")
``` ```
Letters and numbers match exactly and are called **literal characters**. Letters and numbers match exactly and are called **literal characters**.
Most punctuation characters, like `.`, `+`, `*`, `[`, `],` and `?,` have special meanings[^regexps-2] and are called **meta-characters**. For example, `.` Most punctuation characters, like `.`, `+`, `*`, `[`, `],` and `?`, have special meanings[^regexps-2] and are called **meta-characters**. For example, `.`
will match any character[^regexps-3], so `"a."` will match any string that contains an "a" followed by another character will match any character[^regexps-3], so `"a."` will match any string that contains an "a" followed by another character
: :
@ -152,23 +152,20 @@ babynames |>
geom_line() geom_line()
``` ```
There are two functions that are closely related to `str_detect()`, namely `str_subset()` which returns just the strings that contain a match and `str_which()` which returns the indexes of strings that have a match: There are two functions that are closely related to `str_detect()`: `str_subset()` and `str_which()`.
`str_subset()` returns a character vector containing only the strings that match.
```{r} `str_which()` returns an integer vector giving the positions of the strings that match.
str_subset(c("a", "b", "c"), "[aeiou]")
str_which(c("a", "b", "c"), "[aeiou]")
```
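
A minimal sketch of the two helpers described above, assuming stringr is installed. The input vector `words` is hypothetical:

```r
library(stringr)

# Hypothetical input: two of the four words contain a vowel
words <- c("sky", "tree", "fly", "cloud")

# str_subset() returns the matching strings themselves
matching <- str_subset(words, "[aeiou]")

# str_which() returns the integer positions of the matching strings
positions <- str_which(words, "[aeiou]")

matching   # "tree" "cloud"
positions  # 2 4
```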
### Count matches ### Count matches
The next step up in complexity from `str_detect()` is `str_count()`: rather than a simple true or false, it tells you how many matches there are in each string. The next step up in complexity from `str_detect()` is `str_count()`: rather than a true or false, it tells you how many matches there are in each string.
```{r} ```{r}
x <- c("apple", "banana", "pear") x <- c("apple", "banana", "pear")
str_count(x, "p") str_count(x, "p")
``` ```
Note that each match starts at the end of the previous match; i.e. regex matches never overlap. Note that each match starts at the end of the previous match, i.e. regex matches never overlap.
For example, in `"abababa"`, how many times will the pattern `"aba"` match? For example, in `"abababa"`, how many times will the pattern `"aba"` match?
Regular expressions say two, not three: Regular expressions say two, not three:
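
A minimal sketch of the non-overlapping rule, assuming stringr is installed. The first match consumes characters 1 to 3 and the second characters 5 to 7, so the `"aba"` starting at character 3 is never counted:

```r
library(stringr)

# Matches never overlap, so "aba" is found twice in "abababa", not three times
n_matches <- str_count("abababa", "aba")
n_matches  # 2
```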
@ -222,7 +219,7 @@ x <- c("apple", "pear", "banana")
str_replace_all(x, "[aeiou]", "-") str_replace_all(x, "[aeiou]", "-")
``` ```
`str_remove()` and `str_remove_all()` are handy shortcuts for `str_replace(x, pattern, "")`. `str_remove()` and `str_remove_all()` are handy shortcuts for `str_replace(x, pattern, "")`:
```{r} ```{r}
x <- c("apple", "pear", "banana") x <- c("apple", "pear", "banana")
@ -303,13 +300,14 @@ They're not always the most evocative of their purpose, but it's very helpful to
### Escaping {#sec-regexp-escaping} ### Escaping {#sec-regexp-escaping}
In order to match a literal `.`, you need an **escape** which tells the regular expression to match metacharacters literally. In order to match a literal `.`, you need an **escape** which tells the regular expression to match metacharacters[^regexps-6] literally.
Like strings, regexps use the backslash for escaping. Like strings, regexps use the backslash for escaping.
So, to match a `.`, you need the regexp `\.`. So, to match a `.`, you need the regexp `\.`. Unfortunately this creates a problem.
Unfortunately this creates a problem.
We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings. We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings.
So to create the regular expression `\.` we need the string `"\\."`, as the following example shows. So to create the regular expression `\.` we need the string `"\\."`, as the following example shows.
[^regexps-6]: The complete set of metacharacters is `.^$\|*+?{}[]()`
```{r} ```{r}
# To create the regular expression \., we need to use \\. # To create the regular expression \., we need to use \\.
dot <- "\\." dot <- "\\."
@ -350,20 +348,17 @@ str_view(c("abc", "a.c", "a*c", "a c"), "a[.]c")
str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c") str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c")
``` ```
The full set of metacharacters is `.^$\|*+?{}[]()`.
In general, look at punctuation characters with suspicion; if your regular expression isn't matching what you think it should, check if you've used any of these characters.
### Anchors ### Anchors
By default, regular expressions will match any part of a string. By default, regular expressions will match any part of a string.
If you want to match at the start or end you need to **anchor** the regular expression using `^` to match the start of the string or `$` to match the end of the string: If you want to match at the start or end you need to **anchor** the regular expression using `^` to match the start or `$` to match the end:
```{r} ```{r}
str_view(fruit, "^a") str_view(fruit, "^a")
str_view(fruit, "a$") str_view(fruit, "a$")
``` ```
It's tempting to think that `$` should match the start of a string, because that's how we write dollar amounts, but it's not what regular expressions want. It's tempting to think that `$` should match the start of a string, because that's how we write dollar amounts, but that's not what regular expressions want.
To force a regular expression to match only the full string, anchor it with both `^` and `$`: To force a regular expression to match only the full string, anchor it with both `^` and `$`:
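
A minimal sketch of anchoring at both ends, assuming stringr is installed. The input vector `x` is hypothetical:

```r
library(stringr)

# Hypothetical input: only the first element is exactly "apple"
x <- c("apple", "pineapple", "apple pie")

# Unanchored: matches anywhere in the string
anywhere <- str_detect(x, "apple")

# Anchored with ^ and $: the whole string must be "apple"
whole <- str_detect(x, "^apple$")

anywhere  # TRUE TRUE TRUE
whole     # TRUE FALSE FALSE
```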
@ -398,7 +393,7 @@ str_replace_all("abc", c("$", "^", "\\b"), "--")
A **character class**, or character **set**, allows you to match any character in a set. A **character class**, or character **set**, allows you to match any character in a set.
As we discussed above, you can construct your own sets with `[]`, where `[abc]` matches "a", "b", or "c" and `[^abc]` matches any character except "a", "b", or "c". As we discussed above, you can construct your own sets with `[]`, where `[abc]` matches "a", "b", or "c" and `[^abc]` matches any character except "a", "b", or "c".
Apart from `^` there are two ther characters that have special meaning inside of `[]:` Apart from `^` there are two other characters that have special meaning inside of `[]:`
- `-` defines a range, e.g. `[a-z]` matches any lower case letter and `[0-9]` matches any number. - `-` defines a range, e.g. `[a-z]` matches any lower case letter and `[0-9]` matches any number.
- `\` escapes special characters, so `[\^\-\]]` matches `^`, `-`, or `]`. - `\` escapes special characters, so `[\^\-\]]` matches `^`, `-`, or `]`.
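
A minimal sketch of both special characters inside `[]`, assuming stringr is installed. The input strings are made up for illustration, and each regexp backslash is doubled because the pattern is written as an R string:

```r
library(stringr)

# `-` defines a range: pick out upper case letters and digits separately
letters_found <- str_extract_all("R2-D2", "[A-Z]")[[1]]
digits_found <- str_extract_all("R2-D2", "[0-9]")[[1]]

# `\` escapes: the class [\^\-\]] matches a literal ^, -, or ]
specials_found <- str_extract_all("a-b^c]d", "[\\^\\-\\]]")[[1]]

letters_found   # "R" "D"
digits_found    # "2" "2"
specials_found  # "-" "^" "]"
```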
@ -419,9 +414,9 @@ str_view("a-b-c", "[a\\-c]")
Some character classes are used so commonly that they get their own shortcut. Some character classes are used so commonly that they get their own shortcut.
You've already seen `.`, which matches any character apart from a newline. You've already seen `.`, which matches any character apart from a newline.
There are three other particularly useful pairs[^regexps-6]: There are three other particularly useful pairs[^regexps-7]:
[^regexps-6]: Remember, to create a regular expression containing `\d` or `\s`, you'll need to escape the `\` for the string, so you'll type `"\\d"` or `"\\s"`. [^regexps-7]: Remember, to create a regular expression containing `\d` or `\s`, you'll need to escape the `\` for the string, so you'll type `"\\d"` or `"\\s"`.
- `\d` matches any digit;\ - `\d` matches any digit;\
`\D` matches anything that isn't a digit. `\D` matches anything that isn't a digit.
@ -453,18 +448,6 @@ You can also specify the number of matches precisely with `{}`:
- `{n,}` matches at least n times. - `{n,}` matches at least n times.
- `{n,m}` matches between n and m times. - `{n,m}` matches between n and m times.
The following code shows how this works for a few simple examples:
```{r}
x <- "-- -x- -xx- -xxx- -xxxx- -xxxxx-"
str_view(x, "-x?-") # [0, 1]
str_view(x, "-x+-") # [1, Inf)
str_view(x, "-x*-") # [0, Inf)
str_view(x, "-x{2}-") # [2. 2]
str_view(x, "-x{2,}-") # [2, Inf)
str_view(x, "-x{2,3}-") # [2, 3]
```
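
A minimal sketch of the `{}` quantifiers listed above, assuming stringr is installed. The input vector `runs` is hypothetical, and the patterns are anchored with `^` and `$` so that each one has to account for the whole string:

```r
library(stringr)

# Hypothetical input: runs of one to four "a"s
runs <- c("a", "aa", "aaa", "aaaa")

exactly_two <- str_detect(runs, "^a{2}$")     # exactly 2
two_or_more <- str_detect(runs, "^a{2,}$")    # at least 2
two_to_three <- str_detect(runs, "^a{2,3}$")  # between 2 and 3

exactly_two   # FALSE TRUE FALSE FALSE
two_or_more   # FALSE TRUE TRUE TRUE
two_to_three  # FALSE TRUE TRUE FALSE
```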
### Operator precedence and parentheses ### Operator precedence and parentheses
What does `ab+` match? What does `ab+` match?
@ -506,9 +489,9 @@ sentences |>
``` ```
If you want to extract the matches for each group you can use `str_match()`. If you want to extract the matches for each group you can use `str_match()`.
But `str_match()` returns a matrix, so it's not particularly easy to work with[^regexps-7]: But `str_match()` returns a matrix, so it's not particularly easy to work with[^regexps-8]:
[^regexps-7]: Mostly because we never discuss matrices in this book! [^regexps-8]: Mostly because we never discuss matrices in this book!
```{r} ```{r}
sentences |> sentences |>
@ -610,9 +593,9 @@ If you're doing a lot of work with multiline strings (i.e. strings that contain
Finally, if you're writing a complicated regular expression and you're worried you might not understand it in the future, you might try `comments = TRUE`. Finally, if you're writing a complicated regular expression and you're worried you might not understand it in the future, you might try `comments = TRUE`.
It tweaks the pattern language to ignore spaces and new lines, as well as everything after `#`. It tweaks the pattern language to ignore spaces and new lines, as well as everything after `#`.
This allows you to use comments and whitespace to make complex regular expressions more understandable[^regexps-8], as in the following example: This allows you to use comments and whitespace to make complex regular expressions more understandable[^regexps-9], as in the following example:
[^regexps-8]: `comments = TRUE` is particularly effective in combination with a raw string, as we use here. [^regexps-9]: `comments = TRUE` is particularly effective in combination with a raw string, as we use here.
```{r} ```{r}
phone <- regex( phone <- regex(

Three binary files removed (not shown): 135 KiB, 83 KiB, and 160 KiB

@ -40,6 +40,6 @@ Five chapters focus on the tools of data science:
In @sec-data-import you'll learn the basics of getting `.csv` files into R. In @sec-data-import you'll learn the basics of getting `.csv` files into R.
Nestled among these chapters are five other chapters that focus on your R workflow. Nestled among these chapters are five other chapters that focus on your R workflow.
In @sec-workflow-basics, @sec-workflow-pipes, @sec-workflow-style, and @sec-workflow-scripts-projects you'll learn good workflow practices for writing and organizing your R code. In @sec-workflow-basics, @sec-workflow-style, and @sec-workflow-scripts-projects you'll learn good workflow practices for writing and organizing your R code.
These will set you up for success in the long run, as they'll give you the tools to stay organized when you tackle real projects. These will set you up for success in the long run, as they'll give you the tools to stay organized when you tackle real projects.
Finally, @sec-workflow-getting-help will teach you how to get help to keep learning. Finally, @sec-workflow-getting-help will teach you how to get help and keep learning.

@ -264,12 +264,9 @@ Click File \> New Project, then follow the steps shown in @fig-new-project.
#| label: fig-new-project #| label: fig-new-project
#| echo: false #| echo: false
#| fig-cap: > #| fig-cap: >
#| Create a new project by following these three steps. #| To create new project: (top) first click New Directory, then (middle)
#| fig-subcap: #| click New Project, then (bottom) fill in the directory (project) name,
#| - First click New Directory. #| choose a good subdirectory for its home and click Create Project.
#| - Then click New Project.
#| - Finally, fill in the directory (project) name, choose a good
#| subdirectory for its home and click Create Project.
#| fig-alt: > #| fig-alt: >
#| Three screenshots of the New Project menu. In the first screenshot, #| Three screenshots of the New Project menu. In the first screenshot,
#| the Create Project window is shown and New Directory is selected. #| the Create Project window is shown and New Directory is selected.
@ -279,9 +276,7 @@ Click File \> New Project, then follow the steps shown in @fig-new-project.
#| the project is being created as subdirectory of the Desktop. #| the project is being created as subdirectory of the Desktop.
#| out-width: ~ #| out-width: ~
knitr::include_graphics("screenshots/rstudio-project-1.png") knitr::include_graphics("diagrams/new-project.png")
knitr::include_graphics("screenshots/rstudio-project-2.png")
knitr::include_graphics("screenshots/rstudio-project-3.png")
``` ```
Call your project `r4ds` and think carefully about which subdirectory you put the project in. Call your project `r4ds` and think carefully about which subdirectory you put the project in.