Reduce size (#1349)
* Various fixes & changes to reduce size
* Fix arrow issue
* Hopefully eliminate 3 paragraphs from regexps
* Make single diagram for new projects
parent 33ad991f5d
commit 3530ce04d4
@ -107,7 +107,8 @@ For example, this code tells us the total number of checkouts per year:
```{r}
#| cache: true
seattle_csv |>
count(CheckoutYear, wt = Checkouts) |>
group_by(CheckoutYear) |>
summarise(Checkouts = sum(Checkouts)) |>
arrange(CheckoutYear) |>
collect()
```
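The hunk above shows the yearly totals computed both with `count(..., wt = )` and with an explicit `group_by()` plus `summarise()`. A minimal sketch checking that the two forms agree, assuming dplyr and a small hypothetical in-memory tibble `checkouts` in place of the Arrow-backed `seattle_csv` (so no `collect()` step is needed):

```r
library(dplyr)

# Hypothetical toy data standing in for the Arrow-backed `seattle_csv`
checkouts <- tibble(
  CheckoutYear = c(2005L, 2005L, 2006L, 2006L, 2006L),
  Checkouts    = c(3L, 4L, 1L, 2L, 5L)
)

# Weighted count: sums `Checkouts` within each `CheckoutYear`
by_count <- checkouts |>
  count(CheckoutYear, wt = Checkouts, name = "Checkouts")

# Explicit equivalent: group, sum, then sort by year
by_summary <- checkouts |>
  group_by(CheckoutYear) |>
  summarise(Checkouts = sum(Checkouts)) |>
  arrange(CheckoutYear)

by_count
by_summary
```

`name = "Checkouts"` only renames the count column (by default `count()` calls it `n`) so that the two results line up column for column.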
@ -518,9 +518,9 @@ Here's a quick example from the diamonds dataset:

```{r}
#| dev: png
#| layout-ncol: 2

hist(diamonds$carat)

plot(diamonds$carat, diamonds$price)
```

@ -32,7 +32,5 @@ Communication is the theme of the following three chapters:

- In @sec-quarto-formats, you'll learn a little about the many other varieties of outputs you can produce using Quarto, including dashboards, websites, and books.

- We'll finish up with @sec-quarto-workflow, where you'll learn about the "analysis notebook" and how to systematically record your successes and failures so that you can learn from them.

These chapters focus mostly on the technical mechanics of communication, not the really hard problems of communicating your thoughts to other humans.
However, there are a lot of other great books about communication, which we'll point you to at the end of each chapter.
Binary file not shown.
Binary file not shown.
After size: 682 KiB
@ -345,6 +345,7 @@ gss_cat |>
To combine groups, you can assign multiple old levels to the same new level:

```{r}
#| results: false
gss_cat |>
mutate(
partyid = fct_recode(partyid,
@ -358,8 +359,7 @@ gss_cat |>
"Other" = "Don't know",
"Other" = "Other party"
)
) |>
count(partyid)
)
```

Use this technique with care: if you group together categories that are truly different you will end up with misleading results.
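A small sketch of assigning several old levels to one new level with forcats `fct_recode()`, using a hypothetical factor `answers` rather than the real `gss_cat` data:

```r
library(forcats)

# Hypothetical survey answers (not the real gss_cat data)
answers <- factor(c("Don't know", "Other party", "Strong democrat", "Don't know"))

# Map two old levels onto the same new level, "Other"
combined <- fct_recode(answers,
  "Other" = "Don't know",
  "Other" = "Other party"
)

levels(combined)
table(combined)
```

The two old levels collapse into a single `"Other"` level, so its count is the sum of theirs.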
@ -396,8 +396,7 @@ Instead, we can use the `fct_lump_n()` to specify that we want exactly 10 groups
```{r}
gss_cat |>
mutate(relig = fct_lump_n(relig, n = 10)) |>
count(relig, sort = TRUE) |>
print(n = Inf)
count(relig, sort = TRUE)
```

Read the documentation to learn about `fct_lump_min()` and `fct_lump_prop()` which are useful in other cases.
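To see what `fct_lump_n()` does on a tiny input, here is a sketch with a hypothetical factor `x`. Levels outside the `n` most frequent are collapsed into forcats' default `"Other"` level:

```r
library(forcats)

# Hypothetical factor: "a" (5) and "b" (3) are common, "c" and "d" are rare
x <- factor(c(rep("a", 5), rep("b", 3), "c", "d"))

# Keep the 2 most frequent levels and lump the rest together
lumped <- fct_lump_n(x, n = 2)

levels(lumped)
table(lumped)
```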
@ -329,9 +329,8 @@ It will continue to evolve in between reprints of the physical book.
The source of the book is available at <https://github.com/hadley/r4ds>.
The book is powered by [Quarto](https://quarto.org), which makes it easy to write books that combine text and executable code.

This book was built with:

```{r}
#| eval: false
#| echo: false
#| results: asis

regexps.qmd
@ -51,7 +51,7 @@ str_view(fruit, "berry")
```

Letters and numbers match exactly and are called **literal characters**.
Most punctuation characters, like `.`, `+`, `*`, `[`, `],` and `?,` have special meanings[^regexps-2] and are called **meta-characters**. For example, `.`
Most punctuation characters, like `.`, `+`, `*`, `[`, `]`, and `?`, have special meanings[^regexps-2] and are called **meta-characters**. For example, `.`
will match any character[^regexps-3], so `"a."` will match any string that contains an "a" followed by another character:

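A short sketch of the literal versus meta-character distinction, using `str_detect()` on a hypothetical vector `x`. The pattern `"a."` needs an "a" plus one more character, so a bare `"a"` or a string ending in "a" does not match:

```r
library(stringr)

# Hypothetical test strings
x <- c("apple", "a", "pea", "banana")

# Literal characters: "an" must appear exactly as written
str_detect(x, "an")

# Meta-character: "." matches any single character, so "a." needs
# an "a" that is followed by at least one more character
str_detect(x, "a.")
```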
@ -152,23 +152,20 @@ babynames |>
geom_line()
```

There are two functions that are closely related to `str_detect()`, namely `str_subset()` which returns just the strings that contain a match and `str_which()` which returns the indexes of strings that have a match:

```{r}
str_subset(c("a", "b", "c"), "[aeiou]")
str_which(c("a", "b", "c"), "[aeiou]")
```
There are two functions that are closely related to `str_detect()`: `str_subset()` and `str_which()`.
`str_subset()` returns a character vector containing only the strings that match.
`str_which()` returns an integer vector giving the positions of the strings that match.

### Count matches

The next step up in complexity from `str_detect()` is `str_count()`: rather than a simple true or false, it tells you how many matches there are in each string.
The next step up in complexity from `str_detect()` is `str_count()`: rather than a true or false, it tells you how many matches there are in each string.

```{r}
x <- c("apple", "banana", "pear")
str_count(x, "p")
```

Note that each match starts at the end of the previous match; i.e. regex matches never overlap.
Note that each match starts at the end of the previous match, i.e. regex matches never overlap.
For example, in `"abababa"`, how many times will the pattern `"aba"` match?
Regular expressions say two, not three:

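A compact sketch of `str_subset()`, `str_which()` and `str_count()` on a small hypothetical vector, ending with the non-overlapping `"aba"` case:

```r
library(stringr)

x <- c("apple", "banana", "pear")

str_subset(x, "an")  # only the strings that match
str_which(x, "an")   # positions of the strings that match
str_count(x, "p")    # number of matches in each string

# Matches never overlap: after "aba" matches at positions 1-3,
# the search resumes at position 4, so only two matches are found
str_count("abababa", "aba")
```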
@ -222,7 +219,7 @@ x <- c("apple", "pear", "banana")
str_replace_all(x, "[aeiou]", "-")
```

`str_remove()` and `str_remove_all()` are handy shortcuts for `str_replace(x, pattern, "")`.
`str_remove()` and `str_remove_all()` are handy shortcuts for `str_replace(x, pattern, "")`:

```{r}
x <- c("apple", "pear", "banana")
@ -303,13 +300,14 @@ They're not always the most evocative of their purpose, but it's very helpful to

### Escaping {#sec-regexp-escaping}

In order to match a literal `.`, you need an **escape** which tells the regular expression to match metacharacters literally.
In order to match a literal `.`, you need an **escape** which tells the regular expression to match metacharacters[^regexps-6] literally.
Like strings, regexps use the backslash for escaping.
So, to match a `.`, you need the regexp `\.`.
Unfortunately this creates a problem.
So, to match a `.`, you need the regexp `\.`. Unfortunately this creates a problem.
We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings.
So to create the regular expression `\.` we need the string `"\\."`, as the following example shows.

[^regexps-6]: The complete set of metacharacters is `.^$\|*+?{}[]()`

```{r}
# To create the regular expression \., we need to use \\.
dot <- "\\."
@ -350,20 +348,17 @@ str_view(c("abc", "a.c", "a*c", "a c"), "a[.]c")
str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c")
```

The full set of metacharacters is `.^$\|*+?{}[]()`.
In general, look at punctuation characters with suspicion; if your regular expression isn't matching what you think it should, check if you've used any of these characters.

### Anchors

By default, regular expressions will match any part of a string.
If you want to match at the start of end you need to **anchor** the regular expression using `^` to match the start of the string or `$` to match the end of the string:
If you want to match at the start or end you need to **anchor** the regular expression using `^` to match the start or `$` to match the end:

```{r}
str_view(fruit, "^a")
str_view(fruit, "a$")
```

It's tempting to think that `$` should match the start of a string, because that's how we write dollar amounts, but it's not what regular expressions want.
It's tempting to think that `$` should match the start of a string, because that's how we write dollar amounts, but that's not what regular expressions want.

To force a regular expression to match only the full string, anchor it with both `^` and `$`:

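A sketch, on a hypothetical vector `x`, contrasting an unescaped `.` with the escaped forms `"\\."` and `"[.]"`, and showing the `^` and `$` anchors alone and together:

```r
library(stringr)

# Hypothetical test strings
x <- c("abc", "a.c", "bad")

# Unescaped ".": any character between "a" and "c"
str_detect(x, "a.c")
# The string "a\\.c" is the regex a\.c; "[.]" is another way to get a literal dot
str_detect(x, "a\\.c")
str_detect(x, "a[.]c")

# Anchors: ^ matches at the start of the string, $ at the end
str_detect(x, "^a")
str_detect(x, "d$")
# Anchoring both ends forces a match against the whole string
str_detect(c("a", "apple", "banana"), "^a$")
```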
@ -398,7 +393,7 @@ str_replace_all("abc", c("$", "^", "\\b"), "--")

A **character class**, or character **set**, allows you to match any character in a set.
As we discussed above, you can construct your own sets with `[]`, where `[abc]` matches "a", "b", or "c" and `[^abc]` matches any character except "a", "b", or "c".
Apart from `^` there are two ther characters that have special meaning inside of `[]:`
Apart from `^` there are two other characters that have special meaning inside of `[]`:

- `-` defines a range, e.g. `[a-z]` matches any lower case letter and `[0-9]` matches any number.
- `\` escapes special characters, so `[\^\-\]]` matches `^`, `-`, or `]`.
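A sketch of the character-class rules above using `str_detect()`. The test vector is hypothetical, and each backslash in the last pattern is doubled because the pattern is written as an R string:

```r
library(stringr)

# Hypothetical single-character test strings
x <- c("a", "z", "5", "^", "-", "]", "!")

str_detect(x, "[a-z]")      # range: any lowercase letter
str_detect(x, "[0-9]")      # range: any digit
str_detect(x, "[^a-z0-9]")  # leading ^ negates the set
# Inside [], \ escapes ^, - and ]; each \ is doubled for the R string
str_detect(x, "[\\^\\-\\]]")
```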
@ -419,9 +414,9 @@ str_view("a-b-c", "[a\\-c]")

Some character classes are used so commonly that they get their own shortcut.
You've already seen `.`, which matches any character apart from a newline.
There are three other particularly useful pairs[^regexps-6]:
There are three other particularly useful pairs[^regexps-7]:

[^regexps-6]: Remember, to create a regular expression containing `\d` or `\s`, you'll need to escape the `\` for the string, so you'll type `"\\d"` or `"\\s"`.
[^regexps-7]: Remember, to create a regular expression containing `\d` or `\s`, you'll need to escape the `\` for the string, so you'll type `"\\d"` or `"\\s"`.

- `\d` matches any digit;\
`\D` matches anything that isn't a digit.
@ -453,18 +448,6 @@ You can also specify the number of matches precisely with `{}`:
- `{n,}` matches at least n times.
- `{n,m}` matches between n and m times.

The following code shows how this works for a few simple examples:

```{r}
x <- "-- -x- -xx- -xxx- -xxxx- -xxxxx-"
str_view(x, "-x?-") # [0, 1]
str_view(x, "-x+-") # [1, Inf)
str_view(x, "-x*-") # [0, Inf)
str_view(x, "-x{2}-") # [2. 2]
str_view(x, "-x{2,}-") # [2, Inf)
str_view(x, "-x{2,3}-") # [2, 3]
```

### Operator precedence and parentheses

What does `ab+` match?
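A sketch combining the `\d` shortcut with the `{n}`, `{n,}` and `{n,m}` quantifiers, plus a check of how `ab+` is parsed compared with `(ab)+`. The inputs are hypothetical:

```r
library(stringr)

# Hypothetical digit strings of length 1 to 4
x <- c("1", "12", "123", "1234")

# "\\d" in an R string is the regex \d (any digit)
str_detect(x, "^\\d{2}$")    # exactly 2 digits
str_detect(x, "^\\d{2,}$")   # at least 2 digits
str_detect(x, "^\\d{2,3}$")  # between 2 and 3 digits

# A quantifier applies only to the item just before it:
# ab+ is "a" then one or more "b"; (ab)+ repeats "ab" as a unit
y <- c("abbb", "abab")
str_detect(y, "^ab+$")
str_detect(y, "^(ab)+$")
```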
@ -506,9 +489,9 @@ sentences |>
```

If you want to extract the matches for each group you can use `str_match()`.
But `str_match()` returns a matrix, so it's not particularly easy to work with[^regexps-7]:
But `str_match()` returns a matrix, so it's not particularly easy to work with[^regexps-8]:

[^regexps-7]: Mostly because we never discuss matrices in this book!
[^regexps-8]: Mostly because we never discuss matrices in this book!

```{r}
sentences |>
@ -610,9 +593,9 @@ If you're doing a lot of work with multiline strings (i.e. strings that contain

Finally, if you're writing a complicated regular expression and you're worried you might not understand it in the future, you might try `comments = TRUE`.
It tweaks the pattern language to ignore spaces and new lines, as well as everything after `#`.
This allows you to use comments and whitespace to make complex regular expressions more understandable[^regexps-8], as in the following example:
This allows you to use comments and whitespace to make complex regular expressions more understandable[^regexps-9], as in the following example:

[^regexps-8]: `comments = TRUE` is particularly effective in combination with a raw string, as we use here.
[^regexps-9]: `comments = TRUE` is particularly effective in combination with a raw string, as we use here.

```{r}
phone <- regex(
Binary file not shown.
Before size: 135 KiB
Binary file not shown.
Before size: 83 KiB
Binary file not shown.
Before size: 160 KiB
@ -40,6 +40,6 @@ Five chapters focus on the tools of data science:
In @sec-data-import you'll learn the basics of getting `.csv` files into R.

Nestled among these chapters are five other chapters that focus on your R workflow.
In @sec-workflow-basics, @sec-workflow-pipes, @sec-workflow-style, and @sec-workflow-scripts-projects you'll learn good workflow practices for writing and organizing your R code.
In @sec-workflow-basics, @sec-workflow-style, and @sec-workflow-scripts-projects you'll learn good workflow practices for writing and organizing your R code.
These will set you up for success in the long run, as they'll give you the tools to stay organized when you tackle real projects.
Finally, @sec-workflow-getting-help will teach you how to get help to keep learning.
Finally, @sec-workflow-getting-help will teach you how to get help and keep learning.
@ -264,12 +264,9 @@ Click File \> New Project, then follow the steps shown in @fig-new-project.
#| label: fig-new-project
#| echo: false
#| fig-cap: >
#| Create a new project by following these three steps.
#| fig-subcap:
#| - First click New Directory.
#| - Then click New Project.
#| - Finally, fill in the directory (project) name, choose a good
#| subdirectory for its home and click Create Project.
#| To create a new project: (top) first click New Directory, then (middle)
#| click New Project, then (bottom) fill in the directory (project) name,
#| choose a good subdirectory for its home and click Create Project.
#| fig-alt: >
#| Three screenshots of the New Project menu. In the first screenshot,
#| the Create Project window is shown and New Directory is selected.
@ -279,9 +276,7 @@ Click File \> New Project, then follow the steps shown in @fig-new-project.
#| the project is being created as a subdirectory of the Desktop.
#| out-width: ~

knitr::include_graphics("screenshots/rstudio-project-1.png")
knitr::include_graphics("screenshots/rstudio-project-2.png")
knitr::include_graphics("screenshots/rstudio-project-3.png")
knitr::include_graphics("diagrams/new-project.png")
```

Call your project `r4ds` and think carefully about which subdirectory you put the project in.