fix minor plaintext typos and links (#1082)
parent 6ce6c473cf
commit 279611af8a

@ -201,7 +201,7 @@ billboard |>
What happens if a song is in the top 100 for less than 76 weeks?
Take 2 Pac's "Baby Don't Cry", for example.
The above output suggests that it was only the top 100 for 7 weeks, and all the remaining weeks are filled in with missing values.
These `NA`s don't really represent unknown observations; they're forced to exist by the structure of the dataset[^data-tidy-1], so we can ask `pivot_longer` to get rid of them by setting `values_drop_na = TRUE`:
These `NA`s don't really represent unknown observations; they're forced to exist by the structure of the dataset[^data-tidy-1], so we can ask `pivot_longer()` to get rid of them by setting `values_drop_na = TRUE`:

[^data-tidy-1]: We'll come back to this idea in [Chapter -@sec-missing-values].
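Here is a minimal sketch of what `values_drop_na = TRUE` does. It assumes tidyr and tibble are installed. The `songs` table and its `wk1` to `wk3` columns are made up for illustration and are not the chapter's billboard data:

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per song, one column per chart week.
# Song "B" left the chart after week 1, so its later weeks are structural NAs.
songs <- tibble(
  track = c("A", "B"),
  wk1 = c(87, 91),
  wk2 = c(82, NA),
  wk3 = c(72, NA)
)

# Pivot to one row per song-week, dropping the NAs created by the wide layout.
songs_long <- songs |>
  pivot_longer(
    cols = starts_with("wk"),
    names_to = "week",
    values_to = "rank",
    values_drop_na = TRUE
  )

songs_long
```

Without `values_drop_na = TRUE`, the same call would return six rows, two of them with `rank` equal to `NA`.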
@ -473,7 +473,7 @@ cms_patient_experience |>

The output doesn't look quite right; we still seem to have multiple rows for each organization.
That's because, by default, `pivot_wider()` will attempt to preserve all the existing columns including `measure_title` which has six distinct observations for each organisations.
To fix this problem we need to tell `pivot_wider()` which columns identify each row; in this case those are the variables starting with `org`:
To fix this problem we need to tell `pivot_wider()` which columns identify each row; in this case those are the variables starting with `"org"`:

```{r}
cms_patient_experience |>

@ -650,7 +650,7 @@ This makes a data frame, because tibbles don't support row names[^data-tidy-2].

[^data-tidy-2]: tibbles don't use row names because they only work for a subset of important cases: when observations can be identified by a single character vector.

We're now ready to cluster with (e.g.) `kmeans():`
We're now ready to cluster with (e.g.) `kmeans()`:

```{r}
cluster <- stats::kmeans(col_year, centers = 6)

@ -68,7 +68,7 @@ But before we discuss their individual differences, it's worth stating what they
Because the first argument is a data frame and the output is a data frame, dplyr verbs work well with the pipe, `|>`.
The pipe takes the thing on its left and passes it along to the function on its right so that `x |> f(y)` is equivalent to `f(x, y)`, and `x |> f(y) |> g(z)` is equivalent to into `g(f(x, y), z)`.
The easiest way to pronounce the pipe is "then".
That makes it possible to get a sense of the following code even though you haven't yet learnt the details:
That makes it possible to get a sense of the following code even though you haven't yet learned the details:

```{r}
#| eval: false

@ -81,7 +81,7 @@ flights |>
)
```

The code starts with the flights dataset, then filters it, then groups it, then summarizes it.
The code starts with the `flights` dataset, then filters it, then groups it, then summarizes it.
We'll come back to the pipe and its alternatives in @sec-pipes.

dplyr's verbs are organised into four groups based on what they operate on: **rows**, **columns**, **groups**, or **tables**.
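The pipe equivalence stated above (`x |> f(y)` is `f(x, y)`) can be checked directly in base R 4.1 or later. This sketch uses `head()` and `paste()` as stand-ins for `f()` and `g()`. Any functions would do:

```r
# The native pipe is rewritten by the parser, so quoting shows the plain call.
stopifnot(identical(quote(x |> f(y)), quote(f(x, y))))
stopifnot(identical(quote(x |> f(y) |> g(z)), quote(g(f(x, y), z))))

x <- c(10, 20, 30, 40)

# `x |> f(y)` is the same call as `f(x, y)` ...
piped_once <- x |> head(3)
stopifnot(identical(piped_once, head(x, 3)))

# ... and `x |> f(y) |> g(z)` is the same call as `g(f(x, y), z)`.
piped_twice <- x |> head(3) |> paste(collapse = "-")
stopifnot(identical(piped_twice, paste(head(x, 3), collapse = "-")))

piped_twice
#> [1] "10-20-30"
```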
@ -100,7 +100,7 @@ The first argument is the data frame.
The second and subsequent arguments are the conditions that must be true to keep the row.
For example, we could find all flights that arrived more than 120 minutes (two hours) late:

[^data-transform-1]: Later, you'll learn about the `slice_*()` family which allows you to choose rows based on their positions
[^data-transform-1]: Later, you'll learn about the `slice_*()` family which allows you to choose rows based on their positions.

```{r}
flights |>

@ -597,7 +597,7 @@ ggplot(delays, aes(n, delay)) +
Not surprisingly, there is much greater variation in the average delay when there are few flights for a given plane.
The shape of this plot is very characteristic: whenever you plot a mean (or other summary) vs. group size, you'll see that the variation decreases as the sample size increases[^data-transform-4].

[^data-transform-4]: \*cough\* the central limit theorem \*cough\*
[^data-transform-4]: \*cough\* the central limit theorem \*cough\*.

When looking at this sort of plot, it's often useful to filter out the groups with the smallest numbers of observations, so you can see more of the pattern and less of the extreme variation in the smallest groups:

@ -664,4 +664,4 @@ batters |>
arrange(desc(perf))
```

You can find a good explanation of this problem and how to overcome it at <http://varianceexplained.org/r/empirical_bayes_baseball/> and <http://www.evanmiller.org/how-not-to-sort-by-average-rating.html>.
You can find a good explanation of this problem and how to overcome it at <http://varianceexplained.org/r/empirical_bayes_baseball/> and <https://www.evanmiller.org/how-not-to-sort-by-average-rating.html>.

@ -15,7 +15,7 @@ R has several systems for making graphs, but ggplot2 is one of the most elegant
ggplot2 implements the **grammar of graphics**, a coherent system for describing and building graphs.
With ggplot2, you can do more and faster by learning one system and applying it in many places.

If you'd like to learn more about the theoretical underpinnings of ggplot2, you might enjoy reading "The Layered Grammar of Graphics", <http://vita.had.co.nz/papers/layered-grammar.pdf>, the scientific paper that discusses the theoretical underpinnings..
If you'd like to learn more about the theoretical underpinnings of ggplot2, you might enjoy reading "The Layered Grammar of Graphics", <https://vita.had.co.nz/papers/layered-grammar.pdf>, the scientific paper that discusses the theoretical underpinnings..

### Prerequisites

@ -552,11 +552,11 @@ For instance, to make the plots above, you can use this code:
```{r}
#| eval: false

# left
# Left
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))

# right
# Right
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
```

@ -604,7 +604,7 @@ If this makes you excited, buckle up.
You will learn how to place multiple geoms in the same plot very soon.

ggplot2 provides more than 40 geoms, and extension packages provide even more (see <https://exts.ggplot2.tidyverse.org/gallery/> for a sampling).
The best way to get a comprehensive overview is the ggplot2 cheatsheet, which you can find at <http://rstudio.com/resources/cheatsheets>.
The best way to get a comprehensive overview is the ggplot2 cheatsheet, which you can find at <https://rstudio.com/resources/cheatsheets>.
To learn more about any single geom, use the help (e.g. `?geom_smooth`).

Many geoms, like `geom_smooth()`, use a single geometric object to display multiple rows of data.

@ -931,7 +931,7 @@ However, there are three reasons why you might need to use a stat explicitly:

ggplot2 provides more than 20 stats for you to use.
Each stat is a function, so you can get help in the usual way, e.g. `?stat_bin`.
To see a complete list of stats, try the [ggplot2 cheatsheet](http://rstudio.com/resources/cheatsheets).
To see a complete list of stats, try the [ggplot2 cheatsheet](https://rstudio.com/resources/cheatsheets).

### Exercises

@ -530,7 +530,7 @@ flights |>

The main thing to notice here is the syntax: SQL joins use sub-clauses of the `FROM` clause to bring in additional tables, using `ON` to define how the tables are related.

dplyr's names for these functions are so closely connected to SQL that you can easily guess the equivalent SQL for `inner_join()`, `right_join()`, and `full_join():`
dplyr's names for these functions are so closely connected to SQL that you can easily guess the equivalent SQL for `inner_join()`, `right_join()`, and `full_join()`:

``` sql
SELECT flights.*, "type", manufacturer, model, engines, seats, speed

@ -13,7 +13,7 @@ Factors are used for categorical variables, variables that have a fixed and know
They are also useful when you want to display character vectors in a non-alphabetical order.

If you want to learn more about factors after reading this chapter, we recommend reading Amelia McNamara and Nicholas Horton's paper, [*Wrangling categorical data in R*](https://peerj.com/preprints/3163/).
This paper lays out some of the history discussed in [*stringsAsFactors: An unauthorized biography*](https://simplystatistics.org/posts/2015-07-24-stringsasfactors-an-unauthorized-biography/) and [*stringsAsFactors = \<sigh\>*](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh), and compares the tidy approaches to categorical data outlined in this book with base R methods.
This paper lays out some of the history discussed in [*stringsAsFactors: An unauthorized biography*](https://simplystatistics.org/posts/2015-07-24-stringsasfactors-an-unauthorized-biography/) and [*stringsAsFactors = \<sigh\>*](https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh), and compares the tidy approaches to categorical data outlined in this book with base R methods.
An early version of the paper helped motivate and scope the forcats package; thanks Amelia & Nick!

### Prerequisites

@ -108,7 +108,7 @@ levels(f2)
## General Social Survey

For the rest of this chapter, we're going to use `forcats::gss_cat`.
It's a sample of data from the [General Social Survey](http://gss.norc.org), a long-running US survey conducted by the independent research organization NORC at the University of Chicago.
It's a sample of data from the [General Social Survey](https://gss.norc.org), a long-running US survey conducted by the independent research organization NORC at the University of Chicago.
The survey has thousands of questions, so in `gss_cat` Hadley selected a handful that will illustrate some common challenges you'll encounter when working with factors.

```{r}

@ -153,7 +153,7 @@ You'll need at least R 4.1.0 for this book.
### RStudio

RStudio is an integrated development environment, or IDE, for R programming.
Download and install it from <http://www.rstudio.com/download>.
Download and install it from <https://www.rstudio.com/download>.
RStudio is updated a couple of times a year.
When a new version is available, RStudio will let you know.
It's a good idea to upgrade regularly so you can take advantage of the latest and greatest features.

@ -262,7 +262,7 @@ There are a few people we'd like to thank in particular, because they have spent

- Jenny Bryan and Lionel Henry for many helpful discussions around working with lists and list-columns.

- The three chapters on workflow were adapted (with permission), from <http://stat545.com/block002_hello-r-workspace-wd-project.html> by Jenny Bryan.
- The three chapters on workflow were adapted (with permission), from <https://stat545.com/block002_hello-r-workspace-wd-project.html> by Jenny Bryan.

- Yihui Xie for his work on the [bookdown](https://github.com/rstudio/bookdown) package, and for tirelessly responding to my feature requests.

@ -295,7 +295,7 @@ cat(".\n")

## Colophon

An online version of this book is available at [http://r4ds.had.co.nz](http://r4ds.hadley.nz){.uri}.
An online version of this book is available at [https://r4ds.had.co.nz](https://r4ds.hadley.nz){.uri}.
It will continue to evolve in between reprints of the physical book.
The source of the book is available at <https://github.com/hadley/r4ds>.
The book is powered by <https://bookdown.org> which makes it easy to turn R Markdown files into HTML, PDF, and EPUB.

@ -268,7 +268,7 @@ y <- tribble(
#| echo: false
#| out-width: ~
#| fig-cap: >
#| Graphical representation of two simple tables
#| Graphical representation of two simple tables.
#| fig-alt: >
#| x and y are two data frames with 2 columns and 3 rows each. The first
#| column in each is the key and the second is the value. The contents of

@ -489,7 +489,8 @@ planes |>

### Many-to-many joins

A **many-to-many** join arises when when both data frames have duplicate keys, as in @fig-join-many-to-many. When duplicated keys match, they generate all possible combinations, the Cartesian product.
A **many-to-many** join arises when when both data frames have duplicate keys, as in @fig-join-many-to-many.
When duplicated keys match, they generate all possible combinations, the Cartesian product.

```{r}
#| label: fig-join-many-to-many

@ -711,7 +712,7 @@ knitr::include_graphics("diagrams/join/following.png", dpi = 270)
Rolling joins are a special type of inequality join where instead of getting *every* row that satisfies the inequality, you get one row.
They're particularly useful when you have two tables of dates that don't perfectly line up and you want to find (e.g.) the closest date in table 1 that comes before (or after) some date in table 2.

There are two `joinby()` functions that perform rolling joins:
There are two `join_by()` functions that perform rolling joins:

- `following(x, y)` is equivalent to getting the first match for `x <= y`.
- `following(x, y, inclusive = FALSE)` is equivalent to getting the first match for `x < y`.

@ -83,7 +83,7 @@ Sometimes you'll hit the opposite problem where some concrete value actually rep
This typically arises in data generated by older software that doesn't have a proper way to represent missing values, so it must instead use some special value like 99 or -999.

If possible, handle this when reading in the data, for example, by using the `na` argument to `readr::read_csv()`.
If you discover the problem later, or your data source doesn't provide a way to handle on it read, you can use `dplyr::na_if():`
If you discover the problem later, or your data source doesn't provide a way to handle on it read, you can use `dplyr::na_if()`:

```{r}
x <- c(1, 4, 5, 7, -99)

@ -226,9 +226,9 @@ For example, imagine we have a dataset that contains some health information abo

```{r}
health <- tibble(
name = c("Ikaia", "Oletta", "Leriah", "Dashay", "Tresaun"),
name = c("Ikaia", "Oletta", "Leriah", "Dashay", "Tresaun"),
smoker = factor(c("no", "no", "no", "no", "no"), levels = c("yes", "no")),
age = c(34L, 88L, 75L, 47L, 56L),
age = c(34L, 88L, 75L, 47L, 56L),
)
```

@ -299,7 +299,7 @@ All summary functions work with zero-length vectors, but they may return results
Here we see `mean(age)` returning `NaN` because `mean(age)` = `sum(age)/length(age)` which here is 0/0.
`max()` and `min()` return -Inf and Inf for empty vectors so if you combine the results with a non-empty vector of new data and recompute you'll get the minimum or maximum of the new data[^missing-values-1].

[^missing-values-1]: In other words, `min(c(x, y))` is always equal to `min(min(x), min(y)).`
[^missing-values-1]: In other words, `min(c(x, y))` is always equal to `min(min(x), min(y))`.

Sometimes a simpler approach is to perform the summary and then make the implicit missings explicit with `complete()`.
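The `min()`/`max()` behaviour described in the footnote above can be checked in base R. This is a minimal sketch with made-up vectors. R also emits a warning when `min()` or `max()` is given an empty vector, and the sketch suppresses it:

```r
x <- c(3, 8, 1)
y <- c(5, 2)

# The minimum of the combined data equals the minimum of the per-part minimums.
stopifnot(min(c(x, y)) == min(min(x), min(y)))

# For an empty vector, min() returns Inf and max() returns -Inf (with a warning).
empty <- numeric(0)
min_empty <- suppressWarnings(min(empty))
max_empty <- suppressWarnings(max(empty))
stopifnot(min_empty == Inf, max_empty == -Inf)

# So combining an empty summary with new data gives the summary of the new data,
# and the footnote's identity still holds when one part is empty.
stopifnot(
  min(min_empty, y) == min(y),
  max(max_empty, y) == max(y),
  min(c(empty, y)) == min(min_empty, min(y))
)
```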
@ -98,7 +98,7 @@ There are a couple of variants of `n()` that you might find useful:
```

- You can count missing values by combining `sum()` and `is.na()`.
In the flights dataset this represents flights that are cancelled:
In the `flights` dataset this represents flights that are cancelled:

```{r}
flights |>

@ -209,7 +209,7 @@ When called without any additional arguments:
parse_datetime("20101010")
```

This is the most important date/time standard, and if you work with dates and times frequently, we recommend reading <https://en.wikipedia.org/wiki/ISO_8601>
This is the most important date/time standard, and if you work with dates and times frequently, we recommend reading <https://en.wikipedia.org/wiki/ISO_8601>.

- `parse_date()` expects a four digit year, a `-` or `/`, the month, a `-` or `/`, then the day:

@ -376,7 +376,7 @@ readr contains a challenging CSV that illustrates both of these problems:
challenge <- read_csv(readr_example("challenge.csv"))
```

(Note the use of `readr_example()` which finds the path to one of the files included with the package)
(Note the use of `readr_example()` which finds the path to one of the files included with the package.)

There are two printed outputs: the column specification generated by looking at the first 1000 rows, and the first five parsing failures.
It's always a good idea to explicitly pull out the `problems()`, so you can explore them in more depth:

@ -271,7 +271,7 @@ We get zero rows in the output, so the row effectively disappears.
Once <https://github.com/tidyverse/tidyr/issues/1339> is fixed, you'll be able to keep this row, replacing `y` with `NA` by setting `keep_empty = TRUE`.

You can also unnest named list-columns, like `df1$y` into the rows.
Because the elements are named, and those names might be useful data, puts them in a new column with the suffix`_id`:
Because the elements are named, and those names might be useful data, puts them in a new column with the suffix `_id`:

```{r}
df1 |>

20
regexps.qmd

@ -159,9 +159,9 @@ A **character class**, or character **set**, allows you to match any character i
The basic syntax lists each character you want to match inside of `[]`, so `[abc]` will match a, b, or c.
Inside of `[]` only `-`, `^`, and `\` have special meanings:

- `-` defines a range. `[a-z]` matches any lower case letter and `[0-9]` matches any number.
- `-` defines a range. `[a-z]`: matches any lower case letter and `[0-9]` matches any number.
- `^` takes the inverse of the set. `[^abc]`: matches anything except a, b, or c.
- `\` escapes special characters so `[\^\-\]]`: matches `^`, `-`, or `]`.
- `\` escapes special characters, so `[\^\-\]]`: matches `^`, `-`, or `]`.

```{r}
str_view_all("abcd12345-!@#%.", c("[abc]", "[a-z]", "[^a-z0-9]"))

@ -180,11 +180,11 @@ You've already seen `.`, which matches any character apart from a newline.
There are three other particularly useful pairs:

- `\d`: matches any digit;\
`\D` matches anything that isn't a digit.
`\D`: matches anything that isn't a digit.
- `\s`: matches any whitespace (e.g. space, tab, newline);\
`\S` matches anything that isn't whitespace.
- `\w` matches any "word" character, i.e. letters and numbers;\
`\W`, matches any non-word character.
`\S`: matches anything that isn't whitespace.
- `\w`: matches any "word" character, i.e. letters and numbers;\
`\W`: matches any "non-word" character.

Remember, to create a regular expression containing `\d` or `\s`, you'll need to escape the `\` for the string, so you'll type `"\\d"` or `"\\s"`.
The following code demonstrates the different shortcuts with a selection of letters, numbers, and punctuation characters.
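This is a minimal sketch of how the character classes and shortcuts listed above behave with stringr's `str_detect()`. It is separate from the chapter's own demonstration, assumes stringr is installed, and uses made-up strings in `x`:

```r
library(stringr)

x <- c("abc", "a-c", "a^c", "123", "a b", "!?")

# Character classes: a range, a negated set, and escaped special characters.
in_range  <- str_detect(x, "[a-c]")        # a, b, or c
not_alnum <- str_detect(x, "[^a-z0-9]")    # anything but a lower case letter or digit
special   <- str_detect(x, "[\\^\\-\\]]")  # a literal ^, -, or ]

# Shortcuts: the backslash is doubled inside an R string.
has_digit   <- str_detect(x, "\\d")  # a digit
has_space   <- str_detect(x, "\\s")  # whitespace
has_nonword <- str_detect(x, "\\W")  # a non-word character

data.frame(x, in_range, not_alnum, special, has_digit, has_space, has_nonword)
```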
@ -403,7 +403,7 @@ rgb <- c("red", "green", "blue")
```

Well, we can!
We'd just need to create the pattern from the vector using `str_c()` and `str_flatten()`
We'd just need to create the pattern from the vector using `str_c()` and `str_flatten()`:

```{r}
str_c("\\b(", str_flatten(rgb, "|"), ")\\b")

@ -601,15 +601,15 @@ str_view_all(x, regex("^Line", multiline = TRUE))
Finally, if you're writing a complicated regular expression and you're worried you might not understand it in the future, `comments = TRUE` can be extremely useful.
It allows you to use comments and whitespace to make complex regular expressions more understandable.
Spaces and new lines are ignored, as is everything after `#`.
(Note that we use a raw string here to minimize the number of escapes needed)
(Note that we use a raw string here to minimize the number of escapes needed.)

```{r}
phone <- regex(r"(
\(?     # optional opening parens
(\d{3}) # area code
[)\ -]? # optional closing parens, space, or dash
[)\ -]? # optional closing parens, space, or dash
(\d{3}) # another three numbers
[\ -]?  # optional space or dash
[\ -]?  # optional space or dash
(\d{3}) # three more numbers
)", comments = TRUE)

@ -209,15 +209,15 @@ All the details are wrapped inside the package, so you don't need to worry about

There are many packages that provide htmlwidgets, including:

- **dygraphs**, [http://rstudio.github.io/dygraphs](http://rstudio.github.io/dygraphs/){.uri}, for interactive time series visualisations.
- **dygraphs**, [https://rstudio.github.io/dygraphs](https://rstudio.github.io/dygraphs/){.uri}, for interactive time series visualisations.

- **DT**, [http://rstudio.github.io/DT/](http://rstudio.github.io/DT){.uri}, for interactive tables.
- **DT**, [https://rstudio.github.io/DT/](https://rstudio.github.io/DT){.uri}, for interactive tables.

- **threejs**, [http://bwlewis.github.io/rthreejs](http://bwlewis.github.io/rthreejs/){.uri} for interactive 3d plots.
- **threejs**, [https://bwlewis.github.io/rthreejs](https://bwlewis.github.io/rthreejs/){.uri} for interactive 3d plots.

- **DiagrammeR**, <http://rich-iannone.github.io/DiagrammeR> for diagrams (like flow charts and simple node-link diagrams).
- **DiagrammeR**, <https://rich-iannone.github.io/DiagrammeR> for diagrams (like flow charts and simple node-link diagrams).

To learn more about htmlwidgets and see a more complete list of packages that provide them visit <http://www.htmlwidgets.org>.
To learn more about htmlwidgets and see a more complete list of packages that provide them visit <https://www.htmlwidgets.org>.

### Shiny

@ -260,7 +260,7 @@ This introduces a logistical issue: Shiny apps need a Shiny server to be run onl
When you run shiny apps on your own computer, shiny automatically sets up a shiny server for you, but you need a public facing shiny server if you want to publish this sort of interactivity online.
That's the fundamental trade-off of shiny: you can do anything in a shiny document that you can do in R, but it requires someone to be running R.

Learn more about Shiny at <http://shiny.rstudio.com>.
Learn more about Shiny at <https://shiny.rstudio.com>.

## Websites

@ -287,14 +287,14 @@ Other packages provide even more output formats:

- The **bookdown** package, <https://pkgs.rstudio.com/bookdown>, makes it easy to write books, like this one.
To learn more, read [*Authoring Books with R Markdown*](https://bookdown.org/yihui/bookdown/), by Yihui Xie, which is, of course, written in bookdown.
Visit <http://www.bookdown.org> to see other bookdown books written by the wider R community.
Visit <https://www.bookdown.org> to see other bookdown books written by the wider R community.

- The **prettydoc** package, [https://prettydoc.statr.me](https://prettydoc.statr.me/){.uri}, provides lightweight document formats with a range of attractive themes.

- The **rticles** package, <https://pkgs.rstudio.com/rticles>, compiles a selection of formats tailored for specific scientific journals.

See <http://rmarkdown.rstudio.com/formats.html> for a list of even more formats.
You can also create your own by following the instructions at <http://rmarkdown.rstudio.com/developer_custom_formats.html>.
See <https://rmarkdown.rstudio.com/formats.html> for a list of even more formats.
You can also create your own by following the instructions at <https://rmarkdown.rstudio.com/developer_custom_formats.html>.

## Learning more

@ -29,7 +29,7 @@ It:
A lab notebook helps you share not only what you've done, but why you did it with your colleagues or lab mates.

Much of the good advice about using lab notebooks effectively can also be translated to analysis notebooks.
We've drawn on our own experiences and Colin Purrington's advice on lab notebooks (<http://colinpurrington.com/tips/lab-notebooks>) to come up with the following tips:
We've drawn on our own experiences and Colin Purrington's advice on lab notebooks (<https://colinpurrington.com/tips/lab-notebooks>) to come up with the following tips:

- Ensure each notebook has a descriptive title, an evocative filename, and a first paragraph that briefly describes the aims of the analysis.

@ -505,21 +505,21 @@ csl: apa.csl

As with the bibliography field, your csl file should contain a path to the file.
Here we assume that the csl file is in the same directory as the .Rmd file.
A good place to find CSL style files for common bibliography styles is <http://github.com/citation-style-language/styles>.
A good place to find CSL style files for common bibliography styles is <https://github.com/citation-style-language/styles>.

## Learning more

R Markdown is still relatively young, and is still growing rapidly.
The best place to stay on top of innovations is the official R Markdown website: <http://rmarkdown.rstudio.com>.
The best place to stay on top of innovations is the official R Markdown website: <https://rmarkdown.rstudio.com>.

There are two important topics that we haven't covered here: collaboration, and the details of accurately communicating your ideas to other humans.
Collaboration is a vital part of modern data science, and you can make your life much easier by using version control tools, like Git and GitHub.
We recommend two free resources that will teach you about Git:

1. "Happy Git with R": a user friendly introduction to Git and GitHub from R users, by Jenny Bryan.
The book is freely available online: <http://happygitwithr.com>
The book is freely available online: <https://happygitwithr.com>

2. The "Git and GitHub" chapter of *R Packages*, by Hadley.
2. The "Git and GitHub" chapter of *R Packages*, by Hadley Wickham.
You can also read it for free online: <http://r-pkgs.had.co.nz/git.html>.

We have not touched on what you should actually write in order to clearly communicate the results of your analysis.

@ -81,7 +81,7 @@ backslash <- "\\"
Beware that the printed representation of a string is not the same as string itself, because the printed representation shows the escapes (in other words, when you print a string, you can copy and paste the output to recreate that string).
To see the raw contents of the string, use `str_view()`[^strings-1]:

[^strings-1]: You can also use the base R function `writeLines()`
[^strings-1]: You can also use the base R function `writeLines()`.

```{r}
x <- c(single_quote, double_quote, backslash)

@ -183,7 +183,7 @@ If you are mixing many fixed and variable strings with `str_c()`, you'll notice
An alternative approach is provided by the [glue package](https://glue.tidyverse.org) via `str_glue()`[^strings-4] .
You give it a single string containing `{}` and anything inside `{}` will be evaluated like it's outside of the string:

[^strings-4]: If you're not using stringr, you can also access it directly with `glue::glue().`
[^strings-4]: If you're not using stringr, you can also access it directly with `glue::glue()`.

```{r}
df |> mutate(greeting = str_glue("Hi {name}!"))

@ -316,7 +316,7 @@ For example, `.`
will match any character[^strings-8], so `"a."` will match any string that contains an "a" followed by another character
:

[^strings-7]: You'll learn how to escape this special behaviour in @sec-regexp-escaping
[^strings-7]: You'll learn how to escape this special behaviour in @sec-regexp-escaping.

[^strings-8]: Well, any character apart from `\n`.
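Here is a minimal sketch of the `str_glue()`-inside-`mutate()` approach described above. It assumes stringr, dplyr, and tibble are installed, and the three-name `df` is made up. `str_glue()` returns a glue object, so the sketch uses `as.character()` to get a plain character vector:

```r
library(stringr)
library(tibble)
library(dplyr)

# Hypothetical data: a single column of names.
df <- tibble(name = c("Flora", "David", "Terra"))

# Anything inside {} is evaluated, here against the `name` column.
greetings <- df |> mutate(greeting = str_glue("Hi {name}!"))

as.character(greetings$greeting)
#> [1] "Hi Flora!" "Hi David!" "Hi Terra!"
```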
@ -72,7 +72,7 @@ tibble(
)
```

Another way to create a tibble is with `tribble()`, which short for **tr**ansposed tibble.
Another way to create a tibble is with `tribble()`, which short for **tr**ansposed t**ibble**.
`tribble()` is customized for data entry in code: column headings start with `~` and entries are separated by commas.
This makes it possible to lay out small amounts of data in an easy to read form:

@ -34,7 +34,7 @@ x <- 3 * 4
You can **c**ombine multiple elements into a vector with `c()`:

```{r}
primes <- c(1, 2, 3, 5, 7, 11, 13)
primes <- c(2, 3, 5, 7, 11, 13)
```

And basic arithmetic is applied to every element of the vector:

@ -69,7 +69,7 @@ Comments can be helpful for briefly describing what the subsequent code does.

```{r}
# define primes
primes <- c(1, 2, 3, 5, 7, 11, 13)
primes <- c(2, 3, 5, 7, 11, 13)

# multiply primes by 2
primes * 2

@ -20,7 +20,7 @@ If you get an error message and you have no idea what it means, try googling it!
Chances are that someone else has been confused by it in the past, and there will be help somewhere on the web.
(If the error message isn't in English, run `Sys.setenv(LANGUAGE = "en")` and re-run the code; you're more likely to find help for English error messages.)

If Google doesn't help, try [Stack Overflow](http://stackoverflow.com).
If Google doesn't help, try [Stack Overflow](https://stackoverflow.com).
Start by spending a little time searching for an existing answer, including `[R]` to restrict your search to questions and answers that use R.

## Making a reprex

@ -30,7 +30,7 @@ A good reprex makes it easier for other people to help you, and often you'll fig
There are two parts to creating a reprex:

- First, you need to make your code reproducible.
This means that you need to capture everything, i.e., include any library() calls and create all necessary objects.
This means that you need to capture everything, i.e., include any `library()` calls and create all necessary objects.
The easiest way to make sure you've done this is to use the reprex package.

- Second, you need to make it minimal.

@ -117,7 +117,7 @@ But they're still good to know about even if you've never used `%>%` because you

With `%>%` you can use `.` on the left-hand side of operators like `$`, `[[`, `[` (which you'll learn about in [Chapter -@sec-vectors]), so you can extract a single column from a data frame with (e.g.) `mtcars %>% .$cyl`.
A future version of R may add similar support for `|>` and `_`.
For the special case of extracting a column out of a data frame, you can also use `dplyr::pull():`
For the special case of extracting a column out of a data frame, you can also use `dplyr::pull()`:

```{r}
mtcars |> pull(cyl)

@ -25,7 +25,7 @@ Keep experimenting in the console, but once you have written code that works and
#| out-width: ~
#| fig-cap: >
#| Opening the script editor adds a new pane at the top-left of the
#| IDE
#| IDE.
#| fig-alt: >
#| RStudio IDE with Editor, Console, and Output highlighted.
knitr::include_graphics("diagrams/rstudio/script.png", dpi = 270)

@ -176,7 +176,7 @@ You can do this either by running `usethis::use_blank_slate()`[^workflow-scripts
But this short-term pain saves you long-term agony because it forces you to capture all important interactions in your code.
There's nothing worse than discovering three months after the fact that you've only stored the results of an important calculation in your workspace, not the calculation itself in your code.

[^workflow-scripts-2]: If you don't have usethis installed, you can install it with `install.packages("usethis")`
[^workflow-scripts-2]: If you don't have usethis installed, you can install it with `install.packages("usethis")`.

```{r}
#| label: fig-blank-slate

@ -13,7 +13,7 @@ Using a consistent style makes it easier for others (including future-you!) to r
This chapter will introduce to the most important points of the [tidyverse style guide](https://style.tidyverse.org), which is used throughout this book.

Styling your code will feel a bit tedious to start with, but if you practice it, it will soon become second nature.
Additionally, there are some great tools to quickly restyle existing code, like the [styler](http://styler.r-lib.org) package by Lorenz Walthert.
Additionally, there are some great tools to quickly restyle existing code, like the [styler](https://styler.r-lib.org) package by Lorenz Walthert.
Once you've installed it with `install.packages("styler")`, an easy way to use it is via RStudio's **command palette**.
The command palette lets you use any build-in RStudio command, as well as many addins provided by packages.
Open the palette by pressing Cmd/Ctrl + Shift + P, then type "styler" to see all the shortcuts provided by styler.