Improve cross-references
* Fix broken links
* Update chapter links
parent d9a86edcf0
commit faeeb564a4
2
EDA.qmd
@@ -919,7 +919,7 @@ Typically, the first one or two arguments to a function are so important that yo
 The first two arguments to `ggplot()` are `data` and `mapping`, and the first two arguments to `aes()` are `x` and `y`.
 In the remainder of the book, we won't supply those names.
 That saves typing, and, by reducing the amount of boilerplate, makes it easier to see what's different between plots.
-That's a really important programming concern that we'll come back to in [Chapter -@sec-functions].
+That's a really important programming concern that we'll come back to in @sec-functions.
 
 Rewriting the previous plot more concisely yields:
 
@@ -1,26 +0,0 @@
-# Column-wise operations {#sec-column-wise}
-
-```{r}
-#| results: "asis"
-#| echo: false
-source("_common.R")
-status("drafting")
-```
-
-## Introduction
-
-<!--# TO DO: Write introduction. -->
-
-### Prerequisites
-
-In this chapter we'll continue using dplyr.
-dplyr is a member of the core tidyverse.
-
-```{r}
-#| label: setup
-#| message: false
-
-library(tidyverse)
-```
-
-<!--# TO DO: Write chapter around across, etc. -->
@@ -11,7 +11,6 @@ status("polishing")
 
 Working with data provided by R packages is a great way to learn the tools of data science, but at some point you want to stop learning and start working with your own data.
 In this chapter, you'll learn how to read plain-text rectangular files into R.
-Here, we'll only scratch the surface of data import, but many of the principles will translate to other forms of data, which we'll come back to in @sec-wrangle.
 
 ### Prerequisites
 
@@ -116,7 +115,7 @@ There are two cases where you might want to tweak this behavior:
 read_csv("1,2,3\n4,5,6", col_names = FALSE)
 ```
 
-(`"\n"` is a convenient shortcut for adding a new line. You'll learn more about it and other types of string escape in [Chapter -@sec-strings].)
+(`"\n"` is a convenient shortcut for adding a new line. You'll learn more about it and other types of string escape in @sec-strings.)
 
 Alternatively you can pass `col_names` a character vector which will be used as the column names:
 
@@ -171,7 +170,7 @@ Another common task after reading in data is to consider variable types.
 For example, `meal_type` is a categorical variable with a known set of possible values.
 In R, factors can be used to work with categorical variables.
 We can convert this variable to a factor using the `factor()` function.
-You'll learn more about factors in [Chapter -@sec-factors].
+You'll learn more about factors in @sec-factors.
 
 ```{r}
 students <- students |>
@@ -184,7 +183,7 @@ students
 Note that the values in the `meal_type` variable has stayed exactly the same, but the type of variable denoted underneath the variable name has changed from character (`<chr>`) to factor (`<fct>`).
 
 Before you move on to analyzing these data, you'll probably want to fix the `age` column as well: currently it's a character variable because of the one observation that is typed out as `five` instead of a numeric `5`.
-We discuss the details of fixing this issue in [Chapter -@sec-import-spreadsheets] in further detail.
+We discuss the details of fixing this issue in @sec-import-spreadsheets in further detail.
 
 ### Compared to base R
 
@@ -331,7 +330,7 @@ file.remove("students.rds")
 
 In this chapter, you've learned how to use readr to load rectangular flat files from disk into R.
 You've learned how csv files work, some of the problems you might encounter, and how to overcome them.
-We'll come to data import a few times in this book: @sec-import-databases will show you how to load data from databases, @sec-import-spreadsheets from Excel and googlesheets, @sec-import-rectangling from JSON, and @sec-import-scraping from websites.
+We'll come to data import a few times in this book: @sec-import-databases will show you how to load data from databases, @sec-import-spreadsheets from Excel and googlesheets, @sec-rectangling from JSON, and @sec-scraping from websites.
 
 Now that you're writing a substantial amount of R code, it's time to learn more about organizing your code into files and directories.
 In the next chapter, you'll learn all about the advantages of scripts and projects, and some of the many tools that they provide to make your life easier.
@@ -202,7 +202,7 @@ Take 2 Pac's "Baby Don't Cry", for example.
 The above output suggests that it was only the top 100 for 7 weeks, and all the remaining weeks are filled in with missing values.
 These `NA`s don't really represent unknown observations; they're forced to exist by the structure of the dataset[^data-tidy-1], so we can ask `pivot_longer()` to get rid of them by setting `values_drop_na = TRUE`:
 
-[^data-tidy-1]: We'll come back to this idea in [Chapter -@sec-missing-values].
+[^data-tidy-1]: We'll come back to this idea in @sec-missing-values.
 
 ```{r}
 billboard |>
@@ -218,7 +218,7 @@ You might also wonder what happens if a song is in the top 100 for more than 76
 We can't tell from this data, but you might guess that additional columns `wk77`, `wk78`, ... would be added to the dataset.
 
 This data is now tidy, but we could make future computation a bit easier by converting `week` into a number using `mutate()` and `parse_number()`.
-You'll learn more about `parse_number()` and friends in [Chapter -@sec-data-import].
+You'll learn more about `parse_number()` and friends in @sec-data-import.
 
 ```{r}
 billboard_tidy <- billboard |>
@@ -365,7 +365,7 @@ who2 |>
 )
 ```
 
-An alternative to `names_sep` is `names_pattern`, which you can use to extract variables from more complicated naming scenarios, once you've learned about regular expressions in [Chapter -@sec-regular-expressions].
+An alternative to `names_sep` is `names_pattern`, which you can use to extract variables from more complicated naming scenarios, once you've learned about regular expressions in @sec-regular-expressions.
 
 Conceptually, this is only a minor variation on the simpler case you've already seen.
 @fig-pivot-multiple-names shows the basic idea: now, instead of the column names pivoting into a single column, they pivot into multiple columns.
@@ -540,7 +540,7 @@ df |>
 
 It then fills in all the missing values using the data in the input.
 In this case, not every cell in the output has corresponding value in the input as there's no entry for id "B" and name "z", so that cell remains missing.
-We'll come back to this idea that `pivot_wider()` can "make" missing values in [Chapter -@sec-missing-values].
+We'll come back to this idea that `pivot_wider()` can "make" missing values in @sec-missing-values.
 
 You might also wonder what happens if there are multiple rows in the input that correspond to one cell in the output.
 The example below has two rows that correspond to id "A" and name "x":
@@ -665,7 +665,7 @@ cluster_id <- cluster$cluster |>
 cluster_id
 ```
 
-You could then combine this back with the original data using one of the joins you'll learn about in [Chapter -@sec-relational-data].
+You could then combine this back with the original data using one of the joins you'll learn about in @sec-joins.
 
 ```{r}
 gapminder |> left_join(cluster_id)
@@ -48,7 +48,7 @@ If you've used R before, you might notice that this data frame prints a little d
 That's because it's a **tibble**, a special type of data frame used by the tidyverse to avoid some common gotchas.
 The most important difference is the way it prints: tibbles are designed for large datasets, so they only show the first few rows and only the columns that fit on one screen.
 To see everything, use `View(flights)` to open the dataset in the RStudio viewer.
-We'll come back to other important differences in [Chapter -@sec-tibbles].
+We'll come back to other important differences in @sec-tibbles.
 
 You might have noticed the short abbreviations that follow each column name.
 These tell you the type of each variable: `<int>` is short for integer, `<dbl>` is short for double (aka real numbers), `<chr>` for character (aka strings), and `<dttm>` for date-time.
@@ -85,7 +85,7 @@ The code starts with the `flights` dataset, then filters it, then groups it, the
 We'll come back to the pipe and its alternatives in @sec-pipes.
 
 dplyr's verbs are organised into four groups based on what they operate on: **rows**, **columns**, **groups**, or **tables**.
-In the following sections you'll learn the most important verbs for rows, columns, and groups, then we'll come back to verb that work on tables in [Chapter -@sec-relational-data].
+In the following sections you'll learn the most important verbs for rows, columns, and groups, then we'll come back to verb that work on tables in @sec-joins.
 Let's dive in!
 
 ## Rows
@@ -129,7 +129,7 @@ flights |>
 filter(month %in% c(1, 2))
 ```
 
-We'll come back to these comparisons and logical operators in more detail in [Chapter -@sec-logicals].
+We'll come back to these comparisons and logical operators in more detail in @sec-logicals.
 
 When you run `filter()` dplyr executes the filtering operation, creating a new data frame, and then prints it.
 It doesn't modify the existing `flights` dataset because dplyr functions never modify their inputs.
@@ -308,7 +308,7 @@ There are a number of helper functions you can use within `select()`:
 - `num_range("x", 1:3)`: matches `x1`, `x2` and `x3`.
 
 See `?select` for more details.
-Once you know regular expressions (the topic of [Chapter -@sec-regular-expressions]) you'll also be use `matches()` to select variables that match a pattern.
+Once you know regular expressions (the topic of @sec-regular-expressions) you'll also be use `matches()` to select variables that match a pattern.
 
 You can rename variables as you `select()` them by using `=`.
 The new name appears on the left hand side of the `=`, and the old variable appears on the right hand side:
@@ -435,7 +435,7 @@ flights |>
 
 Uhoh!
 Something has gone wrong and all of our results are `NA` (pronounced "N-A"), R's symbol for missing value.
-We'll come back to discuss missing values in [Chapter -@sec-missing-values], but for now we'll remove them by using `na.rm = TRUE`:
+We'll come back to discuss missing values in @sec-missing-values, but for now we'll remove them by using `na.rm = TRUE`:
 
 ```{r}
 flights |>
@@ -671,6 +671,6 @@ You can find a good explanation of this problem and how to overcome it at <http:
 In this chapter, you've learned the tools that dplyr provides for working with data frames.
 The tools are roughly grouped into three categories: those that manipulate the rows (like `filter()` and `arrange()`, those that manipulate the columns (like `select()` and `mutate()`), and those that manipulate groups (like `group_by()` and `summarise()`).
 In this chapter, we've focused on these "whole data frame" tools, but you haven't yet learned much about what you can do with the individual variable.
-We'll come back to that in @sec-transform-intro, where each chapter will give you tools for a specific type of variable.
+We'll come back to that in the Transform part of the book, where each chapter will give you tools for a specific type of variable.
 
 For now, we'll pivot back to workflow, and in the next chapter you'll learn more about the pipe, `|>`, why we recommend it, and a little of the history that lead from magrittr's `%>%` to base R's `|>`.
@@ -32,9 +32,9 @@ The goal of this chapter is to get you started on your journey with functions wi
 The chapter concludes with some advice on function style.
 
 Many of the examples in this chapter were inspired by real data analysis code supplied by folks on twitter.
-I've often simplified the code from the original so you might want to look at the original tweets which I list in the comments.
+We've often simplified the code from the original so you might want to look at the original tweets which I list in the comments.
 If you want just to see a huge variety of funcitons, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680, https://twitter.com/hadleywickham/status/1571603361350164486 A big thanks to everyone who contributed!
-I won't fully explain all of the functions that I use here, so you might need to do some reading of the documentation.
+WI won't fully explain all of the functions that I use here, so you might need to do some reading of the documentation.
 
 ### Prerequisites
 
@@ -223,7 +223,7 @@ knitr::include_graphics("diagrams/transform.png", dpi = 270)
 As well as `&` and `|`, R also has `&&` and `||`.
 Don't use them in dplyr functions!
 These are called short-circuiting operators and only ever return a single `TRUE` or `FALSE`.
-They're important for programming and you'll learn more about them in @sec-conditional-execution.
+They're important for programming, not data science
 
 ### Missing values {#sec-na-boolean}
 
@@ -402,7 +402,7 @@ This works, but what if we wanted to also compute the average delay for flights
 We'd need to perform a separate filter step, and then figure out how to combine the two data frames together[^logicals-3].
 Instead you could use `[` to perform an inline filtering: `arr_delay[arr_delay > 0]` will yield only the positive arrival delays.
 
-[^logicals-3]: We'll cover this in [Chapter -@sec-relational-data]
+[^logicals-3]: We'll cover this in @sec-joins\]
 
 This leads to:
 
@@ -121,9 +121,9 @@ The vast majority of transformation functions are already built into base R.
 It's impractical to list them all so this section will show the most useful ones.
 As an example, while R provides all the trigonometric functions that you might dream of, we don't list them here because they're rarely needed for data science.
 
-### Arithmetic and recycling rules
+### Arithmetic and recycling rules {#sec-recycling}
 
-We introduced the basics of arithmetic (`+`, `-`, `*`, `/`, `^`) in [Chapter -@sec-workflow-basics] and have used them a bunch since.
+We introduced the basics of arithmetic (`+`, `-`, `*`, `/`, `^`) in @sec-workflow-basics and have used them a bunch since.
 These functions don't need a huge amount of explanation because they do what you learned in grade school.
 But we need to briefly talk about the **recycling rules** which determine what happens when the left and right hand sides have different lengths.
 This is important for operations like `flights |> mutate(air_time = air_time / 60)` because there are 336,776 numbers on the left of `/` but only one on the right.
@@ -742,7 +742,7 @@ flights |>
 ### With `mutate()`
 
 As the names suggest, the summary functions are typically paired with `summarise()`.
-However, because of the recycling rules we discussed in @sec-scalars-and-recycling-rules they can also be usefully paired with `mutate()`, particularly when you want do some sort of group standardization.
+However, because of the recycling rules we discussed in @sec-recycling they can also be usefully paired with `mutate()`, particularly when you want do some sort of group standardization.
 For example:
 
 - `x / sum(x)` calculates the proportion of a total.
10
regexps.qmd
@@ -9,7 +9,7 @@ status("restructuring")
 
 ## Introduction
 
-You learned the basics of regular expressions in [Chapter -@sec-strings], but regular expressions are fairly rich language so it's worth spending some extra time on the details.
+You learned the basics of regular expressions in @sec-strings, but regular expressions are fairly rich language so it's worth spending some extra time on the details.
 
 The chapter starts by expanding your knowledge of patterns, to cover six important new topics (escaping, anchoring, character classes, shorthand classes, quantifiers, and alternation).
 Here we'll focus mostly on the language itself, not the functions that use it.
@@ -51,7 +51,7 @@ It's not R specific, but it covers the most advanced features and explains how r
 
 ## Pattern language
 
-You learned the very basics of the regular expression pattern language in [Chapter -@sec-strings], and now its time to dig into more of the details.
+You learned the very basics of the regular expression pattern language in @sec-strings, and now its time to dig into more of the details.
 First, we'll start with **escaping**, which allows you to match characters that the pattern language otherwise treats specially.
 Next you'll learn about **anchors**, which allow you to match the start or end of the string.
 Then you'll learn about **character classes** and their shortcuts, which allow you to match any character from a set.
@@ -60,7 +60,7 @@ We'll finish up with **quantifiers**, which control how many times a pattern can
 The terms we use here are the technical names for each component.
 They're not always the most evocative of their purpose, but it's very helpful to know the correct terms if you later want to Google for more details.
 
-We'll concentrate on showing how these patterns work with `str_view()` and `str_view_all()` but remember that you can use them with any of the functions that you learned about in [Chapter -@sec-strings], i.e.:
+We'll concentrate on showing how these patterns work with `str_view()` and `str_view_all()` but remember that you can use them with any of the functions that you learned about in @sec-strings, i.e.:
 
 - `str_detect(x, pattern)` returns a logical vector the same length as `x`, indicating whether each element matches (`TRUE`) or doesn't match (`FALSE`) the pattern.
 - `str_count(x, pattern)` returns the number of times `pattern` matches in each element of `x`.
@@ -68,7 +68,7 @@ We'll concentrate on showing how these patterns work with `str_view()` and `str_
 
 ### Escaping {#sec-regexp-escaping}
 
-In [Chapter -@sec-strings], you'll learned how to match a literal `.` by using `fixed(".")`.
+In @sec-strings, you'll learned how to match a literal `.` by using `fixed(".")`.
 But what if you want to match a literal `.` as part of a bigger regular expression?
 You'll need to use an **escape**, which tells the regular expression you want it to match exactly, not use its special behavior.
 Like strings, regexps use the backslash for escaping, so to match a `.`, you need the regexp `\.`.
@@ -201,7 +201,7 @@ str_view_all("abcd12345!@#%. ", "\\S+")
 ### Quantifiers
 
 The **quantifiers** control how many times a pattern matches.
-In [Chapter -@sec-strings] you learned about `?` (0 or 1 matches), `+` (1 or more matches), and `*` (0 or more matches).
+In @sec-strings you learned about `?` (0 or 1 matches), `+` (1 or more matches), and `*` (0 or more matches).
 For example, `colou?r` will match American or British spelling, `\d+` will match one or more digits, and `\s?` will optionally match a single whitespace.
 
 You can also specify the number of matches precisely:
@@ -12,7 +12,7 @@ status("drafting")
 So far you have learned about importing data from plain text files, e.g. `.csv` and `.tsv` files.
 Sometimes you need to analyze data that lives in a spreadsheet.
 In this chapter we will introduce you to tools for working with data in Excel spreadsheets and Google Sheets.
-This will build on much of what you've learned in [Chapter -@sec-data-import] and [Chapter -@sec-import-rectangular], but we will also discuss additional considerations and complexities when working with data from spreadsheets.
+This will build on much of what you've learned in @sec-data-import and @sec-import-rectangular, but we will also discuss additional considerations and complexities when working with data from spreadsheets.
 
 If you or your collaborators are using spreadsheets for organizing data, we strongly recommend reading the paper "Data Organization in Spreadsheets" by Karl Broman and Kara Woo: <https://doi.org/10.1080/00031305.2017.1375989>.
 The best practices presented in this paper will save you much headache down the line when you import the data from a spreadsheet into R to analyse and visualise.
@@ -222,7 +222,7 @@ penguins <- bind_rows(penguins_torgersen, penguins_biscoe, penguins_dream)
 penguins
 ```
 
-In [Chapter -@sec-iteration] we'll talk about ways of doing this sort of task without repetitive code <!--# Check to make sure that's the right place to present it -->.
+In @sec-iteration we'll talk about ways of doing this sort of task without repetitive code.
 
 ### Reading part of a sheet
 
@@ -17,10 +17,6 @@ You'll then dive into creating strings from data.
 Next, we'll discuss the basics of regular expressions, a powerful tool for describing patterns in strings, then use those tools to extract data from strings.
 The chapter finishes up with functions that work with individual letters, including a brief discussion of where your expectations from English might steer you wrong when working with other languages, and a few useful non-stringr functions.
 
-This chapter is paired with two other chapters.
-Regular expression are a big topic, so we'll come back to them again in @sec-regular-expressions.
-We'll also come back to strings again in @sec-programming-with-strings where we'll look at them from a programming perspective rather than a data analysis perspective.
-
 ### Prerequisites
 
 In this chapter, we'll use functions from the stringr package which is part of the core tidyverse.
@@ -457,7 +453,6 @@ str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))
 ```
 
 Alternatively, you can provide a replacement function: it's called with a vector of matches, and should return what to replacement them with.
-We'll come back to this powerful tool in [Chapter -@sec-programming-with-strings].
 
 ```{r}
 x <- c("1 house", "1 person has 2 cars", "3 people")
@@ -185,7 +185,7 @@ tb |> pull(x1) # by name
 tb |> pull(1) # by position
 ```
 
-`pull()` also takes an optional `name` argument that specifies the column to be used as names for a named vector, which you'll learn about in [Chapter -@sec-vectors].
+`pull()` also takes an optional `name` argument that specifies the column to be used as names for a named vector, which you'll learn about in @sec-vectors.
 
 ```{r}
 tb |> pull(x1, name = id)
@@ -15,26 +15,24 @@ Now we'll focus on new skills for specific types of data you will frequently enc
 
 This part of the book proceeds as follows:
 
-- In [Chapter -@sec-tibbles], you'll learn about the variant of the data frame that we use in this book: the **tibble**.
+- In @sec-tibbles, you'll learn about the variant of the data frame that we use in this book: the **tibble**.
 You'll learn what makes them different from regular data frames, and how you can construct them "by hand".
 
-- [Chapter -@sec-relational-data] will give you tools for working with multiple interrelated datasets.
+- @sec-joins will give you tools for working with multiple interrelated datasets.
 
-- [Chapter -@sec-numbers] ...
+- @sec-numbers ...
 
-- [Chapter -@sec-logicals] ...
+- @sec-logicals ...
 
-- [Chapter -@sec-missing-values]...
+- @sec-missing-values...
 
-- [Chapter -@sec-strings] will give you tools for working with strings and introduce regular expressions, a powerful tool for manipulating strings.
+- @sec-strings will give you tools for working with strings and introduce regular expressions, a powerful tool for manipulating strings.
 
-- [Chapter -@sec-regular-expressions] ...
+- @sec-regular-expressions ...
 
-- [Chapter -@sec-factors] will introduce factors -- how R stores categorical data.
+- @sec-factors will introduce factors -- how R stores categorical data.
 They are used when a variable has a fixed set of possible values, or when you want to use a non-alphabetical ordering of a string.
 
-- [Chapter -@sec-dates-and-times] will give you the key tools for working with dates and date-times.
-
-- [Chapter -@sec-column-wise] will give you tools for performing the same operation on multiple columns.
+- @sec-dates-and-times will give you the key tools for working with dates and date-times.
 
 <!-- TO DO: Add chapter descriptions -->
@@ -393,7 +393,8 @@ knitr::include_graphics("diagrams/lists-subsetting.png")
 ```
 
 The difference between `[` and `[[` is very important, but it's easy to get confused.
-To help you remember, let me show you an unusual pepper shaker in @fig-pepper-1.If this pepper shaker is your list `pepper`, then, `pepper[1]` is a pepper shaker containing a single pepper packet, as in @fig-pepper-2.
+To help you remember, let me show you an unusual pepper shaker in @fig-pepper-1.
+If this pepper shaker is your list `pepper`, then, `pepper[1]` is a pepper shaker containing a single pepper packet, as in @fig-pepper-2.
 `pepper[2]` would look the same, but would contain the second packet.
 `pepper[1:2]` would be a pepper shaker containing two pepper packets.
 `pepper[[1]]` would extract the pepper packet itself, as in @fig-pepper-3.
@@ -402,7 +403,8 @@ To help you remember, let me show you an unusual pepper shaker in @fig-pepper-1.
 #| label: fig-pepper-1
 #| echo: false
 #| out-width: "25%"
-#| fig-cap: A pepper shaker that Hadley once found in his hotel room.
+#| fig-cap: >
+#| A pepper shaker that Hadley once found in his hotel room.
 #| fig-alt: >
 #| A photo of a glass pepper shaker. Instead of the pepper shaker
 #| containing pepper, it contains many packets of pepper.
@@ -608,3 +610,4 @@ The class of tibble includes "data.frame" which means tibbles inherit the regula
 
 2. Try and make a tibble that has columns with different lengths.
 What happens?
+
@@ -1,4 +1,4 @@
-# Web scraping {#sec-import-webscrape}
+# Web scraping {#sec-scraping}
 
 ```{r}
 #| results: "asis"
@@ -105,7 +105,7 @@ some.people.use.periods
 And_aFew.People_RENOUNCEconvention
 ```
 
-We'll come back to names again when we talk more about code style in [Chapter -@sec-workflow-style].
+We'll come back to names again when we talk more about code style in @sec-workflow-style.
 
 You can inspect an object by typing its name:
 
@@ -115,7 +115,7 @@ But they're still good to know about even if you've never used `%>%` because you
 - The `|>` placeholder is deliberately simple and can't replicate many features of the `%>%` placeholder: you can't pass it to multiple arguments, and it doesn't have any special behavior when the placeholder is used inside another function.
 For example, `df %>% split(.$var)` is equivalent to `split(df, df$var)` and `df %>% {split(.$x, .$y)}` is equivalent to `split(df$x, df$y)`.
 
-With `%>%` you can use `.` on the left-hand side of operators like `$`, `[[`, `[` (which you'll learn about in [Chapter -@sec-vectors]), so you can extract a single column from a data frame with (e.g.) `mtcars %>% .$cyl`.
+With `%>%` you can use `.` on the left-hand side of operators like `$`, `[[`, `[` (which you'll learn about in @sec-vectors), so you can extract a single column from a data frame with (e.g.) `mtcars %>% .$cyl`.
 A future version of R may add similar support for `|>` and `_`.
 For the special case of extracting a column out of a data frame, you can also use `dplyr::pull()`:
 
@@ -34,7 +34,7 @@ This part of the book proceeds as follows:
 
 - In @sec-rectangling, you'll learn how to work with hierarchical data that includes deeply nested lists, as is often created we your raw data is in JSON.
 
-- In @sec-import-webscrape, you'll learn about harvesting data off the web and getting it into R.
+- In @sec-scraping, you'll learn about harvesting data off the web and getting it into R.
 
 Some other types of data are not covered in this book:
 