Minors (#1339)
* comma
* Quarto link
* but instead of and, as it seems to be considered as a good thing then a bad thing
* Reduce repetition
* Typo ot ⇒ to
* Rm spurious comma
* TODO ref
* Comment about a strange sentence
* Comment not in my env
* Comment about create ≠ assign
* Argument about reading one’s mind
* Broken ref comment
* Argument about repetition
* Argue for reducing repetition
* Comment about dplyr
* Resolve to dos
* Resolve to dos
* Update intro.qmd
* Update intro.qmd
* Resolve to dos
* Fix number of workflow chapters

---------

Co-authored-by: Olivier Cailloux <olivier.cailloux@gmail.com>
Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
parent bac06d00f2
commit 39132b9a74
|
@ -72,7 +72,7 @@ But before we discuss their individual differences, it's worth stating what they
|
|||
3. The output is always a new data frame.
|
||||
|
||||
Because each verb does one thing well, solving complex problems will usually require combining multiple verbs, and we'll do so with the pipe, `|>`.
|
||||
We'll discuss the pipe more in @the-pipe, but in brief, the pipe takes the thing on its left and passes it along to the function on its right so that `x |> f(y)` is equivalent to `f(x, y)`, and `x |> f(y) |> g(z)` is equivalent to into `g(f(x, y), z)`.
|
||||
We'll discuss the pipe more in @sec-the-pipe, but in brief, the pipe takes the thing on its left and passes it along to the function on its right so that `x |> f(y)` is equivalent to `f(x, y)`, and `x |> f(y) |> g(z)` is equivalent to into `g(f(x, y), z)`.
|
||||
The easiest way to pronounce the pipe is "then".
|
||||
That makes it possible to get a sense of the following code even though you haven't yet learned the details:
|
||||
|
||||
|
@ -320,8 +320,7 @@ Often, the right answer is a new object that is named informatively to indicate
|
|||
|
||||
It's not uncommon to get datasets with hundreds or even thousands of variables.
|
||||
In this situation, the first challenge is often just focusing on the variables you're interested in.
|
||||
`select()` allows you to rapidly zoom in on a useful subset using operations based on the names of the variables.
|
||||
`select()` is not terribly useful with the `flights` data because we only have 19 variables, but you can still get the general idea of how it works:
|
||||
`select()` allows you to rapidly zoom in on a useful subset using operations based on the names of the variables:
|
||||
|
||||
- Select columns by name:
|
||||
|
||||
|
@ -467,7 +466,7 @@ ggplot(flights, aes(x = air_time - airtime2)) + geom_histogram()
|
|||
arrange(arr_delay)
|
||||
```
|
||||
|
||||
## The pipe {#the-pipe}
|
||||
## The pipe {#sec-the-pipe}
|
||||
|
||||
We've shown you simple examples of the pipe above, but its real power arises when you start to combine multiple verbs.
|
||||
For example, imagine that you wanted to find the fast flights to Houston's IAH airport: you need to combine `filter()`, `mutate()`, `select()`, and `arrange()`:
|
||||
|
|
|
@ -13,7 +13,7 @@ This website is and will always be free, licensed under the [CC BY-NC-ND 3.0](ht
|
|||
If you'd like a physical copy of the book, you can order the 1st edition on [Amazon](https://amzn.to/2aHLAQ1), or wait until mid-2023 for the 2nd edition.
|
||||
If appreciate reading the book for free and would like to give back please make a donation to [Kākāpō Recovery](https://www.doc.govt.nz/kakapo-donate): the [kākāpō](https://www.youtube.com/watch?v=9T1vfsHYiKY) (which appears on the cover of R4DS) is a critically endangered native NZ parrot; there are only 252 left.
|
||||
|
||||
If you speak, another language, you might be interested in the freely available translations of the 1st edition:
|
||||
If you speak another language, you might be interested in the freely available translations of the 1st edition:
|
||||
|
||||
- [Spanish](https://es.r4ds.hadley.nz)
|
||||
- [Italian](https://it.r4ds.hadley.nz)
|
||||
|
|
|
@ -52,7 +52,7 @@ These have complementary strengths and weaknesses, so any real data analysis wil
|
|||
**Visualization** is a fundamentally human activity.
|
||||
A good visualization will show you things you did not expect or raise new questions about the data.
|
||||
A good visualization might also hint that you're asking the wrong question or that you need to collect different data.
|
||||
Visualizations can surprise you, and they don't scale particularly well because they require a human to interpret them.
|
||||
Visualizations can surprise you, but they don't scale particularly well because they require a human to interpret them.
|
||||
|
||||
**Models** are complementary tools to visualization.
|
||||
Once you have made your questions sufficiently precise, you can use a model to answer them.
|
||||
|
@ -105,7 +105,7 @@ We'll also show you how to get data out of databases and parquet files, both of
|
|||
You won't necessarily be able to work with the entire dataset, but that's not a problem because you only need a subset or subsample to answer the question that you're interested in.
|
||||
|
||||
If you're routinely working with larger data (10-100 Gb, say), we recommend learning more about [data.table](https://github.com/Rdatatable/data.table).
|
||||
We don't teach it here because it uses a different interface to the tidyverse and requires you ot learn some different conventions.
|
||||
We don't teach it here because it uses a different interface to the tidyverse and requires you to learn some different conventions.
|
||||
However, it is incredible faster and the performance payoff is worth investing some time learning it if you're working with large data.
|
||||
|
||||
### Python, Julia, and friends
|
||||
|
|
|
@ -27,5 +27,5 @@ A brief summary of the biggest changes follows:
|
|||
We never had enough room to fully do modelling justice, and there are now much better resources available.
|
||||
We generally recommend using the [tidymodels](https://www.tidymodels.org/) packages and reading [Tidy Modeling with R](https://www.tmwr.org/) by Max Kuhn and Julia Silge.
|
||||
|
||||
- The communicate part remains, but has been thoroughly updated to feature Quarto instead of R Markdown.
|
||||
- The communicate part remains, but has been thoroughly updated to feature [Quarto](https://quarto.org/) instead of R Markdown.
|
||||
This edition of the book has been written in quarto, and it's clearly the tool of the future.
|
||||
|
|
|
@ -8,7 +8,7 @@ source("_common.R")
|
|||
|
||||
Our goal in this part of the book is to give you a rapid overview of the main tools of data science: **importing**, **tidying**, **transforming**, and **visualizing data**, as shown in @fig-ds-whole-game.
|
||||
We want to show you the "whole game" of data science giving you just enough of all the major pieces so that you can tackle real, if simple, datasets.
|
||||
The later parts of the book, will hit each of these topics in more depth, increasing the range of data science challenges that you can tackle.
|
||||
The later parts of the book will hit each of these topics in more depth, increasing the range of data science challenges that you can tackle.
|
||||
|
||||
```{r}
|
||||
#| label: fig-ds-whole-game
|
||||
|
@ -39,7 +39,7 @@ Five chapters focus on the tools of data science:
|
|||
- Before you can transform and visualize your data, you need to first get your data into R.
|
||||
In @sec-data-import you'll learn the basics of getting `.csv` files into R.
|
||||
|
||||
Nestled among these chapters are five other chapters that focus on your R workflow.
|
||||
Nestled among these chapters are four other chapters that focus on your R workflow.
|
||||
In @sec-workflow-basics, @sec-workflow-style, and @sec-workflow-scripts-projects you'll learn good workflow practices for writing and organizing your R code.
|
||||
These will set you up for success in the long run, as they'll give you the tools to stay organized when you tackle real projects.
|
||||
Finally, @sec-workflow-getting-help will teach you how to get help and keep learning.
|
||||