TR edits - Chp 1-9 (#1312)

* Mention parquet and databases

* Simplify language

* Explain what var and obs mean

* Data View() alternative

* Explain density

* Boxplot definition

* Clarify IQR, hide figure, add exercise

* will -> can

* Transform edits

* Fix typo

* Clarify cases
Mine Cetinkaya-Rundel
2023-02-27 21:54:34 -05:00
committed by GitHub
parent c0f0375d44
commit 9887705f43
5 changed files with 67 additions and 48 deletions


@@ -97,10 +97,13 @@ This book will teach you the tidymodels family of packages, which, as you might
### Big data
-This book proudly focuses on small, in-memory datasets.
+This book proudly and primarily focuses on small, in-memory datasets.
This is the right place to start because you can't tackle big data unless you have experience with small data.
-The tools you learn in this book will easily handle hundreds of megabytes of data, and with a bit of care, you can typically use them to work with 1-2 Gb of data.
-If you're routinely working with larger data (10-100 Gb, say), you should learn more about [data.table](https://github.com/Rdatatable/data.table).
+The tools you learn in the majority of this book will easily handle hundreds of megabytes of data, and with a bit of care, you can typically use them to work with 1-2 Gb of data.
+That being said, the book also touches on getting data out of databases and out of parquet files, both of which are commonly used solutions for storing big data.
+However, if you're routinely working with larger data (10-100 Gb, say), you should learn more about [data.table](https://github.com/Rdatatable/data.table).
This book doesn't teach data.table because it has a very concise interface that offers fewer linguistic cues, which makes it harder to learn.
However, the performance payoff is well worth the effort required to learn it if you're working with large data.
@@ -131,7 +134,7 @@ You should strive to learn new things throughout your career, but make sure your
We think R is a great place to start your data science journey because it is an environment designed from the ground up to support data science.
R is not just a programming language; it is also an interactive environment for doing data science.
To support interaction, R is a much more flexible language than many of its peers.
-This flexibility has its downsides, but the big upside is how easy it is to evolve tailored grammars for specific parts of the data science process.
+This flexibility has its downsides, but the big upside is how easy it is to write code that is structured like the problem you are trying to solve in specific parts of the data science process.
These mini languages help you think about problems as a data scientist while supporting fluent interaction between your brain and the computer.
## Prerequisites