# Wrangle {#sec-wrangle .unnumbered}

```{r}
#| results: "asis"
#| echo: false
source("_common.R")
```

In this part of the book, you'll learn about data wrangling, the art of getting your data into R in a useful form for further work.
In some cases, this is a relatively simple application of a package that does data import.
But in more complex cases, it encompasses both tidying and transformation, as the native structure of the data might be quite far from the tidy rectangle you'd prefer to work with.

```{r}
#| label: fig-ds-wrangle
#| echo: false
#| fig-cap: >
#|   Data wrangling is the combination of importing, tidying, and
#|   transforming.
#| fig-alt: >
#|   Our data science model with import, tidy, and transform, highlighted
#|   in blue and labelled with "wrangle".
#| out.width: NULL
knitr::include_graphics("diagrams/data-science/wrangle.png", dpi = 270)
```

This part of the book proceeds as follows:

-   In @sec-data-import, you'll learn how to get plain-text data in rectangular formats from disk and into R.

-   In @sec-import-spreadsheets, you'll learn how to get data from Excel spreadsheets and Google Sheets into R.

-   In @sec-import-databases, you'll learn about getting data into R from databases.

-   In @sec-arrow, you'll learn about Arrow, a powerful tool for working with large on-disk files.

-   In @sec-rectangling, you'll learn how to work with hierarchical data that includes deeply nested lists, as is often the case when your raw data is in JSON.

-   In @sec-scraping, you'll learn about harvesting data off the web and getting it into R.

There are two important tidyverse packages that we don't discuss here: haven and xml2.
If you're working with data from SPSS, Stata, or SAS files, check out the **haven** package, <https://haven.tidyverse.org>.
If you're working with XML, check out the **xml2** package, <https://xml2.r-lib.org>.
Otherwise, you'll need to do some research to figure out which package you'll need to use; Google is your friend here 😃.
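
To give a rough feel for what these two packages look like in use, here is a minimal sketch, not a full treatment.
It assumes both packages are installed, reads the small example files that ship with haven, and parses a made-up XML snippet (the `<books>` document below is purely illustrative and not from any real data source).

```{r}
library(haven)
library(xml2)

# haven bundles small example files; system.file() finds them on disk.
sav_path <- system.file("examples", "iris.sav", package = "haven")
dta_path <- system.file("examples", "iris.dta", package = "haven")
sas_path <- system.file("examples", "iris.sas7bdat", package = "haven")

# Each reader returns a tibble, so the usual tidyverse tools apply afterwards.
iris_spss <- read_sav(sav_path)   # SPSS
iris_stata <- read_dta(dta_path)  # Stata
iris_sas <- read_sas(sas_path)    # SAS

# xml2 can parse XML from a file, a URL, or (as here) a literal string.
# This snippet is a hypothetical example invented for illustration.
doc <- read_xml(
  "<books><book><title>R for Data Science</title></book></books>"
)

# Find every <title> node with an XPath query and pull out its text.
titles <- xml_text(xml_find_all(doc, ".//title"))
```

With your own data, you'd replace the `system.file()` calls with the path to your file and adapt the XPath expression to the structure of your XML.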