Mild import/wrangling reorg

Hadley Wickham
2022-06-20 10:40:11 -05:00
parent 23bfba6809
commit 8f7748dcb1
12 changed files with 25 additions and 72 deletions


@@ -11,8 +11,7 @@ status("polishing")
Working with data provided by R packages is a great way to learn the tools of data science, but at some point you want to stop learning and start working with your own data.
In this chapter, you'll learn how to read plain-text rectangular files into R.
-Here, we'll only scratch the surface of data import, but many of the principles will translate to other forms of data.
-We'll finish with a few pointers to packages that are useful for other types of data.
+Here, we'll only scratch the surface of data import, but many of the principles will translate to other forms of data, which we'll come back to in @sec-wrangle.
### Prerequisites
@@ -320,33 +319,10 @@ There are two alternatives:
```
Feather tends to be faster than RDS and is usable outside of R.
-RDS supports list-columns (which you'll learn about in [Chapter -@sec-list-columns]); feather currently does not.
+RDS supports list-columns (which you'll learn about in @sec-rectangling); feather currently does not.
```{r}
#| include: false
file.remove("students-2.csv")
file.remove("students.rds")
```
-## Other types of data
-To get other types of data into R, we recommend starting with the tidyverse packages listed below.
-They're certainly not perfect, but they are a good place to start.
-For rectangular data:
-- **readxl** reads Excel files (both `.xls` and `.xlsx`).
-  See [Chapter -@sec-import-spreadsheets] for more on working with data stored in Excel spreadsheets.
-- **googlesheets4** reads Google Sheets.
-  Also see [Chapter -@sec-import-spreadsheets] for more on working with data stored in Google Sheets.
-- **DBI**, along with a database-specific backend (e.g., **RMySQL**, **RSQLite**, **RPostgreSQL**, etc.), allows you to run SQL queries against a database and return a data frame.
-  See [Chapter -@sec-import-databases] for more on working with databases.
-- **haven** reads SPSS, Stata, and SAS files.
-For hierarchical data: use **jsonlite** (by Jeroen Ooms) for JSON, and **xml2** for XML.
-Jenny Bryan has some excellent worked examples at <https://jennybc.github.io/purrr-tutorial/>.
-For other file types, try the [R data import/export manual](https://cran.r-project.org/doc/manuals/r-release/R-data.html) and the [**rio**](https://github.com/leeper/rio) package.