35 lines
1.8 KiB
Plaintext
35 lines
1.8 KiB
Plaintext
# (PART) Wrangle {.unnumbered}
|
|
|
|
# Introduction {#wrangle-intro}
|
|
|
|
In this part of the book, you'll learn about data wrangling, the art of getting your data into R in a useful form for visualisation and modelling.
|
|
Data wrangling is very important: without it you can't work with your own data!
|
|
There are three main parts to data wrangling:
|
|
|
|
```{r echo = FALSE, out.width = "75%"}
|
|
knitr::include_graphics("diagrams/data-science-wrangle.png")
|
|
```
|
|
|
|
This part of the book proceeds as follows:
|
|
|
|
- In [tibbles], you'll learn about the variant of the data frame that we use in this book: the **tibble**.
|
|
You'll learn what makes them different from regular data frames, and how you can construct them "by hand".
|
|
|
|
- In [data import], you'll learn how to get your data from disk and into R.
|
|
We'll focus on plain-text rectangular formats, but will give you pointers to packages that help with other types of data.
|
|
|
|
- In [tidy data], you'll learn about tidy data, a consistent way of storing your data that makes transformation, visualisation, and modelling easier.
|
|
You'll learn the underlying principles, and how to get your data into a tidy form.
|
|
|
|
Data wrangling also encompasses data transformation, which you've already learned a little about.
|
|
Now we'll focus on new skills for three specific types of data you will frequently encounter in practice:
|
|
|
|
- [Relational data] will give you tools for working with multiple interrelated datasets.
|
|
|
|
- [Strings] will introduce regular expressions, a powerful tool for manipulating strings.
|
|
|
|
- [Factors] are how R stores categorical data.
|
|
They are used when a variable has a fixed set of possible values, or when you want to use a non-alphabetical ordering of a string.
|
|
|
|
- [Dates and times] will give you the key tools for working with dates and date-times.
|