# Transform {#sec-transform-intro .unnumbered}

```{r}
#| results: "asis"
#| echo: false
source("_common.R")
```

After reading the first part of the book, you understand (at least superficially) the most important tools for doing data science.
Now it's time to start diving into the details.
In this part of the book, you'll learn about important data types and the tools you can use to work with them.
This matters because what you can do to a column depends on its type.

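As a small illustration of why column type matters, the sketch below builds a throwaway data frame and applies the same summary to a numeric column and a character column using only base R.
The names `df`, `n`, and `label` are made up for this example, and it assumes R 4.0 or later, where `data.frame()` keeps strings as character vectors.

```{r}
# Hypothetical example data: one numeric column, one character column
df <- data.frame(
  n = c(1, 2, 3),
  label = c("a", "b", "c")
)

# A numeric summary makes sense for a numeric column
mean(df$n)

# The same summary on a character column returns NA (with a warning)
mean(df$label)

# String tools, by contrast, do apply to the character column
toupper(df$label)
```
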
```{r}
#| label: fig-ds-transform
#| echo: false
#| fig-cap: >
#|   The options for data transformation depend heavily on the type of
#|   data involved, the subject of this part of the book.
#| fig-alt: >
#|   Our data science model, with transform highlighted in blue.
#| out.width: NULL

knitr::include_graphics("diagrams/data-science/transform.png", dpi = 270)
```

This part of the book proceeds as follows:

- In @sec-tibbles, you'll learn about the **tibble**, the variant of the data frame that we use in this book.
  You'll learn what makes tibbles different from regular data frames, and how you can construct them "by hand".

- @sec-logicals teaches you about logical vectors.
  These are the simplest type of vector in R, but are extremely powerful.
  You'll learn how to create them with numeric comparisons, how to combine them with Boolean algebra, how to use them in summaries, and how to use them for conditional transformations.

- @sec-numbers dives into tools for vectors of numbers, the powerhouse of data science.
  You'll learn new counting techniques, important transformations, and important summary functions.

- @sec-strings will give you tools for working with strings: you'll slice them, you'll dice them, and you'll stick them back together again.
  This chapter mostly focuses on the stringr package, but you'll also learn some more tidyr functions devoted to extracting data from strings.

- @sec-regular-expressions goes into the details of regular expressions, a powerful tool for manipulating strings.
  This chapter will take you from thinking "a cat just walked over my keyboard" to reading and writing complex string patterns.

- @sec-factors will introduce factors -- the data type that R uses to store categorical data.
  They are used when a variable has a fixed set of possible values, or when you want to use a non-alphabetical ordering of a string.

- @sec-dates-and-times will give you the key tools for working with dates and date-times.
  Unfortunately, the more you learn about date-times, the more complicated they seem to get, but with the help of the lubridate package, you'll learn how to overcome the most common challenges.

- We've discussed missing values a couple of times in isolation, but @sec-missing-values will go into detail, helping you come to grips with the difference between implicit and explicit missing values, and how and why you might convert between them.

- @sec-joins finishes up this part of the book by giving you tools to join two (or more) data frames together.
  Learning about joins will force you to grapple with the idea of keys, and think about how you identify each row in a dataset.

You can read these chapters as you need them; they're designed to be largely standalone so that they can be read out of order.