Few typos in EDA, transform, and visualize (#207)
* minor typos in last exercise in common problems
* usually easiest -> usually easy
* move mapping(aes()) into geom to match other ggplot calls
* wording suggestion in exercise + typo fix
* feel be free -> feel free
* and new dataset -> and a new dataset, missing a
* dplyr overwrite -> overwrites
parent 8357e455f9
commit 1187b85e01
EDA.Rmd (2 lines changed)
@@ -10,7 +10,7 @@ This chapter will show you how to use visualisation and transformation to explor

 1. Use what you learn to refine your questions and or generate new questions.

-EDA is not a formal process with a strict set of rules. More than anything, EDA is a state of mind. During the initial phases of EDA you should feel be free to investigate every idea that occurs to you. Some of these ideas will pan out, and some will be dead ends. As your exploration continues, you will hone in on a few particularly productive areas that you'll eventually write up and communicate to others.
+EDA is not a formal process with a strict set of rules. More than anything, EDA is a state of mind. During the initial phases of EDA you should feel free to investigate every idea that occurs to you. Some of these ideas will pan out, and some will be dead ends. As your exploration continues, you will hone in on a few particularly productive areas that you'll eventually write up and communicate to others.

 EDA is an important part of any data analysis, even if the questions are handed to you on a platter, because you always need to investigate the quality of your data. Data cleaning is just one application of EDA: you ask questions about whether your data meets your expectations or not. To do data cleaning, you'll need to deploy all the tools of EDA: visualisation, transformation, and modelling.

@@ -2,7 +2,7 @@

 ## Introduction

-Visualisation is an important tool for insight generation, but it is rare that you get the data in exactly the right form you need. Often you'll need to create some new variables or summaries, or maybe you just want to rename the variables or reorder the observations in order to make the data a little easier to work with. You'll learn how to do all that (and more!) in this chapter which will teach you how to transform your data using the dplyr package and new dataset on flights departing New York City in 2013.
+Visualisation is an important tool for insight generation, but it is rare that you get the data in exactly the right form you need. Often you'll need to create some new variables or summaries, or maybe you just want to rename the variables or reorder the observations in order to make the data a little easier to work with. You'll learn how to do all that (and more!) in this chapter which will teach you how to transform your data using the dplyr package and a new dataset on flights departing New York City in 2013.

 ### Prerequisites

@@ -14,7 +14,7 @@ library(nycflights13)
 library(ggplot2)
 ```

-Take careful note of the message that's printed when you load dplyr - it tells you that dplyr overwrite some functions in base R. If you want to use the base version of these functions after loading dplyr, you'll need to use their full names: `stats::filter()`, `base::intersect()`, etc.
+Take careful note of the message that's printed when you load dplyr - it tells you that dplyr overwrites some functions in base R. If you want to use the base version of these functions after loading dplyr, you'll need to use their full names: `stats::filter()`, `base::intersect()`, etc.

 ### nycflights13

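The masking note in the hunk above can be illustrated with a short sketch. It assumes dplyr is installed; `df`, `big`, and `smoothed` are hypothetical names chosen only for this example. After `library(dplyr)`, a bare `filter()` resolves to `dplyr::filter()`, while the masked version stays reachable through its full name `stats::filter()`.

```r
library(dplyr)  # the startup message lists masked functions, e.g. filter and lag from stats

# With dplyr attached, a bare filter() is dplyr::filter(): it keeps the
# rows of a data frame that satisfy a condition.
df <- data.frame(x = 1:5)
big <- filter(df, x > 3)

# The masked function is still available under its full name.
# stats::filter() applies a linear filter to a series; here it computes
# a centred 3-point moving average.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))

# The first and last values are NA because the 3-point window does not fit there.
as.numeric(smoothed)
```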
@@ -211,7 +211,7 @@ ggplot(shapes, aes(x, y)) +
 1. What happens if you set an aesthetic to something other than a variable
 name, like `displ < 5`?

-1. Vignettes are long-form guides the documentation things about
+1. Vignettes are long-form guides that document things about
 a package that affect many functions. ggplot2 has two vignettes.
 How can you find them and what do they describe? (Hint: Google is
 your friend.)
@@ -220,7 +220,7 @@ ggplot(shapes, aes(x, y)) +

 As you start to run R code, you're likely to run into problems. Don't worry --- it happens to everyone. I have been writing R code for years, and every day I still write code that doesn't work!

-Start by carefully comparing the code that you're running to the code in the book. R is extremely picky, and a misplaced character can make all the difference. Make sure that every `(` is matched with a `)` and every `"` is paired with another `"`. Sometimes you'll run the code and nothing happens. Check the left-hand of your console: if it's a `+`, it means that R doesn't think you've typed a complete expression and it's waiting for you to finish it. In this case, it's usually easiest to start from scratch again by pressing `Escape` to abort processing the current command.
+Start by carefully comparing the code that you're running to the code in the book. R is extremely picky, and a misplaced character can make all the difference. Make sure that every `(` is matched with a `)` and every `"` is paired with another `"`. Sometimes you'll run the code and nothing happens. Check the left-hand of your console: if it's a `+`, it means that R doesn't think you've typed a complete expression and it's waiting for you to finish it. In this case, it's usually easy to start from scratch again by pressing `Escape` to abort processing the current command.

 One common problem when creating ggplot2 graphics is to put the `+` in the wrong place: it has to come at the end of the line, not the start. In other words, make sure you haven't accidentally written code like this:

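The `+` placement rule from the common-problems text can be sketched as follows, assuming ggplot2 and its bundled `mpg` dataset; `p` is a hypothetical name. The incorrect form is left commented out so the snippet still runs from top to bottom.

```r
library(ggplot2)

# Correct: each `+` ends a line, so R knows the expression continues
# onto the next line and builds one plot object with one layer.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Incorrect: R treats the first line as a complete expression, and the
# second line becomes a separate call to `+` with a single argument,
# so the points layer is never added to the plot.
# ggplot(data = mpg)
# + geom_point(mapping = aes(x = displ, y = hwy))
```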
@@ -248,8 +248,8 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
 To facet your plot on the combination of two variables, add `facet_grid()` to your plot call. The first argument of `facet_grid()` is also a formula. This time the formula should contain two variable names separated by a `~`.

 ```{r}
-ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
-  geom_point() +
+ggplot(data = mpg) +
+  geom_point(mapping = aes(x = displ, y = hwy)) +
   facet_grid(drv ~ cyl)
 ```

@@ -410,7 +410,7 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
 A histogram? An area chart?

 1. Run this code in your head and predict what the output will look like.
-Run the code in R and check your predictions.
+Then, run the code in R and check your predictions.

 ```{r, eval = FALSE}
 ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
@@ -496,7 +496,7 @@ Stats are the most subtle part of plotting because you can't see them directly.

 1. You might want to override the default stat. In the code below, I change
 the stat of `geom_bar()` from count (the default) to identity. This lets
-me map to the height of the bars to the raw values of a $y$ variable.
+me map the height of the bars to the raw values of a $y$ variable.

 ```{r}
 demo <- tibble::tibble(
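The hunk above cuts off before the book's own `demo` example, so here is a separate minimal sketch of the identity stat rather than a reconstruction of it. The data frame `heights` and its values are hypothetical, and ggplot2 is assumed to be installed. With `stat = "identity"`, `geom_bar()` uses the `value` column directly as the bar height instead of counting rows.

```r
library(ggplot2)

# Hypothetical data: one row per bar, with the bar height already computed.
heights <- data.frame(
  label = c("a", "b", "c"),
  value = c(20, 30, 40)
)

# stat = "identity" maps `value` straight to the bar height instead of
# using the default count stat, which would give every bar a height of 1.
p <- ggplot(data = heights) +
  geom_bar(mapping = aes(x = label, y = value), stat = "identity")
```

In ggplot2 2.2.0 and later, `geom_col()` is a shorthand for `geom_bar()` with the identity stat.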