Eliminate two plots in EDA.qmd
Noticed these in passing. cc @mine-cetinkaya-rundel.
parent 03f1c6c6f4
commit 504db47630
37 EDA.qmd
@@ -637,20 +637,6 @@ ggplot(smaller, aes(x = carat, y = price)) +
By default, boxplots look roughly the same (apart from the number of outliers) regardless of how many observations there are, so it's difficult to tell that each boxplot summarizes a different number of points.
One way to show that is to make the width of the boxplot proportional to the number of points with `varwidth = TRUE`.

Another approach is to display approximately the same number of points in each bin.
That's the job of `cut_number()`:

```{r}
#| fig-alt: >
#| Side-by-side box plots of price by carat. Each box plot represents 20
#| diamonds. The box plots show that as carat increases the median price
#| increases as well. Cheaper, smaller diamonds have outliers on the higher
#| end, more expensive, bigger diamonds have outliers on the lower end.

ggplot(smaller, aes(x = carat, y = price)) +
geom_boxplot(aes(group = cut_number(carat, 20)))
```

#### Exercises

1. Instead of summarizing the conditional distribution with a boxplot, you could use a frequency polygon.
@@ -665,21 +651,26 @@ ggplot(smaller, aes(x = carat, y = price)) +
4. Combine two of the techniques you've learned to visualize the combined distribution of cut, carat, and price.

5. Two dimensional plots reveal outliers that are not visible in one dimensional plots.
For example, some points in the plot below have an unusual combination of `x` and `y` values, which makes the points outliers even though their `x` and `y` values appear normal when examined separately.
For example, some points in the following plot have an unusual combination of `x` and `y` values, which makes the points outliers even though their `x` and `y` values appear normal when examined separately.
Why is a scatterplot a better display than a binned plot for this case?

```{r}
#| dev: "png"
#| fig-alt: >
#| A scatterplot of widths vs. lengths of diamonds. There is a positive,
#| strong, linear relationship. There are a few unusual observations
#| above and below the bulk of the data, more below it than above.

ggplot(diamonds, aes(x = x, y = y)) +
#| eval: false
diamonds |>
filter(x >= 4) |>
ggplot(aes(x = x, y = y)) +
geom_point() +
coord_cartesian(xlim = c(4, 11), ylim = c(4, 11))
```

Why is a scatterplot a better display than a binned plot for this case?
6. Instead of creating boxes of equal width with `cut_width()`, we could create boxes that contain roughly equal numbers of points with `cut_number()`.
What are the advantages and disadvantages of this approach?

```{r}
#| eval: false
ggplot(smaller, aes(x = carat, y = price)) +
geom_boxplot(aes(group = cut_number(carat, 20)))
```

## Patterns and models