parent
8828f80842
commit
c3a2688b56
4
EDA.Rmd
@@ -163,7 +163,7 @@ Clusters of similar values suggest that subgroups exist in your data. To underst
* Why might the appearance of clusters be misleading?
-The histogram shows the length (in minutes) of 272 eruptions of the Old Faithful Geyser in Yellowstone National Park. Eruption times appear to be clustered in to two groups: there are short eruptions (of around 2 minutes) and long eruption (4-5 minutes), but little in between.
+The histogram shows the length (in minutes) of 272 eruptions of the Old Faithful Geyser in Yellowstone National Park. Eruption times appear to be clustered into two groups: there are short eruptions (of around 2 minutes) and long eruptions (4-5 minutes), but little in between.
```{r}
ggplot(data = faithful, mapping = aes(x = eruptions)) +
@@ -174,7 +174,7 @@ Many of the questions above will prompt you to explore a relationship *between*
### Unusual values
-Outliers are observations that are unusual; data points that are don't seem to fit the pattern. Sometimes outliers are data entry errors; other times outliers suggest important new science. When you have a lot of data, outliers are sometimes difficult to see in a histogram. For example, take the distribution of the `x` variable from the diamonds dataset. The only evidence of outliers is the unusually wide limits on the x-axis.
+Outliers are observations that are unusual; data points that don't seem to fit the pattern. Sometimes outliers are data entry errors; other times outliers suggest important new science. When you have a lot of data, outliers are sometimes difficult to see in a histogram. For example, take the distribution of the `x` variable from the diamonds dataset. The only evidence of outliers is the unusually wide limits on the x-axis.
```{r}
ggplot(diamonds) +