Fix typos (#989)

Jakob Krigovsky
2022-01-05 03:07:35 +01:00
committed by GitHub
parent 7bc19dc36a
commit 011f8cceee
7 changed files with 17 additions and 17 deletions

View File

@@ -6,7 +6,7 @@ The goal of a model is to provide a simple low-dimensional summary of a dataset.
In the context of this book we're going to use models to partition data into patterns and residuals.
Strong patterns will hide subtler trends, so we'll use models to help peel back layers of structure as we explore a dataset.
-However, before we can start using models on interesting, real, datasets, you need to understand the basics of how models work.
+However, before we can start using models on interesting, real datasets, you need to understand the basics of how models work.
For that reason, this chapter of the book is unique because it uses only simulated datasets.
These datasets are very simple, and not at all interesting, but they will help you understand the essence of modelling before you apply the same techniques to real data in the next chapter.
@@ -116,7 +116,7 @@ model1(c(7, 1.5), sim1)
Next, we need some way to compute an overall distance between the predicted and actual values.
In other words, the plot above shows 30 distances: how do we collapse that into a single number?
-One common way to do this in statistics to use the "root-mean-squared deviation".
+One common way to do this in statistics is to use the "root-mean-squared deviation".
We compute the difference between actual and predicted, square them, average them, and then take the square root.
This distance has lots of appealing mathematical properties, which we're not going to talk about here.
You'll just have to take my word for it!
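For readers of this diff, the computation described in that hunk amounts to only a few lines of R. The sketch below is illustrative rather than the chapter's exact code: `measure_distance()` is a name chosen here, and the body of `model1()` (only its call appears in the hunk header) is assumed to be a simple intercept-plus-slope helper; `sim1` is the simulated dataset shipped with modelr.

```{r}
library(modelr)  # provides the simulated sim1 dataset

# Assumed form of the chapter's model1() helper: a[1] is the intercept,
# a[2] the slope.
model1 <- function(a, data) {
  a[1] + data$x * a[2]
}

# Root-mean-squared deviation: difference between actual and predicted,
# squared, averaged, then square-rooted.
measure_distance <- function(mod, data) {
  diff <- data$y - model1(mod, data)
  sqrt(mean(diff ^ 2))
}

measure_distance(c(7, 1.5), sim1)
```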
@@ -316,7 +316,7 @@ ggplot(sim1, aes(x)) +
### Residuals
The flip-side of predictions are **residuals**.
-The predictions tells you the pattern that the model has captured, and the residuals tell you what the model has missed.
+The predictions tell you the pattern that the model has captured, and the residuals tell you what the model has missed.
The residuals are just the distances between the observed and predicted values that we computed above.
We add residuals to the data with `add_residuals()`, which works much like `add_predictions()`.
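To make that last sentence concrete, here is a minimal sketch (not the chapter's exact code) that fits a linear model to `sim1` and appends residuals with `add_residuals()`; the model object `sim1_mod` is named here purely for illustration.

```{r}
library(modelr)    # sim1, add_predictions(), add_residuals()
library(ggplot2)

# Illustrative model fit; the chapter works with a linear model of y on x
sim1_mod <- lm(y ~ x, data = sim1)

# add_residuals() appends a `resid` column, much as add_predictions() appends `pred`
sim1_resid <- add_residuals(sim1, sim1_mod)

# If the model has captured the pattern, the residuals scatter around zero
ggplot(sim1_resid, aes(x, resid)) +
  geom_hline(yintercept = 0, colour = "grey50") +
  geom_point()
```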

View File

@@ -481,7 +481,7 @@ df <- enframe(x)
df
```
-The advantage of this structure is that it generalises in a straightforward way - names are useful if you have character vector of metadata, but don't help if you have other types of data, or multiple vectors.
+The advantage of this structure is that it generalises in a straightforward way - names are useful if you have a character vector of metadata but don't help if you have other types of data, or multiple vectors.
Now if you want to iterate over names and values in parallel, you can use `map2()`:
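The chapter's own `map2()` example is not shown in this hunk; the sketch below stands in for it, using a hypothetical named vector `x` since the original one does not appear here.

```{r}
library(tibble)
library(purrr)

# Hypothetical named vector standing in for the chapter's `x`
x <- c(a = 10, b = 20, c = 30)
df <- enframe(x)

# map2() walks the name and value columns in parallel
map2_chr(df$name, df$value, ~ paste0(.x, ": ", .y))
#> [1] "a: 10" "b: 20" "c: 30"
```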
@@ -510,7 +510,7 @@ df %>%
```
4. What does this code do?
-Why might might it be useful?
+Why might it be useful?
```{r, eval = FALSE}
mtcars %>%

View File

@@ -49,10 +49,10 @@ There is a pair of ideas that you must understand in order to do inference corre
As soon as you use an observation twice, you've switched from confirmation to exploration.
This is necessary because to confirm a hypothesis you must use data independent of the data that you used to generate the hypothesis.
-Otherwise you will be over optimistic.
+Otherwise you will be over-optimistic.
There is absolutely nothing wrong with exploration, but you should never sell an exploratory analysis as a confirmatory analysis because it is fundamentally misleading.
-If you are serious about doing an confirmatory analysis, one approach is to split your data into three pieces before you begin the analysis:
+If you are serious about doing a confirmatory analysis, one approach is to split your data into three pieces before you begin the analysis:
1. 60% of your data goes into a **training** (or exploration) set.
You're allowed to do anything you like with this data: visualise it and fit tons of models to it.