Use na.warn

hadley 2016-07-26 15:38:56 -05:00
parent 8edc9a8768
commit 914192f1e9
1 changed files with 10 additions and 5 deletions

@@ -43,12 +43,12 @@ The goal of a model is not to uncover truth, but to discover a simple approximat
### Prerequisites
We need a couple of packages specifically designed for modelling, and all the packages you've used before for EDA.
We need a couple of packages specifically designed for modelling, and all the packages you've used before for EDA.
```{r setup, message = FALSE}
# Modelling functions
library(modelr)
library(broom)
options(na.action = na.warn)
# EDA tools
library(ggplot2)
@@ -661,7 +661,7 @@ sim6 %>%
## Missing values
Missing values obviously can not convey any information about the relationship between the variables, so modelling functions will silently drop any rows that contain missing values:
Missing values obviously cannot convey any information about the relationship between the variables, so modelling functions will drop any rows that contain missing values. R's default behaviour is to silently drop them, but `options(na.action = na.warn)` (run in the prerequisites) makes sure you get a warning.
```{r}
df <- tibble::frame_data(
@@ -676,11 +676,16 @@ df <- tibble::frame_data(
mod <- lm(y ~ x, data = df)
```
Unfortunately this is one of the rare cases in R where missing values will go silently missing without any warning. You can spot their absence by comparing the number of rows in the data frame with the number of observations used by the model:
To suppress the warning, set `na.action = na.exclude`:
```{r}
mod <- lm(y ~ x, data = df, na.action = na.exclude)
```
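As a rough illustration of what `na.exclude` changes beyond the warning, the sketch below uses only base R and a small made-up data frame (`toy`, `fit_omit` and `fit_excl` are hypothetical names, not part of the chapter). It compares `na.exclude` with base R's `na.omit`: both fit the model on the complete rows, but `na.exclude` pads residuals with `NA` so that they line up with the original rows.

```r
# Made-up data for illustration: rows 2 and 3 each contain a missing value
toy <- data.frame(
  x = c(1, 2, NA, 4, 5),
  y = c(2.1, NA, 6.2, 8.1, 9.9)
)

# na.omit drops incomplete rows and returns residuals for the kept rows only
fit_omit <- lm(y ~ x, data = toy, na.action = na.omit)

# na.exclude drops the same rows for fitting, but residuals() is padded
# with NA at the positions of the dropped rows
fit_excl <- lm(y ~ x, data = toy, na.action = na.exclude)

length(residuals(fit_omit))  # one residual per complete row
length(residuals(fit_excl))  # one entry per row of `toy`
```

The padded form is convenient when you want to attach residuals back onto the original data frame as a new column.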
You can always see exactly how many observations were used with `nobs()`:
```{r}
nobs(mod)
nrow(df)
```
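The comparison of `nobs()` and `nrow()` can be turned into an explicit count of dropped rows. The following is a minimal sketch in base R on a made-up data frame (`toy`, `fit`, `n_dropped` and `dropped_rows` are hypothetical names); it assumes the model uses only the columns `x` and `y`, so `complete.cases()` on those two columns identifies the rows that were removed.

```r
# Made-up data for illustration: rows 2 and 3 each contain a missing value
toy <- data.frame(
  x = c(1, 2, NA, 4, 5),
  y = c(2.1, NA, 6.2, 8.1, 9.9)
)

fit <- lm(y ~ x, data = toy, na.action = na.exclude)

# Rows in the data minus observations used by the model
n_dropped <- nrow(toy) - nobs(fit)

# Which rows have a missing value in any variable used by the model
dropped_rows <- which(!complete.cases(toy[, c("x", "y")]))

n_dropped
dropped_rows
```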
## Other model families