From 914192f1e98603f8e12cc8493fb873465cd1428a Mon Sep 17 00:00:00 2001
From: hadley
Date: Tue, 26 Jul 2016 15:38:56 -0500
Subject: [PATCH] Use na.warn

---
 model-basics.Rmd | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/model-basics.Rmd b/model-basics.Rmd
index 6a93581..661a233 100644
--- a/model-basics.Rmd
+++ b/model-basics.Rmd
@@ -43,12 +43,12 @@ The goal of a model is not to uncover truth, but to discover a simple approximat
 
 ### Prerequisites
 
-We need a couple of packages specifically designed for modelling, and all the packages you've used before for EDA. 
+We need a couple of packages specifically designed for modelling, and all the packages you've used before for EDA.
 
 ```{r setup, message = FALSE}
 # Modelling functions
 library(modelr)
-library(broom)
+options(na.action = na.warn)
 
 # EDA tools
 library(ggplot2)
@@ -661,7 +661,7 @@ sim6 %>%
 
 ## Missing values
 
-Missing values obviously can not convey any information about the relationship between the variables, so modelling functions will silently drop any rows that contain missing values:
+Missing values obviously can not convey any information about the relationship between the variables, so modelling functions will drop any rows that contain missing values. R's default behaviour is to silently drop them, but `options(na.action = na.warn)` (run in the prerequisites), makes sure you get a warning.
 
 ```{r}
 df <- tibble::frame_data(
@@ -676,11 +676,16 @@ df <- tibble::frame_data(
 mod <- lm(y ~ x, data = df)
 ```
 
-Unfortunately this is one of the rare cases in R where missing values will go silently missing without any warning. You can spot their absence by comparing the number of rows in the data frame with the number of observations used by the model:
+To suppress the warning, set `na.action = na.exclude`:
+
+```{r}
+mod <- lm(y ~ x, data = df, na.action = na.exclude)
+```
+
+You can always see exactly how many observations were used with `nobs()`:
 
 ```{r}
 nobs(mod)
-nrow(df)
 ```
 
 ## Other model families