r4ds/missing-values.Rmd

# Missing values {#missing-values}

## Introduction

## Basics

### Missing values {#missing-values-filter}

One important feature of R that can make comparison tricky is missing values, or `NA`s ("not availables").
`NA` represents an unknown value, so missing values are "contagious": almost any operation involving an unknown value also returns an unknown value.

```{r}
NA > 5
10 == NA
NA + 10
NA / 2
```

The most confusing result is this one:

```{r}
NA == NA
```

It's easiest to understand why the result is `NA` with a bit more context:

```{r}
# Let x be Mary's age. We don't know how old she is.
x <- NA

# Let y be John's age. We don't know how old he is.
y <- NA

# Are John and Mary the same age?
x == y
# We don't know!
```

If you want to determine if a value is missing, use `is.na()`:

```{r}
is.na(x)
```
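The same reasoning applies to vectors: because `==` propagates missingness, comparing against `NA` never locates the missing elements. A minimal sketch, using a hypothetical `ages` vector that is not part of the examples above:

```{r}
# Hypothetical ages; the second one is unknown.
ages <- c(21, NA, 35)

# == returns NA for every element, so it cannot find the missing value.
ages == NA

# is.na() is TRUE exactly where a value is missing.
is.na(ages)

# Summing the logical vector counts the missing values.
sum(is.na(ages))
```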

## dplyr verbs

`filter()` only includes rows where the condition is `TRUE`; it excludes both `FALSE` and `NA` values.
If you want to preserve missing values, ask for them explicitly:

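```{r}
library(dplyr)

df <- tibble(x = c(1, NA, 3))
# Drops the row where x is NA, because the condition is NA there
filter(df, x > 1)
# Keeps the NA row by asking for it explicitly
filter(df, is.na(x) | x > 1)
```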
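To see why the `NA` row disappears unless you ask for it, it can help to inspect the logical vectors that `filter()` works with. This is only a sketch, assuming dplyr is installed; `scores` and `kept` are hypothetical names introduced for illustration:

```{r}
library(dplyr)

scores <- tibble(x = c(1, NA, 3))

# The condition is NA for the missing row, so filter() drops it.
scores$x > 1

# is.na(x) is TRUE for the missing row, and TRUE | NA is TRUE, so the row is kept.
is.na(scores$x) | scores$x > 1

kept <- filter(scores, is.na(x) | x > 1)
nrow(kept)
```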

`arrange()` always sorts missing values to the end, even when you use `desc()`:

```{r}
df <- tibble(x = c(5, 2, NA))
arrange(df, x)
arrange(df, desc(x))
```
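If you would rather see missing values first, one possible approach is to sort on a logical helper column before the column itself. This is a sketch of a common idiom, not the only way to do it; `ranks` and `na_first` are hypothetical names:

```{r}
library(dplyr)

ranks <- tibble(x = c(5, 2, NA))

# !is.na(x) is FALSE for missing rows, and FALSE sorts before TRUE,
# so missing rows come first; the remaining rows are then sorted by x.
na_first <- arrange(ranks, !is.na(x), x)
na_first
```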

## Exercises

1.  Why is `NA ^ 0` not missing?
    Why is `NA | TRUE` not missing?
    Why is `FALSE & NA` not missing?
    Can you figure out the general rule?
    (`NA * 0` is a tricky counterexample!)