O'Reilly feedback
parent be5905a09c
commit 86324b358d
@@ -339,7 +339,7 @@ gss_cat |>
 count(partyid)
 ```

-`fct_recode()` will leave levels that aren't explicitly mentioned as is, and will warn you if you accidentally refer to a level that doesn't exist.
+`fct_recode()` will leave the levels that aren't explicitly mentioned as is, and will warn you if you accidentally refer to a level that doesn't exist.

 To combine groups, you can assign multiple old levels to the same new level:

@@ -344,7 +344,7 @@ flights |>
 In most cases, however, `any()` and `all()` are a little too crude, and it would be nice to be able to get a little more detail about how many values are `TRUE` or `FALSE`.
 That leads us to the numeric summaries.

-### Numeric summaries
+### Numeric summaries of logical vectors

 When you use a logical vector in a numeric context, `TRUE` becomes 1 and `FALSE` becomes 0.
 This makes `sum()` and `mean()` very useful with logical vectors because `sum(x)` will give the number of `TRUE`s and `mean(x)` the proportion of `TRUE`s.
@@ -382,7 +382,7 @@ flights |>
 ### Logical subsetting

 There's one final use for logical vectors in summaries: you can use a logical vector to filter a single variable to a subset of interest.
-This makes use of the base `[` (pronounced subset) operator, which you'll learn more about this in @sec-vector-subsetting.
+This makes use of the base `[` (pronounced subset) operator, which you'll learn more about in @sec-vector-subsetting.

 Imagine we wanted to look at the average delay just for flights that were actually delayed.
 One way to do so would be to first filter the flights:

@@ -35,7 +35,7 @@ To begin, let's explore a few handy tools for creating or eliminating missing ex
 ### Last observation carried forward

 A common use for missing values is as a data entry convenience.
-Sometimes data that has been entered by hand, missing values indicate that the value in the previous row has been repeated:
+When data is entered by hand, missing values sometimes indicate that the value in the previous row has been repeated (or carried forward):

 ```{r}
 treatment <- tribble(
@@ -60,7 +60,7 @@ You can use the `.direction` argument to fill in missing values that have been g

 ### Fixed values

-Some times missing values represent some fixed and known value, mostly commonly 0.
+Some times missing values represent some fixed and known value, most commonly 0.
 You can use `dplyr::coalesce()` to replace them:

 ```{r}

10
numbers.qmd
@@ -28,7 +28,7 @@ library(tidyverse)
 library(nycflights13)
 ```

-### Counts
+## Counts

 It's surprising how much data science you can do with just counts and a little basic arithmetic, so dplyr strives to make counting as easy as possible with `count()`.
 This function is great for quick exploration and checks during analysis:
@@ -59,7 +59,7 @@ flights |>
 )
 ```

-`n()` is a special summary function that doesn't take any arguments and instead access information about the "current" group.
+`n()` is a special summary function that doesn't take any arguments and instead accesses information about the "current" group.
 This means that it only works inside dplyr verbs:

 ```{r}
@@ -554,7 +554,7 @@ You can lead or lag by more than one position by using the second argument, `n`.
 8. Find all destinations that are flown by at least two carriers.
 Use those destinations to come up with a relative ranking of the carriers based on their performance for the same destination.

-## Summaries
+## Numeric summaries

 Just using the counts, means, and sums that we've introduced already can get you a long way, but R provides many other useful summary functions.
 Here are a selection that you might find useful.
@@ -621,12 +621,12 @@ flights |>

 ### Spread

-Sometimes you're not so interested in where the bulk of the data lies, but how it is spread out.
+Sometimes you're not so interested in where the bulk of the data lies, but in how it is spread out.
 Two commonly used summaries are the standard deviation, `sd(x)`, and the inter-quartile range, `IQR()`.
 We won't explain `sd()` here since you're probably already familiar with it, but `IQR()` might be new --- it's `quantile(x, 0.75) - quantile(x, 0.25)` and gives you the range that contains the middle 50% of the data.

 We can use this to reveal a small oddity in the `flights` data.
-You might expect that the spread of the distance between origin and destination to be zero, since airports are always in the same place.
+You might expect the spread of the distance between origin and destination to be zero, since airports are always in the same place.
 But the code below makes it looks like one airport, [EGE](https://en.wikipedia.org/wiki/Eagle_County_Regional_Airport), might have moved.

 ```{r}