O'Reilly feedback
@@ -339,7 +339,7 @@ gss_cat |>
   count(partyid)
 ```
 
-`fct_recode()` will leave levels that aren't explicitly mentioned as is, and will warn you if you accidentally refer to a level that doesn't exist.
+`fct_recode()` will leave the levels that aren't explicitly mentioned as is, and will warn you if you accidentally refer to a level that doesn't exist.
 
 To combine groups, you can assign multiple old levels to the same new level:
 
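
For readers without forcats at hand, the idea of combining levels can be sketched in base R. Assigning the same label to several levels through `levels<-` merges them, and unmentioned levels stay as they are. This is a swapped-in base-R technique, not `fct_recode()` itself: unlike `fct_recode()`, it does not warn about a misspelled level name. The party values below are made up for illustration.

```r
# Made-up party-ID values for illustration (not the real gss_cat data)
party <- factor(c("Strong republican", "Not str republican",
                  "Independent", "Strong democrat"))

# Give both Republican levels the same label; assigning duplicated
# labels through `levels<-` merges those levels into a single level.
lv <- levels(party)
lv[lv %in% c("Strong republican", "Not str republican")] <- "Republican"
levels(party) <- lv

levels(party)  # "Independent" "Republican" "Strong democrat"
```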
@@ -344,7 +344,7 @@ flights |>
 In most cases, however, `any()` and `all()` are a little too crude, and it would be nice to be able to get a little more detail about how many values are `TRUE` or `FALSE`.
 That leads us to the numeric summaries.
 
-### Numeric summaries
+### Numeric summaries of logical vectors
 
 When you use a logical vector in a numeric context, `TRUE` becomes 1 and `FALSE` becomes 0.
 This makes `sum()` and `mean()` very useful with logical vectors because `sum(x)` will give the number of `TRUE`s and `mean(x)` the proportion of `TRUE`s.
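
The 1/0 coercion described in this hunk can be seen on a plain vector. Here is a minimal sketch with a made-up logical vector rather than the `flights` data:

```r
# Made-up logical vector, e.g. "was this flight delayed?"
delayed <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# In a numeric context TRUE counts as 1 and FALSE as 0
n_delayed <- sum(delayed)      # number of TRUEs: 3
prop_delayed <- mean(delayed)  # proportion of TRUEs: 3 / 5 = 0.6
```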
@@ -382,7 +382,7 @@ flights |>
 ### Logical subsetting
 
 There's one final use for logical vectors in summaries: you can use a logical vector to filter a single variable to a subset of interest.
-This makes use of the base `[` (pronounced subset) operator, which you'll learn more about this in @sec-vector-subsetting.
+This makes use of the base `[` (pronounced subset) operator, which you'll learn more about in @sec-vector-subsetting.
 
 Imagine we wanted to look at the average delay just for flights that were actually delayed.
 One way to do so would be to first filter the flights:
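
Outside a data frame, the same idea is a condition placed inside `[` that keeps only the values of interest before summarising. This is a minimal base-R sketch with made-up delays, not the `flights` data:

```r
# Made-up departure delays in minutes (negative = left early, NA = unknown)
dep_delay <- c(-5, 10, 30, NA, -2)

# Average over every flight with a known delay: (-5 + 10 + 30 - 2) / 4
all_avg <- mean(dep_delay, na.rm = TRUE)

# `[` keeps the elements where the condition is TRUE; an NA in the
# condition yields an NA element, which na.rm = TRUE then drops
delayed_avg <- mean(dep_delay[dep_delay > 0], na.rm = TRUE)  # (10 + 30) / 2
```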
@@ -35,7 +35,7 @@ To begin, let's explore a few handy tools for creating or eliminating missing ex
 ### Last observation carried forward
 
 A common use for missing values is as a data entry convenience.
-Sometimes data that has been entered by hand, missing values indicate that the value in the previous row has been repeated:
+When data is entered by hand, missing values sometimes indicate that the value in the previous row has been repeated (or carried forward):
 
 ```{r}
 treatment <- tribble(
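
In the tidyverse, filling down like this is what `tidyr::fill()` does to data frame columns. The underlying "last observation carried forward" logic can be sketched on a single vector in base R. The helper name `carry_forward` is hypothetical and not part of any package. The values are also made up.

```r
# Hypothetical helper: each NA takes the value of the element before it.
# Because the loop runs left to right, a run of NAs is filled from the last
# non-missing value. A leading NA has nothing before it, so it stays NA.
carry_forward <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Made-up hand-entered names where a blank means "same as above"
person <- c("Alice", NA, NA, "Bob", NA)
filled <- carry_forward(person)  # "Alice" "Alice" "Alice" "Bob" "Bob"
```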
@@ -60,7 +60,7 @@ You can use the `.direction` argument to fill in missing values that have been g
 
 ### Fixed values
 
-Some times missing values represent some fixed and known value, mostly commonly 0.
+Some times missing values represent some fixed and known value, most commonly 0.
 You can use `dplyr::coalesce()` to replace them:
 
 ```{r}
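
`dplyr::coalesce()` returns the first non-missing value at each position across its arguments. With a single fixed fill value, that amounts to the base-R replacement sketched below, which runs without dplyr. The counts are made up.

```r
# Made-up counts where NA was used to mean "none recorded"
x <- c(1, 4, NA, 5, NA)

# Base-R counterpart of dplyr::coalesce(x, 0) for one fill value:
# swap each NA for 0 and leave every other element unchanged
x_filled <- replace(x, is.na(x), 0)  # 1 4 0 5 0
```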
numbers.qmd (10 lines changed)
@@ -28,7 +28,7 @@ library(tidyverse)
 library(nycflights13)
 ```
 
-### Counts
+## Counts
 
 It's surprising how much data science you can do with just counts and a little basic arithmetic, so dplyr strives to make counting as easy as possible with `count()`.
 This function is great for quick exploration and checks during analysis:
@@ -59,7 +59,7 @@ flights |>
   )
 ```
 
-`n()` is a special summary function that doesn't take any arguments and instead access information about the "current" group.
+`n()` is a special summary function that doesn't take any arguments and instead accesses information about the "current" group.
 This means that it only works inside dplyr verbs:
 
 ```{r}
@@ -554,7 +554,7 @@ You can lead or lag by more than one position by using the second argument, `n`.
 8.  Find all destinations that are flown by at least two carriers.
     Use those destinations to come up with a relative ranking of the carriers based on their performance for the same destination.
 
-## Summaries
+## Numeric summaries
 
 Just using the counts, means, and sums that we've introduced already can get you a long way, but R provides many other useful summary functions.
 Here are a selection that you might find useful.
@@ -621,12 +621,12 @@ flights |>
 
 ### Spread
 
-Sometimes you're not so interested in where the bulk of the data lies, but how it is spread out.
+Sometimes you're not so interested in where the bulk of the data lies, but in how it is spread out.
 Two commonly used summaries are the standard deviation, `sd(x)`, and the inter-quartile range, `IQR()`.
 We won't explain `sd()` here since you're probably already familiar with it, but `IQR()` might be new --- it's `quantile(x, 0.75) - quantile(x, 0.25)` and gives you the range that contains the middle 50% of the data.
 
 We can use this to reveal a small oddity in the `flights` data.
-You might expect that the spread of the distance between origin and destination to be zero, since airports are always in the same place.
+You might expect the spread of the distance between origin and destination to be zero, since airports are always in the same place.
 But the code below makes it looks like one airport, [EGE](https://en.wikipedia.org/wiki/Eagle_County_Regional_Airport), might have moved.
 
 ```{r}
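
The identity `IQR(x) == quantile(x, 0.75) - quantile(x, 0.25)` quoted in this hunk can be checked directly. The sketch also shows why a constant distance should give zero spread. Both vectors are made up, and both functions use R's default quantile type 7.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# IQR() versus the explicit quantile difference (names dropped for comparison)
iqr_direct <- IQR(x)                                          # 7 - 3 = 4
iqr_manual <- unname(quantile(x, 0.75) - quantile(x, 0.25))   # 4

# A value that never changes (e.g. a fixed route distance) has zero spread
same <- c(500, 500, 500)
spread_sd <- sd(same)    # 0
spread_iqr <- IQR(same)  # 0
```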