Tweak :: description and usage
EDA.qmd
@@ -112,7 +112,7 @@ ggplot(data = diamonds, mapping = aes(x = cut)) +
 ```
 
 The height of the bars displays how many observations occurred with each x value.
-You can compute these values manually with `dplyr::count()`:
+You can compute these values manually with `count()`:
 
 ```{r}
 diamonds |> 
@@ -136,7 +136,7 @@ ggplot(data = diamonds, mapping = aes(x = carat)) +
   geom_histogram(binwidth = 0.5)
 ```
 
-You can compute this by hand by combining `dplyr::count()` and `ggplot2::cut_width()`:
+You can compute this by hand by combining `count()` and `cut_width()`:
 
 ```{r}
 diamonds |> 
@@ -359,17 +359,17 @@ If you've encountered unusual values in your dataset, and simply want to move on
 
 2.  Instead, I recommend replacing the unusual values with missing values.
     The easiest way to do this is to use `mutate()` to replace the variable with a modified copy.
-    You can use the `ifelse()` function to replace unusual values with `NA`:
+    You can use the `if_else()` function to replace unusual values with `NA`:
 
     ```{r}
     diamonds2 <- diamonds |> 
-      mutate(y = ifelse(y < 3 | y > 20, NA, y))
+      mutate(y = if_else(y < 3 | y > 20, NA, y))
     ```
 
-`ifelse()` has three arguments.
+`if_else()` has three arguments.
 The first argument `test` should be a logical vector.
 The result will contain the value of the second argument, `yes`, when `test` is `TRUE`, and the value of the third argument, `no`, when it is false.
-Alternatively to `if_else()`, use `dplyr::case_when()`.
+Alternatively to `if_else()`, use `case_when()`.
 `case_when()` is particularly useful inside mutate when you want to create a new variable that relies on a complex combination of existing variables or would otherwise require multiple `if_else()` statements nested inside one another.
 
 Like R, ggplot2 subscribes to the philosophy that missing values should never silently go missing.
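A minimal sketch of the `if_else()` and `case_when()` usage this hunk describes, for anyone who wants to try it outside the book. The `sizes` tibble and the `y_flag` labels are hypothetical stand-ins (not part of the commit), and the sketch assumes dplyr 1.1.0 or later, where `if_else()` accepts a bare `NA` and `case_when()` has a `.default` argument. Note that `if_else()` names its arguments `condition`, `true`, and `false`, whereas base `ifelse()` uses `test`, `yes`, and `no`.

```r
library(dplyr)

# Hypothetical toy data standing in for the `y` column of `diamonds`.
sizes <- tibble(y = c(0, 2.5, 5.1, 8.3, 31.8, 58.9))

flagged <- sizes |>
  mutate(
    # Replace implausible values with NA, as in the hunk above.
    y_clean = if_else(y < 3 | y > 20, NA, y),
    # case_when() expresses several conditions without nesting if_else().
    y_flag = case_when(
      y < 3 ~ "too small",
      y > 20 ~ "too large",
      .default = "plausible"
    )
  )

flagged
```

Conditions in `case_when()` are evaluated in order, so the first matching branch wins and `.default` covers everything left over.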
@@ -397,10 +397,12 @@ ggplot(data = diamonds2, mapping = aes(x = x, y = y)) +
 ```
 
 Other times you want to understand what makes observations with missing values different to observations with recorded values.
-For example, in `nycflights13::flights`, missing values in the `dep_time` variable indicate that the flight was cancelled.
+For example, in `nycflights13::flights`[^eda-1], missing values in the `dep_time` variable indicate that the flight was cancelled.
 So you might want to compare the scheduled departure times for cancelled and non-cancelled times.
 You can do this by making a new variable with `is.na()`.
 
+[^eda-1]: Remember that when we need to be explicit about where a function (or dataset) comes from, we'll use the special form `package::function()` or `package::dataset`.
+
 ```{r}
 #| fig-alt: >
 #|   A frequency polygon of scheduled departure times of flights. Two lines 
@@ -31,6 +31,8 @@ library(tidyverse)
 Take careful note of the conflicts message that's printed when you load the tidyverse.
 It tells you that dplyr overwrites some functions in base R.
 If you want to use the base version of these functions after loading dplyr, you'll need to use their full names: `stats::filter()` and `stats::lag()`.
+So far we've mostly ignored which package a function comes from because most of the time it doesn't matter.
+However, knowing the package can help you find help and find related functions, so when we need to be precise about which package a function comes from, we'll use the same syntax as R: `packagename::functionname()`.
 
 ### nycflights13
 
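As a small illustration of the `packagename::functionname()` form added in this hunk, the sketch below calls both functions named `filter()` explicitly. It uses the built-in `mtcars` data and a toy numeric series, both chosen here for illustration rather than taken from the commit.

```r
library(dplyr)

# dplyr::filter() keeps the rows of a data frame that match a condition.
small <- dplyr::filter(mtcars, cyl == 4)

# stats::filter() is unrelated: it applies a linear filter to a series.
# Here it computes a centred 3-point moving average; the first and last
# values are NA because the window runs off the ends of the series.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))

nrow(small)
smoothed
```

Writing the package name out makes the call unambiguous regardless of which packages are attached or in what order.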
@@ -42,9 +42,6 @@ library(tidyverse)
 
 You only need to install a package once, but you need to reload it every time you start a new session.
 
-If we need to be explicit about where a function (or dataset) comes from, we'll use the special form `package::function()`.
-For example, `ggplot2::ggplot()` tells you explicitly that we're using the `ggplot()` function from the ggplot2 package.
-
 ## First steps
 
 Let's use our first graph to answer a question: Do cars with big engines use more fuel than cars with small engines?