More @jennybc comments
parent daaa861f74
commit 6afdb03666
@@ -285,14 +285,14 @@ Encodings are a rich and complex topic, and I've only scratched the surface here
 
 ### Factors {#readr-factors}
 
-R uses factors to represent categorical variables that have a known set of possible values. Given `parse_factor()` a vector of known `levels` to generate a warning whenever an unexpected value is present:
+R uses factors to represent categorical variables that have a known set of possible values. Give `parse_factor()` a vector of known `levels` to generate a warning whenever an unexpected value is present:
 
 ```{r}
 fruit <- c("apple", "banana")
 parse_factor(c("apple", "banana", "bananana"), levels = fruit)
 ```
 
-If you have problematic entries, it's often easier to read in as strings and then use the tools you'll learn about in [strings] and [factors] to clean them up.
+But if you have many problematic entries, it's often easier to leave them as character vectors and then use the tools you'll learn about in [strings] and [factors] to clean them up.
 
 ### Dates, date-times, and times {#readr-datetimes}
 
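A minimal sketch of the advice in the Factors hunk above: keep the problematic values as a character vector, repair them, and only then parse them as a factor. The names `x` and `x_clean` and the `ifelse()` repair are illustrative assumptions, not part of this commit; the book points to its strings and factors chapters for the real cleaning tools.

```r
library(readr)

fruit <- c("apple", "banana")

# Raw input kept as a plain character vector (one misspelled entry).
x <- c("apple", "banana", "bananana")

# Hypothetical repair step: map the known misspelling to its intended level.
x_clean <- ifelse(x == "bananana", "banana", x)

# Every value is now a known level, so parsing should raise no warning.
f <- parse_factor(x_clean, levels = fruit)
```

Parsing the uncleaned `x` with the same `levels` would instead produce a warning and an `NA` for the unexpected value.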
@@ -90,10 +90,6 @@ For nycflights13:
 it contained weather records for all airports in the USA, what additional
 relation would it define with `flights`?
 
-1. You might expect that there's an implicit relationship between plane
-and airline, because each plane is flown by a single airline. Confirm
-or reject this hypothesis using data.
-
 1. We know that some days of the year are "special", and fewer people than
 usual fly on them. How might you represent that data as a data frame?
 What would be the primary keys of that table? How would it connect to the
@@ -531,6 +527,10 @@ flights %>%
 1. What does `anti_join(flights, airports, by = c("dest" = "faa"))` tell you?
 What does `anti_join(airports, flights, by = c("faa" = "dest"))` tell you?
 
+1. You might expect that there's an implicit relationship between plane
+and airline, because each plane is flown by a single airline. Confirm
+or reject this hypothesis using the tools you've learned above.
+
 ## Join problems
 
 The data you've been working with in this chapter has been cleaned up so that you'll have as few problems as possible. Your own data is unlikely to be so nice, so there are a few things that you should do with your own data to make your joins go smoothly.
@@ -158,6 +158,9 @@ The main reason that some older functions don't work with tibble is the `[` func
 df[, c("abc", "xyz")]
 ```
 
+1. If you have the name of a variable stored in an object, e.g. `var <- "mpg"`,
+how can you extract the reference variable from a tibble?
+
 1. Practice referring to non-syntactic names in the following data frame by:
 
 1. Extracting the variable called `1`.
tidy.Rmd (9 changes)
@@ -340,7 +340,8 @@ table5 %>%
 do? Why would you set it to `FALSE`?
 
 1. Compare and contrast `separate()` and `extract()`. Why are there
-three variations of separation, but only one unite?
+three variations of separation (by position, by separator, and with
+groups), but only one unite?
 
 ## Missing values
 
@@ -441,7 +442,7 @@ The best place to start is almost always to gather together the columns that are
 in the variable names (e.g. `new_sp_m014`, `new_ep_m014`, `new_ep_f014`)
 these are likely to be values, not variables.
 
-So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
+So we need to gather together all the columns from `new_sp_m014` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
 
 ```{r}
 who1 <- who %>%
@@ -539,10 +540,10 @@ who %>%
 missing values? What's the difference between an `NA` and zero?
 
 1. What happens if you neglect the `mutate()` step?
+(`mutate(key = stringr::str_replace(key, "newrel", "new_rel"))`)
 
 1. I claimed that `iso2` and `iso3` were redundant with `country`.
-Confirm my claim by creating a table that uniquely maps from `country`
-to `iso2` and `iso3`.
+Confirm this claim.
 
 1. For each country, year, and sex compute the total number of cases of
 TB. Make an informative visualisation of the data.