Feedback from O'Reilly + style fixes
parent f0b19065c7
commit 19c89ebf64
@@ -334,7 +334,7 @@ We can use `wday()` to see that more flights depart during the week than on the
flights_dt |>
  mutate(wday = wday(dep_time, label = TRUE)) |>
  ggplot(aes(x = wday)) +
-  geom_bar()
+  geom_bar()
```

There's an interesting pattern if we look at the average departure delay by minute within the hour.
@@ -353,9 +353,10 @@ flights_dt |>
  group_by(minute) |>
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE),
-    n = n()) |>
+    n = n()
+  ) |>
  ggplot(aes(minute, avg_delay)) +
-  geom_line()
+  geom_line()
```

Interestingly, if we look at the *scheduled* departure time we don't see such a strong pattern:
@@ -371,23 +372,30 @@ sched_dep <- flights_dt |>
  group_by(minute) |>
  summarize(
    avg_delay = mean(arr_delay, na.rm = TRUE),
-    n = n())
+    n = n()
+  )

ggplot(sched_dep, aes(minute, avg_delay)) +
  geom_line()
```

So why do we see that pattern with the actual departure times?
-Well, like much data collected by humans, there's a strong bias towards flights leaving at "nice" departure times.
+Well, like much data collected by humans, there's a strong bias towards flights leaving at "nice" departure times, as @fig-human-rounding shows.
Always be alert for this sort of pattern whenever you work with data that involves human judgement!

```{r}
#| label: fig-human-rounding
#| fig-cap: >
#|   A frequency polygon showing the number of flights scheduled to
#|   depart each hour. You can see a strong preference for round numbers
#|   like 0 and 30 and generally for numbers that are a multiple of five.
#| fig-alt: >
#|   A line plot with departure minute (0-60) on the x-axis and number of
#|   flights (0-60000) on the y-axis. Most flights are scheduled to depart
#|   on either the hour (~60,000) or the half hour (~35,000). Otherwise,
#|   almost all flights are scheduled to depart on multiples of five,
#|   with a few extra at 15, 45, and 55 minutes.
#| echo: false
ggplot(sched_dep, aes(minute, n)) +
  geom_line()
```
@@ -421,7 +429,7 @@ You can use rounding to show the distribution of flights across the course of a
flights_dt |>
  mutate(dep_hour = dep_time - floor_date(dep_time, "day")) |>
  ggplot(aes(dep_hour)) +
-  geom_freqpoly(binwidth = 60 * 30)
+  geom_freqpoly(binwidth = 60 * 30)
```

Computing the difference between a pair of date-times yields a difftime (more on that in @sec-intervals).
@@ -438,12 +446,13 @@ We can convert that to an `hms` object to get a more useful x-axis:
flights_dt |>
  mutate(dep_hour = hms::as_hms(dep_time - floor_date(dep_time, "day"))) |>
  ggplot(aes(dep_hour)) +
-  geom_freqpoly(binwidth = 60 * 30)
+  geom_freqpoly(binwidth = 60 * 30)
```

### Modifying components

-You can also use each accessor function to modify the components of a date/time:
+You can also use each accessor function to modify the components of a date/time.
+This doesn't come up much in data analysis, but can be useful when cleaning data that has clearly incorrect dates.

```{r}
(datetime <- ymd_hms("2026-07-08 12:34:56"))
@@ -490,7 +499,7 @@ update(ymd("2023-02-01"), hour = 400)

6. What makes the distribution of `diamonds$carat` and `flights$sched_dep_time` similar?

-7. Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early.
+7. Confirm our hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early.
Hint: create a binary variable that tells you whether or not a flight was delayed.

## Time spans