Fix code language
@@ -13,7 +13,7 @@ Introduction</h1>
|
||||
Prerequisites</h2>
|
||||
<p>This chapter will focus on the <strong>lubridate</strong> package, which makes it easier to work with dates and times in R. lubridate is not part of core tidyverse because you only need it when you’re working with dates/times. We will also need nycflights13 for practice data.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">library(tidyverse)
|
||||
<pre data-type="programlisting" data-code-language="r">library(tidyverse)
|
||||
|
||||
library(lubridate)
|
||||
library(nycflights13)</pre>
|
||||
@@ -32,7 +32,7 @@ Creating date/times</h1>
|
||||
<p>You should always use the simplest possible data type that works for your needs. That means if you can use a date instead of a date-time, you should. Date-times are substantially more complicated because of the need to handle time zones, which we’ll come back to at the end of the chapter.</p>
|
||||
<p>To get the current date or date-time you can use <code><a href="https://lubridate.tidyverse.org/reference/now.html">today()</a></code> or <code><a href="https://lubridate.tidyverse.org/reference/now.html">now()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">today()
|
||||
<pre data-type="programlisting" data-code-language="r">today()
|
||||
#> [1] "2022-11-18"
|
||||
now()
|
||||
#> [1] "2022-11-18 10:59:07 CST"</pre>
|
||||
@@ -48,7 +48,7 @@ now()
|
||||
During import</h2>
|
||||
<p>If your CSV contains an ISO8601 date or date-time, you don’t need to do anything; readr will automatically recognize it:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">csv <- "
|
||||
<pre data-type="programlisting" data-code-language="r">csv <- "
|
||||
date,datetime
|
||||
2022-01-02,2022-01-02 05:12
|
||||
"
|
||||
@@ -137,7 +137,7 @@ read_csv(csv)
|
||||
</tr></tbody></table></div>
|
||||
<p>And this code shows a few options applied to a very ambiguous date:</p>
|
||||
<div class="cell" data-messages="false">
|
||||
<pre data-type="programlisting" data-code-language="downlit">csv <- "
|
||||
<pre data-type="programlisting" data-code-language="r">csv <- "
|
||||
date
|
||||
01/02/15
|
||||
"
|
||||
@@ -169,7 +169,7 @@ read_csv(csv, col_types = cols(date = col_date("%y/%m/%d")))
|
||||
From strings</h2>
|
||||
<p>The date-time specification language is powerful, but requires careful analysis of the date format. An alternative approach is to use lubridate’s helpers, which attempt to automatically determine the format once you specify the order of the components. To use them, identify the order in which year, month, and day appear in your dates, then arrange “y”, “m”, and “d” in the same order. That gives you the name of the lubridate function that will parse your date. For example:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ymd("2017-01-31")
|
||||
<pre data-type="programlisting" data-code-language="r">ymd("2017-01-31")
|
||||
#> [1] "2017-01-31"
|
||||
mdy("January 31st, 2017")
|
||||
#> [1] "2017-01-31"
|
||||
@@ -178,14 +178,14 @@ dmy("31-Jan-2017")
|
||||
</div>
|
||||
<p><code><a href="https://lubridate.tidyverse.org/reference/ymd.html">ymd()</a></code> and friends create dates. To create a date-time, add an underscore and one or more of “h”, “m”, and “s” to the name of the parsing function:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ymd_hms("2017-01-31 20:11:59")
|
||||
<pre data-type="programlisting" data-code-language="r">ymd_hms("2017-01-31 20:11:59")
|
||||
#> [1] "2017-01-31 20:11:59 UTC"
|
||||
mdy_hm("01/31/2017 08:01")
|
||||
#> [1] "2017-01-31 08:01:00 UTC"</pre>
|
||||
</div>
|
||||
<p>You can also force the creation of a date-time from a date by supplying a timezone:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ymd("2017-01-31", tz = "UTC")
|
||||
<pre data-type="programlisting" data-code-language="r">ymd("2017-01-31", tz = "UTC")
|
||||
#> [1] "2017-01-31 UTC"</pre>
|
||||
</div>
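<p>One way to confirm the difference is to check the class of each result. This is a minimal sketch that assumes lubridate is installed; the names <code>d</code> and <code>dt</code> are illustrative only:</p>

```r
library(lubridate)

d  <- ymd("2017-01-31")              # no tz supplied: a Date
dt <- ymd("2017-01-31", tz = "UTC")  # tz supplied: a date-time (POSIXct)

is.Date(d)      # is the first result a Date?
is.POSIXct(dt)  # is the second result a POSIXct date-time?
```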
|
||||
</section>
|
||||
@@ -195,7 +195,7 @@ mdy_hm("01/31/2017 08:01")
|
||||
From individual components</h2>
|
||||
<p>Instead of a single string, sometimes you’ll have the individual components of the date-time spread across multiple columns. This is what we have in the <code>flights</code> data:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights |>
|
||||
select(year, month, day, hour, minute)
|
||||
#> # A tibble: 336,776 × 5
|
||||
#> year month day hour minute
|
||||
@@ -210,7 +210,7 @@ From individual components</h2>
|
||||
</div>
|
||||
<p>To create a date/time from this sort of input, use <code><a href="https://lubridate.tidyverse.org/reference/make_datetime.html">make_date()</a></code> for dates, or <code><a href="https://lubridate.tidyverse.org/reference/make_datetime.html">make_datetime()</a></code> for date-times:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights |>
|
||||
select(year, month, day, hour, minute) |>
|
||||
mutate(departure = make_datetime(year, month, day, hour, minute))
|
||||
#> # A tibble: 336,776 × 6
|
||||
@@ -226,7 +226,7 @@ From individual components</h2>
|
||||
</div>
|
||||
<p>Let’s do the same thing for each of the four time columns in <code>flights</code>. The times are represented in a slightly odd format, so we use modulus arithmetic to pull out the hour and minute components. Once we’ve created the date-time variables, we focus in on the variables we’ll explore in the rest of the chapter.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">make_datetime_100 <- function(year, month, day, time) {
|
||||
<pre data-type="programlisting" data-code-language="r">make_datetime_100 <- function(year, month, day, time) {
|
||||
make_datetime(year, month, day, time %/% 100, time %% 100)
|
||||
}
|
||||
|
||||
@@ -255,7 +255,7 @@ flights_dt
|
||||
</div>
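<p>To see why the integer division and remainder inside <code>make_datetime_100()</code> recover the hour and minute, it can help to try them on a few hand-picked values. The vector <code>time</code> and the date used below are made up for illustration and are not taken from <code>flights</code>:</p>

```r
library(lubridate)

# Times stored as HMM or HHMM integers, e.g. 517 means 5:17
time <- c(517, 1545, 5)

time %/% 100  # integer division gives the hours:  5 15  0
time %% 100   # the remainder gives the minutes:  17 45  5

# Combine the pieces with an arbitrary date to build date-times
dt <- make_datetime(2013, 1, 1, time %/% 100, time %% 100)
dt
```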
|
||||
<p>With this data, we can visualize the distribution of departure times across the year:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
ggplot(aes(dep_time)) +
|
||||
geom_freqpoly(binwidth = 86400) # 86400 seconds = 1 day</pre>
|
||||
<div class="cell-output-display">
|
||||
@@ -264,7 +264,7 @@ flights_dt
|
||||
</div>
|
||||
<p>Or within a single day:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
filter(dep_time < ymd(20130102)) |>
|
||||
ggplot(aes(dep_time)) +
|
||||
geom_freqpoly(binwidth = 600) # 600 s = 10 minutes</pre>
|
||||
@@ -280,14 +280,14 @@ flights_dt
|
||||
From other types</h2>
|
||||
<p>You may want to switch between a date-time and a date. That’s the job of <code><a href="https://lubridate.tidyverse.org/reference/as_date.html">as_datetime()</a></code> and <code><a href="https://lubridate.tidyverse.org/reference/as_date.html">as_date()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">as_datetime(today())
|
||||
<pre data-type="programlisting" data-code-language="r">as_datetime(today())
|
||||
#> [1] "2022-11-18 UTC"
|
||||
as_date(now())
|
||||
#> [1] "2022-11-18"</pre>
|
||||
</div>
|
||||
<p>Sometimes you’ll get date/times as numeric offsets from the “Unix Epoch”, 1970-01-01. If the offset is in seconds, use <code><a href="https://lubridate.tidyverse.org/reference/as_date.html">as_datetime()</a></code>; if it’s in days, use <code><a href="https://lubridate.tidyverse.org/reference/as_date.html">as_date()</a></code>.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">as_datetime(60 * 60 * 10)
|
||||
<pre data-type="programlisting" data-code-language="r">as_datetime(60 * 60 * 10)
|
||||
#> [1] "1970-01-01 10:00:00 UTC"
|
||||
as_date(365 * 10 + 2)
|
||||
#> [1] "1980-01-01"</pre>
|
||||
@@ -300,14 +300,14 @@ Exercises</h2>
|
||||
<ol type="1"><li>
|
||||
<p>What happens if you parse a string that contains invalid dates?</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ymd(c("2010-10-10", "bananas"))</pre>
|
||||
<pre data-type="programlisting" data-code-language="r">ymd(c("2010-10-10", "bananas"))</pre>
|
||||
</div>
|
||||
</li>
|
||||
<li><p>What does the <code>tzone</code> argument to <code><a href="https://lubridate.tidyverse.org/reference/now.html">today()</a></code> do? Why is it important?</p></li>
|
||||
<li>
|
||||
<p>For each of the following date-times show how you’d parse it using a readr column-specification and a lubridate function.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">d1 <- "January 1, 2010"
|
||||
<pre data-type="programlisting" data-code-language="r">d1 <- "January 1, 2010"
|
||||
d2 <- "2015-Mar-07"
|
||||
d3 <- "06-Jun-2017"
|
||||
d4 <- c("August 19 (2015)", "July 1 (2015)")
|
||||
@@ -329,7 +329,7 @@ Date-time components</h1>
|
||||
Getting components</h2>
|
||||
<p>You can pull out individual parts of the date with the accessor functions <code><a href="https://lubridate.tidyverse.org/reference/year.html">year()</a></code>, <code><a href="https://lubridate.tidyverse.org/reference/month.html">month()</a></code>, <code><a href="https://lubridate.tidyverse.org/reference/day.html">mday()</a></code> (day of the month), <code><a href="https://lubridate.tidyverse.org/reference/day.html">yday()</a></code> (day of the year), <code><a href="https://lubridate.tidyverse.org/reference/day.html">wday()</a></code> (day of the week), <code><a href="https://lubridate.tidyverse.org/reference/hour.html">hour()</a></code>, <code><a href="https://lubridate.tidyverse.org/reference/minute.html">minute()</a></code>, and <code><a href="https://lubridate.tidyverse.org/reference/second.html">second()</a></code>.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">datetime <- ymd_hms("2026-07-08 12:34:56")
|
||||
<pre data-type="programlisting" data-code-language="r">datetime <- ymd_hms("2026-07-08 12:34:56")
|
||||
|
||||
year(datetime)
|
||||
#> [1] 2026
|
||||
@@ -345,7 +345,7 @@ wday(datetime)
|
||||
</div>
|
||||
<p>For <code><a href="https://lubridate.tidyverse.org/reference/month.html">month()</a></code> and <code><a href="https://lubridate.tidyverse.org/reference/day.html">wday()</a></code> you can set <code>label = TRUE</code> to return the abbreviated name of the month or day of the week. Set <code>abbr = FALSE</code> to return the full name.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">month(datetime, label = TRUE)
|
||||
<pre data-type="programlisting" data-code-language="r">month(datetime, label = TRUE)
|
||||
#> [1] Jul
|
||||
#> 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
|
||||
wday(datetime, label = TRUE, abbr = FALSE)
|
||||
@@ -354,7 +354,7 @@ wday(datetime, label = TRUE, abbr = FALSE)
|
||||
</div>
|
||||
<p>We can use <code><a href="https://lubridate.tidyverse.org/reference/day.html">wday()</a></code> to see that more flights depart during the week than on the weekend:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
mutate(wday = wday(dep_time, label = TRUE)) |>
|
||||
ggplot(aes(x = wday)) +
|
||||
geom_bar()</pre>
|
||||
@@ -364,7 +364,7 @@ wday(datetime, label = TRUE, abbr = FALSE)
|
||||
</div>
|
||||
<p>There’s an interesting pattern if we look at the average departure delay by minute within the hour. It looks like flights leaving in minutes 20-30 and 50-60 have much lower delays than the rest of the hour!</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
mutate(minute = minute(dep_time)) |>
|
||||
group_by(minute) |>
|
||||
summarise(
|
||||
@@ -378,7 +378,7 @@ wday(datetime, label = TRUE, abbr = FALSE)
|
||||
</div>
|
||||
<p>Interestingly, if we look at the <em>scheduled</em> departure time we don’t see such a strong pattern:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">sched_dep <- flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">sched_dep <- flights_dt |>
|
||||
mutate(minute = minute(sched_dep_time)) |>
|
||||
group_by(minute) |>
|
||||
summarise(
|
||||
@@ -393,7 +393,7 @@ ggplot(sched_dep, aes(minute, avg_delay)) +
|
||||
</div>
|
||||
<p>So why do we see that pattern with the actual departure times? Well, like much data collected by humans, there’s a strong bias towards flights leaving at “nice” departure times. Always be alert for this sort of pattern whenever you work with data that involves human judgement!</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(sched_dep, aes(minute, n)) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(sched_dep, aes(minute, n)) +
|
||||
geom_line()</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="datetimes_files/figure-html/unnamed-chunk-23-1.png" class="img-fluid" alt="A line plot with departure minute (0-60) on the x-axis and number of flights (0-60000) on the y-axis. Most flights are scheduled to depart on either the hour (~60,000) or the half hour (~35,000). Otherwise, almost all flights are scheduled to depart on multiples of five, with a few extra at 15, 45, and 55 minutes." width="576"/></p>
|
||||
@@ -406,7 +406,7 @@ ggplot(sched_dep, aes(minute, avg_delay)) +
|
||||
Rounding</h2>
|
||||
<p>An alternative approach to plotting individual components is to round the date to a nearby unit of time, with <code><a href="https://lubridate.tidyverse.org/reference/round_date.html">floor_date()</a></code>, <code><a href="https://lubridate.tidyverse.org/reference/round_date.html">round_date()</a></code>, and <code><a href="https://lubridate.tidyverse.org/reference/round_date.html">ceiling_date()</a></code>. Each function takes a vector of dates to adjust and then the name of the unit to round down to (floor), round up to (ceiling), or round to. This, for example, allows us to plot the number of flights per week:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
count(week = floor_date(dep_time, "week")) |>
|
||||
ggplot(aes(week, n)) +
|
||||
geom_line() +
|
||||
@@ -417,7 +417,7 @@ Rounding</h2>
|
||||
</div>
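<p>The three rounding functions are easiest to compare on a single value. The sketch below reuses the chapter’s example timestamp and rounds it to the hour; the names <code>down</code>, <code>up</code>, and <code>nearest</code> are illustrative only:</p>

```r
library(lubridate)

datetime <- ymd_hms("2026-07-08 12:34:56")

down    <- floor_date(datetime, "hour")    # start of the current hour
up      <- ceiling_date(datetime, "hour")  # start of the next hour
nearest <- round_date(datetime, "hour")    # closest hour boundary

down
up
nearest  # 34 minutes is past the half hour, so this matches `up`
```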
|
||||
<p>You can use rounding to show the distribution of flights across the course of a day by computing the difference between <code>dep_time</code> and the earliest instant of that day:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
mutate(dep_hour = dep_time - floor_date(dep_time, "day")) |>
|
||||
ggplot(aes(dep_hour)) +
|
||||
geom_freqpoly(binwidth = 60 * 30)
|
||||
@@ -429,7 +429,7 @@ Rounding</h2>
|
||||
</div>
|
||||
<p>Computing the difference between a pair of date-times yields a difftime (more on that in <a href="#sec-intervals" data-type="xref">#sec-intervals</a>). We can convert that to an <code>hms</code> object to get a more useful x-axis:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
mutate(dep_hour = hms::as_hms(dep_time - floor_date(dep_time, "day"))) |>
|
||||
ggplot(aes(dep_hour)) +
|
||||
geom_freqpoly(binwidth = 60 * 30)</pre>
|
||||
@@ -444,7 +444,7 @@ Rounding</h2>
|
||||
Modifying components</h2>
|
||||
<p>You can also use each accessor function to modify the components of a date/time:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">(datetime <- ymd_hms("2026-07-08 12:34:56"))
|
||||
<pre data-type="programlisting" data-code-language="r">(datetime <- ymd_hms("2026-07-08 12:34:56"))
|
||||
#> [1] "2026-07-08 12:34:56 UTC"
|
||||
|
||||
year(datetime) <- 2030
|
||||
@@ -459,12 +459,12 @@ datetime
|
||||
</div>
|
||||
<p>Alternatively, rather than modifying an existing variable, you can create a new date-time with <code><a href="https://rdrr.io/r/stats/update.html">update()</a></code>. This also allows you to set multiple values in one step:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">update(datetime, year = 2030, month = 2, mday = 2, hour = 2)
|
||||
<pre data-type="programlisting" data-code-language="r">update(datetime, year = 2030, month = 2, mday = 2, hour = 2)
|
||||
#> [1] "2030-02-02 02:34:56 UTC"</pre>
|
||||
</div>
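<p>Unlike the assignment form, <code>update()</code> returns a new value and leaves its input alone. A small sketch of that behaviour, using the illustrative names <code>original</code> and <code>updated</code>:</p>

```r
library(lubridate)

original <- ymd_hms("2026-07-08 12:34:56")
updated  <- update(original, year = 2030, month = 2)

year(original)  # the input is untouched
year(updated)   # the copy carries the new year
month(updated)  # and the new month
```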
|
||||
<p>If values are too big, they will roll-over:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">update(ymd("2023-02-01"), mday = 30)
|
||||
<pre data-type="programlisting" data-code-language="r">update(ymd("2023-02-01"), mday = 30)
|
||||
#> [1] "2023-03-02"
|
||||
update(ymd("2023-02-01"), hour = 400)
|
||||
#> [1] "2023-02-17 16:00:00 UTC"</pre>
|
||||
@@ -501,19 +501,19 @@ Time spans</h1>
|
||||
Durations</h2>
|
||||
<p>In R, when you subtract two dates, you get a difftime object:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit"># How old is Hadley?
|
||||
<pre data-type="programlisting" data-code-language="r"># How old is Hadley?
|
||||
h_age <- today() - ymd("1979-10-14")
|
||||
h_age
|
||||
#> Time difference of 15741 days</pre>
|
||||
</div>
|
||||
<p>A difftime class object records a time span of seconds, minutes, hours, days, or weeks. This ambiguity can make difftimes a little painful to work with, so lubridate provides an alternative which always uses seconds: the <strong>duration</strong>.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">as.duration(h_age)
|
||||
<pre data-type="programlisting" data-code-language="r">as.duration(h_age)
|
||||
#> [1] "1360022400s (~43.1 years)"</pre>
|
||||
</div>
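<p>Because <code>today()</code> changes every day, the sketch below pins the end date to 2022-11-18 (an assumption made here for reproducibility) and then expresses the resulting duration in days and in average years by dividing by another duration:</p>

```r
library(lubridate)

# Fixed end date so the result does not depend on when the code is run
h_age_fixed <- ymd("2022-11-18") - ymd("1979-10-14")  # a difftime, in days
age <- as.duration(h_age_fixed)                        # the same span, in seconds

age / ddays(1)   # length of the span in days
age / dyears(1)  # length of the span in average (365.25-day) years
```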
|
||||
<p>Durations come with a bunch of convenient constructors:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">dseconds(15)
|
||||
<pre data-type="programlisting" data-code-language="r">dseconds(15)
|
||||
#> [1] "15s"
|
||||
dminutes(10)
|
||||
#> [1] "600s (~10 minutes)"
|
||||
@@ -530,19 +530,19 @@ dyears(1)
|
||||
<p>Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, weeks, and years to seconds: 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 7 days in a week. Larger time units are more problematic. A year uses the “average” number of days in a year, i.e. 365.25. There’s no way to convert a month to a duration, because there’s just too much variation.</p>
|
||||
<p>You can add and multiply durations:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">2 * dyears(1)
|
||||
<pre data-type="programlisting" data-code-language="r">2 * dyears(1)
|
||||
#> [1] "63115200s (~2 years)"
|
||||
dyears(1) + dweeks(12) + dhours(15)
|
||||
#> [1] "38869200s (~1.23 years)"</pre>
|
||||
</div>
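<p>The arithmetic above can be checked against the unit definitions directly. This sketch compares durations with <code>==</code>; the literal second counts are worked out by hand (365.25 × 86,400 = 31,557,600 seconds per year and 7 × 86,400 = 604,800 seconds per week):</p>

```r
library(lubridate)

# A duration year is 365.25 days, and a duration week is 7 days
year_is_365_25_days <- dyears(1) == ddays(365.25)
week_is_7_days      <- dweeks(1) == ddays(7)

# The sum from the example above, rebuilt from raw seconds
total <- dyears(1) + dweeks(12) + dhours(15)
same_total <- total == dseconds(31557600 + 12 * 604800 + 15 * 3600)

year_is_365_25_days
week_is_7_days
same_total
```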
|
||||
<p>You can add and subtract durations to and from dates:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">tomorrow <- today() + ddays(1)
|
||||
<pre data-type="programlisting" data-code-language="r">tomorrow <- today() + ddays(1)
|
||||
last_year <- today() - dyears(1)</pre>
|
||||
</div>
|
||||
<p>However, because durations represent an exact number of seconds, sometimes you might get an unexpected result:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">one_pm <- ymd_hms("2026-03-12 13:00:00", tz = "America/New_York")
|
||||
<pre data-type="programlisting" data-code-language="r">one_pm <- ymd_hms("2026-03-12 13:00:00", tz = "America/New_York")
|
||||
|
||||
one_pm
|
||||
#> [1] "2026-03-12 13:00:00 EDT"
|
||||
@@ -557,14 +557,14 @@ one_pm + ddays(1)
|
||||
Periods</h2>
|
||||
<p>To solve this problem, lubridate provides <strong>periods</strong>. Periods are time spans that don’t have a fixed length in seconds; instead they work with “human” times, like days and months. That allows them to work in a more intuitive way:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">one_pm
|
||||
<pre data-type="programlisting" data-code-language="r">one_pm
|
||||
#> [1] "2026-03-12 13:00:00 EDT"
|
||||
one_pm + days(1)
|
||||
#> [1] "2026-03-13 13:00:00 EDT"</pre>
|
||||
</div>
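<p>The difference between durations and periods is most visible when the span crosses a daylight saving time change. The sketch below assumes the US rule that clocks move forward on the second Sunday of March (2026-03-08), and the name <code>before_dst</code> is illustrative only:</p>

```r
library(lubridate)

# 1pm on the day before clocks spring forward in New York
before_dst <- ymd_hms("2026-03-07 13:00:00", tz = "America/New_York")

plus_duration <- before_dst + ddays(1)  # exactly 86,400 seconds later
plus_period   <- before_dst + days(1)   # same clock time on the next day

plus_duration  # the clock reads 2pm, because one hour was skipped
plus_period    # the clock still reads 1pm
```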
|
||||
<p>Like durations, periods can be created with a number of friendly constructor functions.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">hours(c(12, 24))
|
||||
<pre data-type="programlisting" data-code-language="r">hours(c(12, 24))
|
||||
#> [1] "12H 0M 0S" "24H 0M 0S"
|
||||
days(7)
|
||||
#> [1] "7d 0H 0M 0S"
|
||||
@@ -574,14 +574,14 @@ months(1:6)
|
||||
</div>
|
||||
<p>You can add and multiply periods:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">10 * (months(6) + days(1))
|
||||
<pre data-type="programlisting" data-code-language="r">10 * (months(6) + days(1))
|
||||
#> [1] "60m 10d 0H 0M 0S"
|
||||
days(50) + hours(25) + minutes(2)
|
||||
#> [1] "50d 25H 2M 0S"</pre>
|
||||
</div>
|
||||
<p>And of course, add them to dates. Compared to durations, periods are more likely to do what you expect:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit"># A leap year
|
||||
<pre data-type="programlisting" data-code-language="r"># A leap year
|
||||
ymd("2024-01-01") + dyears(1)
|
||||
#> [1] "2024-12-31 06:00:00 UTC"
|
||||
ymd("2024-01-01") + years(1)
|
||||
@@ -595,7 +595,7 @@ one_pm + days(1)
|
||||
</div>
|
||||
<p>Let’s use periods to fix an oddity related to our flight dates. Some planes appear to have arrived at their destination <em>before</em> they departed from New York City.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
filter(arr_time < dep_time)
|
||||
#> # A tibble: 10,640 × 9
|
||||
#> origin dest dep_delay arr_delay dep_time sched_dep_time
|
||||
@@ -611,7 +611,7 @@ one_pm + days(1)
|
||||
</div>
|
||||
<p>These are overnight flights. We used the same date information for both the departure and the arrival times, but these flights arrived on the following day. We can fix this by adding <code>days(1)</code> to the arrival time of each overnight flight.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt <- flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt <- flights_dt |>
|
||||
mutate(
|
||||
overnight = arr_time < dep_time,
|
||||
arr_time = arr_time + days(if_else(overnight, 1, 0)),
|
||||
@@ -620,7 +620,7 @@ one_pm + days(1)
|
||||
</div>
|
||||
<p>Now all of our flights obey the laws of physics.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights_dt |>
|
||||
<pre data-type="programlisting" data-code-language="r">flights_dt |>
|
||||
filter(overnight, arr_time < dep_time)
|
||||
#> # A tibble: 10,640 × 10
|
||||
#> origin dest dep_delay arr_delay dep_time sched_dep_time
|
||||
@@ -642,13 +642,13 @@ Intervals</h2>
|
||||
<p>What should <code>dyears(1) / ddays(365)</code> return? Not quite one: durations are always represented by a number of seconds, and a duration of a year is defined as 365.25 days’ worth of seconds, so the result is slightly greater than one.</p>
|
||||
<p>What should <code>years(1) / days(1)</code> return? Well, if the year was 2015 it should return 365, but if it was 2016, it should return 366! There’s not quite enough information for lubridate to give a single clear answer. What it does instead is give an estimate:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">years(1) / days(1)
|
||||
<pre data-type="programlisting" data-code-language="r">years(1) / days(1)
|
||||
#> [1] 365.25</pre>
|
||||
</div>
|
||||
<p>If you want a more accurate measurement, you’ll have to use an <strong>interval</strong>. An interval is a pair of starting and ending date times, or you can think of it as a duration with a starting point.</p>
|
||||
<p>You can create an interval by writing <code>start %--% end</code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">y2023 <- ymd("2023-01-01") %--% ymd("2024-01-01")
|
||||
<pre data-type="programlisting" data-code-language="r">y2023 <- ymd("2023-01-01") %--% ymd("2024-01-01")
|
||||
y2024 <- ymd("2024-01-01") %--% ymd("2025-01-01")
|
||||
|
||||
y2023
|
||||
@@ -658,7 +658,7 @@ y2024
|
||||
</div>
|
||||
<p>You could then divide it by <code><a href="https://lubridate.tidyverse.org/reference/period.html">days()</a></code> to find out how many days fit in the year:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">y2023 / days(1)
|
||||
<pre data-type="programlisting" data-code-language="r">y2023 / days(1)
|
||||
#> [1] 365
|
||||
y2024 / days(1)
|
||||
#> [1] 366</pre>
|
||||
@@ -684,13 +684,13 @@ Time zones</h1>
|
||||
<p>You might wonder why the time zone uses a city, when typically you think of time zones as associated with a country or region within a country. This is because the IANA database has to record decades worth of time zone rules. Over the course of decades, countries change names (or break apart) fairly frequently, but city names tend to stay the same. Another problem is that the name needs to reflect not only the current behavior, but also the complete history. For example, there are time zones for both “America/New_York” and “America/Detroit”. These cities both currently use Eastern Standard Time but in 1969-1972 Michigan (the state in which Detroit is located), did not follow DST, so it needs a different name. It’s worth reading the raw time zone database (available at <a href="https://www.iana.org/time-zones" class="uri">https://www.iana.org/time-zones</a>) just to read some of these stories!</p>
|
||||
<p>You can find out what R thinks your current time zone is with <code><a href="https://rdrr.io/r/base/timezones.html">Sys.timezone()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">Sys.timezone()
|
||||
<pre data-type="programlisting" data-code-language="r">Sys.timezone()
|
||||
#> [1] "America/Chicago"</pre>
|
||||
</div>
|
||||
<p>(If R doesn’t know, you’ll get an <code>NA</code>.)</p>
|
||||
<p>And see the complete list of all time zone names with <code><a href="https://rdrr.io/r/base/timezones.html">OlsonNames()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">length(OlsonNames())
|
||||
<pre data-type="programlisting" data-code-language="r">length(OlsonNames())
|
||||
#> [1] 595
|
||||
head(OlsonNames())
|
||||
#> [1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
|
||||
@@ -698,7 +698,7 @@ head(OlsonNames())
|
||||
</div>
|
||||
<p>In R, the time zone is an attribute of the date-time that only controls printing. For example, these three objects represent the same instant in time:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">x1 <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
|
||||
<pre data-type="programlisting" data-code-language="r">x1 <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
|
||||
x1
|
||||
#> [1] "2024-06-01 12:00:00 EDT"
|
||||
|
||||
@@ -712,14 +712,14 @@ x3
|
||||
</div>
|
||||
<p>You can verify that they’re the same time using subtraction:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">x1 - x2
|
||||
<pre data-type="programlisting" data-code-language="r">x1 - x2
|
||||
#> Time difference of 0 secs
|
||||
x1 - x3
|
||||
#> Time difference of 0 secs</pre>
|
||||
</div>
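<p>Another way to see that the time zone only affects printing is to compare the underlying number of seconds since the Unix epoch. This sketch uses two illustrative objects, <code>noon_ny</code> and <code>same_utc</code>, and relies on New York being four hours behind UTC in June:</p>

```r
library(lubridate)

noon_ny  <- ymd_hms("2024-06-01 12:00:00", tz = "America/New_York")
same_utc <- ymd_hms("2024-06-01 16:00:00", tz = "UTC")

noon_ny == same_utc                          # the same instant
as.numeric(noon_ny) == as.numeric(same_utc)  # the same seconds since 1970-01-01 UTC

tz(noon_ny)   # only the time zone attribute differs
tz(same_utc)
```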
|
||||
<p>Unless otherwise specified, lubridate always uses UTC. UTC (Coordinated Universal Time) is the standard time zone used by the scientific community and is roughly equivalent to GMT (Greenwich Mean Time). It does not have DST, which makes it a convenient representation for computation. Operations that combine date-times, like <code><a href="https://rdrr.io/r/base/c.html">c()</a></code>, will often drop the time zone. In that case, the date-times will display in your local time zone:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">x4 <- c(x1, x2, x3)
|
||||
<pre data-type="programlisting" data-code-language="r">x4 <- c(x1, x2, x3)
|
||||
x4
|
||||
#> [1] "2024-06-01 12:00:00 EDT" "2024-06-01 12:00:00 EDT"
|
||||
#> [3] "2024-06-01 12:00:00 EDT"</pre>
|
||||
@@ -728,7 +728,7 @@ x4
|
||||
<ul><li>
|
||||
<p>Keep the instant in time the same, and change how it’s displayed. Use this when the instant is correct, but you want a more natural display.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">x4a <- with_tz(x4, tzone = "Australia/Lord_Howe")
|
||||
<pre data-type="programlisting" data-code-language="r">x4a <- with_tz(x4, tzone = "Australia/Lord_Howe")
|
||||
x4a
|
||||
#> [1] "2024-06-02 02:30:00 +1030" "2024-06-02 02:30:00 +1030"
|
||||
#> [3] "2024-06-02 02:30:00 +1030"
|
||||
@@ -741,7 +741,7 @@ x4a - x4
|
||||
<li>
|
||||
<p>Change the underlying instant in time. Use this when you have an instant that has been labelled with the incorrect time zone, and you need to fix it.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">x4b <- force_tz(x4, tzone = "Australia/Lord_Howe")
|
||||
<pre data-type="programlisting" data-code-language="r">x4b <- force_tz(x4, tzone = "Australia/Lord_Howe")
|
||||
x4b
|
||||
#> [1] "2024-06-01 12:00:00 +1030" "2024-06-01 12:00:00 +1030"
|
||||
#> [3] "2024-06-01 12:00:00 +1030"
|
||||
|
||||