More minor page count tweaks & fixes

And re-convert with latest htmlbook
Hadley Wickham
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions

@@ -1,6 +1,6 @@
<section data-type="chapter" id="chp-datetimes">
<h1><span id="sec-dates-and-times" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Dates and times</span></span></h1>
<section id="introduction" data-type="sect1">
<section id="datetimes-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>This chapter will show you how to work with dates and times in R. At first glance, dates and times seem simple. You use them all the time in your regular life, and they don’t seem to cause much confusion. However, the more you learn about dates and times, the more complicated they seem to get!</p>
@@ -8,14 +8,12 @@ Introduction</h1>
<p>Dates and times are hard because they have to reconcile two physical phenomena (the rotation of the Earth and its orbit around the sun) with a whole raft of geopolitical phenomena including months, time zones, and DST. This chapter won’t teach you every last detail about dates and times, but it will give you a solid grounding of practical skills that will help you with common data analysis challenges.</p>
<p>We’ll begin by showing you how to create date-times from various inputs, and then once you’ve got a date-time, how you can extract components like year, month, and day. We’ll then dive into the tricky topic of working with time spans, which come in a variety of flavors depending on what you’re trying to do. We’ll conclude with a brief discussion of the additional challenges posed by time zones.</p>
<section id="prerequisites" data-type="sect2">
<section id="datetimes-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<p>This chapter will focus on the <strong>lubridate</strong> package, which makes it easier to work with dates and times in R. lubridate is not part of core tidyverse because you only need it when you’re working with dates/times. We will also need nycflights13 for practice data.</p>
<p>This chapter will focus on the <strong>lubridate</strong> package, which makes it easier to work with dates and times in R. As of the latest tidyverse release, lubridate is part of core tidyverse. We will also need nycflights13 for practice data.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)
library(lubridate)
library(nycflights13)</pre>
</div>
</section>
@@ -33,9 +31,9 @@ Creating date/times</h1>
<p>To get the current date or date-time you can use <code><a href="https://lubridate.tidyverse.org/reference/now.html">today()</a></code> or <code><a href="https://lubridate.tidyverse.org/reference/now.html">now()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">today()
#&gt; [1] "2023-01-12"
#&gt; [1] "2023-01-26"
now()
#&gt; [1] "2023-01-12 17:04:08 CST"</pre>
#&gt; [1] "2023-01-26 10:32:54 CST"</pre>
</div>
<p>Otherwise, the following sections describe the four ways you’re likely to create a date/time:</p>
<ul><li>While reading a file with readr.</li>
@@ -281,9 +279,9 @@ From other types</h2>
<p>You may want to switch between a date-time and a date. That’s the job of <code><a href="https://lubridate.tidyverse.org/reference/as_date.html">as_datetime()</a></code> and <code><a href="https://lubridate.tidyverse.org/reference/as_date.html">as_date()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">as_datetime(today())
#&gt; [1] "2023-01-12 UTC"
#&gt; [1] "2023-01-26 UTC"
as_date(now())
#&gt; [1] "2023-01-12"</pre>
#&gt; [1] "2023-01-26"</pre>
</div>
<p>Sometimes you’ll get date/times as numeric offsets from the “Unix Epoch”, 1970-01-01. If the offset is in seconds, use <code><a href="https://lubridate.tidyverse.org/reference/as_date.html">as_datetime()</a></code>; if it’s in days, use <code><a href="https://lubridate.tidyverse.org/reference/as_date.html">as_date()</a></code>.</p>
<div class="cell">
@@ -294,7 +292,7 @@ as_date(365 * 10 + 2)
</div>
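<p>The offsets also work in the other direction. As a small sketch, assuming lubridate is loaded as in the prerequisites, calling base R’s <code>as.numeric()</code> on a date should give back the number of days since the epoch, and on a UTC date-time the number of seconds:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(lubridate)

# Days between 1970-01-01 and 1980-01-01: ten years, two of them leap years
as.numeric(ymd("1980-01-01"))
#&gt; [1] 3652

# Seconds between the epoch and 10am UTC on the same day
as.numeric(ymd_hms("1970-01-01 10:00:00"))
#&gt; [1] 36000</pre>
</div>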
</section>
<section id="exercises" data-type="sect2">
<section id="datetimes-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
@@ -474,7 +472,7 @@ update(ymd("2023-02-01"), hour = 400)
</div>
</section>
<section id="exercises-1" data-type="sect2">
<section id="datetimes-exercises-1" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>How does the distribution of flight times within a day change over the course of the year?</p></li>
@@ -507,12 +505,12 @@ Durations</h2>
<pre data-type="programlisting" data-code-language="r"># How old is Hadley?
h_age &lt;- today() - ymd("1979-10-14")
h_age
#&gt; Time difference of 15796 days</pre>
#&gt; Time difference of 15810 days</pre>
</div>
<p>A difftime class object records a time span of seconds, minutes, hours, days, or weeks. This ambiguity can make difftimes a little painful to work with, so lubridate provides an alternative which always uses seconds: the <strong>duration</strong>.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">as.duration(h_age)
#&gt; [1] "1364774400s (~43.25 years)"</pre>
#&gt; [1] "1365984000s (~43.29 years)"</pre>
</div>
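<p>To make the ambiguity concrete, here is a minimal sketch (the variable names <code>start</code>, <code>short_gap</code>, and <code>long_gap</code> are just for illustration). Base R picks the difftime unit from the size of the gap, so two subtractions can come back in different units, whereas converting to a duration puts both in seconds:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(lubridate)

start &lt;- ymd_hms("2023-01-01 00:00:00")

# Adding a number to a date-time adds seconds
short_gap &lt;- (start + 30) - start
long_gap &lt;- (start + 60 * 60 * 5) - start

# The unit of each difftime depends on its size
units(short_gap)
#&gt; [1] "secs"
units(long_gap)
#&gt; [1] "hours"

# As durations, both are measured in seconds
as.numeric(as.duration(short_gap))
#&gt; [1] 30
as.numeric(as.duration(long_gap))
#&gt; [1] 18000</pre>
</div>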
<p>Durations come with a bunch of convenient constructors:</p>
<div class="cell">
@@ -530,7 +528,7 @@ dweeks(3)
dyears(1)
#&gt; [1] "31557600s (~1 years)"</pre>
</div>
<p>Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, weeks, and years to seconds: 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 7 days in a week. Larger time units are more problematic. A year is uses the “average” number of days in a year, i.e. 365.25. There’s no way to convert a month to a duration, because there’s just too much variation.</p>
<p>Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, weeks, and years to seconds: 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 7 days in a week. Larger time units are more problematic. A year uses the “average” number of days in a year, i.e. 365.25. There’s no way to convert a month to a duration, because there’s just too much variation.</p>
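<p>You can check these conversion factors yourself. The sketch below assumes lubridate is loaded; <code>secs_per_day</code> and <code>secs_per_year</code> are illustrative names, not part of the package. It rebuilds the number of seconds by hand and compares it with the matching constructor:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(lubridate)

# Build up the number of seconds from the fixed conversion factors
secs_per_day &lt;- 60 * 60 * 24
secs_per_year &lt;- secs_per_day * 365.25

# as.numeric() on a duration returns its length in seconds
as.numeric(ddays(1)) == secs_per_day
#&gt; [1] TRUE
as.numeric(dweeks(1)) == secs_per_day * 7
#&gt; [1] TRUE
as.numeric(dyears(1)) == secs_per_year
#&gt; [1] TRUE</pre>
</div>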
<p>You can add and multiply durations:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">2 * dyears(1)
@@ -545,14 +543,14 @@ last_year &lt;- today() - dyears(1)</pre>
</div>
<p>However, because durations represent an exact number of seconds, sometimes you might get an unexpected result:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">one_pm &lt;- ymd_hms("2026-03-12 13:00:00", tz = "America/New_York")
<pre data-type="programlisting" data-code-language="r">one_am &lt;- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")
one_pm
#&gt; [1] "2026-03-12 13:00:00 EDT"
one_pm + ddays(1)
#&gt; [1] "2026-03-13 13:00:00 EDT"</pre>
one_am
#&gt; [1] "2026-03-08 01:00:00 EST"
one_am + ddays(1)
#&gt; [1] "2026-03-09 02:00:00 EDT"</pre>
</div>
<p>Why is one day after 1pm March 12, 2pm March 13? If you look carefully at the date you might also notice that the time zones have changed. March 12 only has 23 hours because it’s when DST starts, so if we add a full day’s worth of seconds we end up with a different time.</p>
<p>Why is one day after 1am March 8, 2am March 9? If you look carefully at the date you might also notice that the time zones have changed. March 8 only has 23 hours because it’s when DST starts, so if we add a full day’s worth of seconds we end up with a different time.</p>
</section>
<section id="periods" data-type="sect2">
@@ -560,10 +558,10 @@ one_pm + ddays(1)
Periods</h2>
<p>To solve this problem, lubridate provides <strong>periods</strong>. Periods are time spans but don’t have a fixed length in seconds; instead, they work with “human” times, like days and months. That allows them to work in a more intuitive way:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">one_pm
#&gt; [1] "2026-03-12 13:00:00 EDT"
one_pm + days(1)
#&gt; [1] "2026-03-13 13:00:00 EDT"</pre>
<pre data-type="programlisting" data-code-language="r">one_am
#&gt; [1] "2026-03-08 01:00:00 EST"
one_am + days(1)
#&gt; [1] "2026-03-09 01:00:00 EDT"</pre>
</div>
<p>Like durations, periods can be created with a number of friendly constructor functions.</p>
<div class="cell">
@@ -591,10 +589,10 @@ ymd("2024-01-01") + years(1)
#&gt; [1] "2025-01-01"
# Daylight Saving Time
one_pm + ddays(1)
#&gt; [1] "2026-03-13 13:00:00 EDT"
one_pm + days(1)
#&gt; [1] "2026-03-13 13:00:00 EDT"</pre>
one_am + ddays(1)
#&gt; [1] "2026-03-09 02:00:00 EDT"
one_am + days(1)
#&gt; [1] "2026-03-09 01:00:00 EDT"</pre>
</div>
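<p>One way to see what each addition does is to measure the elapsed time with base R’s <code>difftime()</code>. This is only a sketch: it redefines <code>one_am</code> so that it runs on its own, and it assumes your system’s time zone database includes the 2026 DST rules for New York. The duration should move the clock forward by a full 24 hours, while the period should keep the clock time and so cover only the 23 hours that March 8 actually contains:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(lubridate)

one_am &lt;- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")

# Duration: exactly 86,400 seconds later
difftime(one_am + ddays(1), one_am, units = "hours")
#&gt; Time difference of 24 hours

# Period: the same clock time on the next calendar day
difftime(one_am + days(1), one_am, units = "hours")
#&gt; Time difference of 23 hours</pre>
</div>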
<p>Let’s use periods to fix an oddity related to our flight dates. Some planes appear to have arrived at their destination <em>before</em> they departed from New York City.</p>
<div class="cell">
@@ -668,7 +666,7 @@ y2024 / days(1)
</div>
</section>
<section id="exercises-2" data-type="sect2">
<section id="datetimes-exercises-2" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Explain <code>days(overnight * 1)</code> to someone who has just started learning R. How does it work?</p></li>
@@ -694,7 +692,7 @@ Time zones</h1>
<p>And see the complete list of all time zone names with <code><a href="https://rdrr.io/r/base/timezones.html">OlsonNames()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">length(OlsonNames())
#&gt; [1] 596
#&gt; [1] 597
head(OlsonNames())
#&gt; [1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
#&gt; [4] "Africa/Algiers" "Africa/Asmara" "Africa/Asmera"</pre>
@@ -755,7 +753,7 @@ x4b - x4
</li>
</ul></section>
<section id="summary" data-type="sect1">
<section id="datetimes-summary" data-type="sect1">
<h1>
Summary</h1>
<p>This chapter has introduced you to the tools that lubridate provides to help you work with date-time data. Working with dates and times can seem harder than necessary, but hopefully this chapter has helped you see why: date-times are more complex than they seem at first glance, and handling every possible situation adds complexity. Even if your data never crosses a daylight saving time boundary or involves a leap year, the functions need to be able to handle it.</p>