Re-render book for O'Reilly

Hadley Wickham
2023-01-12 17:22:57 -06:00
parent 28671ed8bd
commit 360d65ae47
113 changed files with 4957 additions and 2997 deletions


@@ -138,7 +138,7 @@ General Social Survey</h1>
</div>
<p>Or with a bar chart:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="r">ggplot(gss_cat, aes(race)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(gss_cat, aes(x = race)) +
geom_bar()</pre>
<div class="cell-output-display">
<p><img src="factors_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid" alt="A bar chart showing the distribution of race. There are ~2000 records with race &quot;Other&quot;, 3000 with race &quot;Black&quot;, and over 15,000 with race &quot;White&quot;." width="576"/></p>
@@ -162,13 +162,13 @@ Modifying factor order</h1>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">relig_summary &lt;- gss_cat |&gt;
group_by(relig) |&gt;
-summarise(
+summarize(
age = mean(age, na.rm = TRUE),
tvhours = mean(tvhours, na.rm = TRUE),
n = n()
)
-ggplot(relig_summary, aes(tvhours, relig)) +
+ggplot(relig_summary, aes(x = tvhours, y = relig)) +
geom_point()</pre>
<div class="cell-output-display">
<p><img src="factors_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid" alt="A scatterplot with tvhours on the x-axis and religion on the y-axis. The y-axis is ordered seemingly arbitrarily, making it hard to get any sense of the overall pattern." width="576"/></p>
@@ -181,7 +181,7 @@ ggplot(relig_summary, aes(tvhours, relig)) +
<code>x</code>, a numeric vector that you want to use to reorder the levels.</li>
<li>Optionally, <code>fun</code>, a function that’s used if there are multiple values of <code>x</code> for each value of <code>f</code>. The default value is <code>median</code>.</li>
</ul><div class="cell">
-<pre data-type="programlisting" data-code-language="r">ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
+<pre data-type="programlisting" data-code-language="r">ggplot(relig_summary, aes(x = tvhours, y = fct_reorder(relig, tvhours))) +
geom_point()</pre>
<div class="cell-output-display">
<p><img src="factors_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid" alt="The same scatterplot as above, but now religion is displayed in increasing order of tvhours. &quot;Other eastern&quot; has the fewest tvhours (under 2), and &quot;Don't know&quot; has the most (over 5)." width="576"/></p>
@@ -194,20 +194,20 @@ ggplot(relig_summary, aes(tvhours, relig)) +
mutate(
relig = fct_reorder(relig, tvhours)
) |&gt;
-ggplot(aes(tvhours, relig)) +
+ggplot(aes(x = tvhours, y = relig)) +
geom_point()</pre>
</div>
<p>What if we create a similar plot looking at how average age varies across reported income level?</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">rincome_summary &lt;- gss_cat |&gt;
group_by(rincome) |&gt;
-summarise(
+summarize(
age = mean(age, na.rm = TRUE),
tvhours = mean(tvhours, na.rm = TRUE),
n = n()
)
-ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) +
+ggplot(rincome_summary, aes(x = age, y = fct_reorder(rincome, age))) +
geom_point()</pre>
<div class="cell-output-display">
<p><img src="factors_files/figure-html/unnamed-chunk-20-1.png" class="img-fluid" alt="A scatterplot with age on the x-axis and income on the y-axis. Income has been reordered by average age, which doesn't make much sense. One section of the y-axis goes from $6000-6999, then &lt;$1000, then $8000-9999." width="576"/></p>
@@ -216,7 +216,7 @@ ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) +
<p>Here, arbitrarily reordering the levels isn’t a good idea! That’s because <code>rincome</code> already has a principled order that we shouldn’t mess with. Reserve <code><a href="https://forcats.tidyverse.org/reference/fct_reorder.html">fct_reorder()</a></code> for factors whose levels are arbitrarily ordered.</p>
<p>However, it does make sense to pull “Not applicable” to the front with the other special levels. You can use <code><a href="https://forcats.tidyverse.org/reference/fct_relevel.html">fct_relevel()</a></code>. It takes a factor, <code>f</code>, and then any number of levels that you want to move to the front of the line.</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="r">ggplot(rincome_summary, aes(age, fct_relevel(rincome, "Not applicable"))) +
+<pre data-type="programlisting" data-code-language="r">ggplot(rincome_summary, aes(x = age, y = fct_relevel(rincome, "Not applicable"))) +
geom_point()</pre>
<div class="cell-output-display">
<p><img src="factors_files/figure-html/unnamed-chunk-21-1.png" class="img-fluid" alt="The same scatterplot, but now &quot;Not applicable&quot; is displayed at the bottom of the y-axis. Generally there is a positive association between income and age, and the income band with the highest average age is &quot;Not applicable&quot;." width="576"/></p>
@@ -227,7 +227,7 @@ ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) +
<div>
<pre data-type="programlisting" data-code-language="r">#|
#| Rearranging the legend makes the plot easier to read because the
-#| legend colours now match the order of the lines on the far right
+#| legend colors now match the order of the lines on the far right
#| of the plot. You can see some unsurprising patterns: the proportion
#| never married decreases with age, married forms an upside-down U
#| shape, and widowed starts off low but increases steeply after age
@@ -240,12 +240,12 @@ by_age &lt;- gss_cat |&gt;
prop = n / sum(n)
)
-ggplot(by_age, aes(age, prop, colour = marital)) +
+ggplot(by_age, aes(x = age, y = prop, color = marital)) +
geom_line(na.rm = TRUE)
-ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
+ggplot(by_age, aes(x = age, y = prop, color = fct_reorder2(marital, age, prop))) +
geom_line() +
-labs(colour = "marital")</pre>
+labs(color = "marital")</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
@@ -261,7 +261,7 @@ ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
<div class="cell">
<pre data-type="programlisting" data-code-language="r">gss_cat |&gt;
mutate(marital = marital |&gt; fct_infreq() |&gt; fct_rev()) |&gt;
-ggplot(aes(marital)) +
+ggplot(aes(x = marital)) +
geom_bar()</pre>
<div class="cell-output-display">
<p><img src="factors_files/figure-html/unnamed-chunk-23-1.png" class="img-fluid" alt="A bar chart of marital status ordered from least to most common: no answer (~0), separated (~1,000), widowed (~2,000), divorced (~3,000), never married (~5,000), married (~10,000)." width="576"/></p>