Re-render book for O'Reilly
@@ -138,7 +138,7 @@ General Social Survey</h1>
 </div>
 <p>Or with a bar chart:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">ggplot(gss_cat, aes(race)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(gss_cat, aes(x = race)) +
   geom_bar()</pre>
 <div class="cell-output-display">
 <p><img src="factors_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid" alt="A bar chart showing the distribution of race. There are ~2000 records with race "Other", 3000 with race "Black", and other 15,000 with race "White"." width="576"/></p>
@@ -162,13 +162,13 @@ Modifying factor order</h1>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">relig_summary <- gss_cat |>
   group_by(relig) |>
-  summarise(
+  summarize(
     age = mean(age, na.rm = TRUE),
     tvhours = mean(tvhours, na.rm = TRUE),
     n = n()
   )

-ggplot(relig_summary, aes(tvhours, relig)) +
+ggplot(relig_summary, aes(x = tvhours, y = relig)) +
   geom_point()</pre>
 <div class="cell-output-display">
 <p><img src="factors_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid" alt="A scatterplot of with tvhours on the x-axis and religion on the y-axis. The y-axis is ordered seemingly aribtrarily making it hard to get any sense of overall pattern." width="576"/></p>
@@ -181,7 +181,7 @@ ggplot(relig_summary, aes(tvhours, relig)) +
 <code>x</code>, a numeric vector that you want to use to reorder the levels.</li>
 <li>Optionally, <code>fun</code>, a function that’s used if there are multiple values of <code>x</code> for each value of <code>f</code>. The default value is <code>median</code>.</li>
 </ul><div class="cell">
-<pre data-type="programlisting" data-code-language="r">ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
+<pre data-type="programlisting" data-code-language="r">ggplot(relig_summary, aes(x = tvhours, y = fct_reorder(relig, tvhours))) +
   geom_point()</pre>
 <div class="cell-output-display">
 <p><img src="factors_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid" alt="The same scatterplot as above, but now the religion is displayed in increasing order of tvhours. "Other eastern" has the fewest tvhours under 2, and "Don't know" has the highest (over 5)." width="576"/></p>
@@ -194,20 +194,20 @@ ggplot(relig_summary, aes(tvhours, relig)) +
   mutate(
     relig = fct_reorder(relig, tvhours)
   ) |>
-  ggplot(aes(tvhours, relig)) +
+  ggplot(aes(x = tvhours, y = relig)) +
   geom_point()</pre>
 </div>
 <p>What if we create a similar plot looking at how average age varies across reported income level?</p>
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">rincome_summary <- gss_cat |>
   group_by(rincome) |>
-  summarise(
+  summarize(
     age = mean(age, na.rm = TRUE),
     tvhours = mean(tvhours, na.rm = TRUE),
     n = n()
   )

-ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) +
+ggplot(rincome_summary, aes(x = age, y = fct_reorder(rincome, age))) +
   geom_point()</pre>
 <div class="cell-output-display">
 <p><img src="factors_files/figure-html/unnamed-chunk-20-1.png" class="img-fluid" alt="A scatterplot with age on the x-axis and income on the y-axis. Income has been reordered in order of average age which doesn't make much sense. One section of the y-axis goes from $6000-6999, then <$1000, then $8000-9999." width="576"/></p>
@@ -216,7 +216,7 @@ ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) +
 <p>Here, arbitrarily reordering the levels isn’t a good idea! That’s because <code>rincome</code> already has a principled order that we shouldn’t mess with. Reserve <code><a href="https://forcats.tidyverse.org/reference/fct_reorder.html">fct_reorder()</a></code> for factors whose levels are arbitrarily ordered.</p>
 <p>However, it does make sense to pull “Not applicable” to the front with the other special levels. You can use <code><a href="https://forcats.tidyverse.org/reference/fct_relevel.html">fct_relevel()</a></code>. It takes a factor, <code>f</code>, and then any number of levels that you want to move to the front of the line.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="r">ggplot(rincome_summary, aes(age, fct_relevel(rincome, "Not applicable"))) +
+<pre data-type="programlisting" data-code-language="r">ggplot(rincome_summary, aes(x = age, y = fct_relevel(rincome, "Not applicable"))) +
   geom_point()</pre>
 <div class="cell-output-display">
 <p><img src="factors_files/figure-html/unnamed-chunk-21-1.png" class="img-fluid" alt="The same scatterplot but now "Not Applicable" is displayed at the bottom of the y-axis. Generally there is a positive association between income and age, and the income band with the highest average age is "Not applicable"." width="576"/></p>
@@ -227,7 +227,7 @@ ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) +
 <div>
 <pre data-type="programlisting" data-code-language="r">#|
 #| Rearranging the legend makes the plot easier to read because the
-#| legend colours now match the order of the lines on the far right
+#| legend colors now match the order of the lines on the far right
 #| of the plot. You can see some unsuprising patterns: the proportion
 #| never marred decreases with age, married forms an upside down U
 #| shape, and widowed starts off low but increases steeply after age
@@ -240,12 +240,12 @@ by_age <- gss_cat |>
     prop = n / sum(n)
   )

-ggplot(by_age, aes(age, prop, colour = marital)) +
+ggplot(by_age, aes(x = age, y = prop, color = marital)) +
   geom_line(na.rm = TRUE)

-ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
+ggplot(by_age, aes(x = age, y = prop, color = fct_reorder2(marital, age, prop))) +
   geom_line() +
-  labs(colour = "marital")</pre>
+  labs(color = "marital")</pre>
 <div class="cell quarto-layout-panel">
 <div class="quarto-layout-row quarto-layout-valign-top">
 <div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
@@ -261,7 +261,7 @@ ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
 <div class="cell">
 <pre data-type="programlisting" data-code-language="r">gss_cat |>
   mutate(marital = marital |> fct_infreq() |> fct_rev()) |>
-  ggplot(aes(marital)) +
+  ggplot(aes(x = marital)) +
   geom_bar()</pre>
 <div class="cell-output-display">
 <p><img src="factors_files/figure-html/unnamed-chunk-23-1.png" class="img-fluid" alt="A bar char of marital status ordered in from least to most common: no answer (~0), separated (~1,000), widowed (~2,000), divorced (~3,000), never married (~5,000), married (~10,000)." width="576"/></p>