Don't transform non-crossref links
@@ -12,7 +12,7 @@
<h1>
Introduction</h1>
<p>Factors are used for categorical variables, variables that have a fixed and known set of possible values. They are also useful when you want to display character vectors in a non-alphabetical order.</p>
<p>We’ll start by motivating why factors are needed for data analysis and how you can create them with <code><a href="#chp-https://rdrr.io/r/base/factor" data-type="xref">#chp-https://rdrr.io/r/base/factor</a></code>. We’ll then introduce you to the <code>gss_cat</code> dataset which contains a bunch of categorical variables to experiment with. You’ll then use that dataset to practice modifying the order and values of factors, before we finish up with a discussion of ordered factors.</p>
<p>We’ll start by motivating why factors are needed for data analysis and how you can create them with <code><a href="https://rdrr.io/r/base/factor.html">factor()</a></code>. We’ll then introduce you to the <code>gss_cat</code> dataset which contains a bunch of categorical variables to experiment with. You’ll then use that dataset to practice modifying the order and values of factors, before we finish up with a discussion of ordered factors.</p>
<section id="prerequisites" data-type="sect2">
<h2>
@@ -70,7 +70,7 @@ y2
#> [1] Dec Apr <NA> Mar
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec</pre>
</div>
<p>This seems risky, so you might want to use <code><a href="#chp-https://forcats.tidyverse.org/reference/fct" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct</a></code> instead:</p>
<p>This seems risky, so you might want to use <code><a href="https://forcats.tidyverse.org/reference/fct.html">fct()</a></code> instead:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">y2 <- fct(x2, levels = month_levels)
#> Error in `fct()`:
@@ -83,7 +83,7 @@ y2
#> [1] Dec Apr Jan Mar
#> Levels: Apr Dec Jan Mar</pre>
</div>
<p>Sometimes you’d prefer that the order of the levels matches the order of the first appearance in the data. You can do that when creating the factor by setting levels to <code>unique(x)</code>, or after the fact, with <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_inorder" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_inorder</a></code>:</p>
<p>Sometimes you’d prefer that the order of the levels matches the order of the first appearance in the data. You can do that when creating the factor by setting levels to <code>unique(x)</code>, or after the fact, with <code><a href="https://forcats.tidyverse.org/reference/fct_inorder.html">fct_inorder()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">f1 <- factor(x1, levels = unique(x1))
f1
@@ -95,12 +95,12 @@ f2
#> [1] Dec Apr Jan Mar
#> Levels: Dec Apr Jan Mar</pre>
</div>
<p>If you ever need to access the set of valid levels directly, you can do so with <code><a href="#chp-https://rdrr.io/r/base/levels" data-type="xref">#chp-https://rdrr.io/r/base/levels</a></code>:</p>
<p>If you ever need to access the set of valid levels directly, you can do so with <code><a href="https://rdrr.io/r/base/levels.html">levels()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">levels(f2)
#> [1] "Dec" "Apr" "Jan" "Mar"</pre>
</div>
<p>You can also create a factor when reading your data with readr with <code><a href="#chp-https://readr.tidyverse.org/reference/parse_factor" data-type="xref">#chp-https://readr.tidyverse.org/reference/parse_factor</a></code>:</p>
<p>You can also create a factor when reading your data with readr with <code><a href="https://readr.tidyverse.org/reference/parse_factor.html">col_factor()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">csv <- "
month,value
@@ -118,7 +118,7 @@ df$month
<section id="general-social-survey" data-type="sect1">
<h1>
General Social Survey</h1>
<p>For the rest of this chapter, we’re going to use <code><a href="#chp-https://forcats.tidyverse.org/reference/gss_cat" data-type="xref">#chp-https://forcats.tidyverse.org/reference/gss_cat</a></code>. It’s a sample of data from the <a href="#chp-https://gss.norc" data-type="xref">#chp-https://gss.norc</a>, a long-running US survey conducted by the independent research organization NORC at the University of Chicago. The survey has thousands of questions, so in <code>gss_cat</code> Hadley selected a handful that will illustrate some common challenges you’ll encounter when working with factors.</p>
<p>For the rest of this chapter, we’re going to use <code><a href="https://forcats.tidyverse.org/reference/gss_cat.html">forcats::gss_cat</a></code>. It’s a sample of data from the <a href="https://gss.norc.org">General Social Survey</a>, a long-running US survey conducted by the independent research organization NORC at the University of Chicago. The survey has thousands of questions, so in <code>gss_cat</code> Hadley selected a handful that will illustrate some common challenges you’ll encounter when working with factors.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">gss_cat
#> # A tibble: 21,483 × 9
@@ -132,8 +132,8 @@ General Social Survey</h1>
#> 6 2000 Married 25 White $20000 - 24999 Strong dem… Prot… Sout… NA
#> # … with 21,477 more rows</pre>
</div>
<p>(Remember, since this dataset is provided by a package, you can get more information about the variables with <code><a href="#chp-https://forcats.tidyverse.org/reference/gss_cat" data-type="xref">#chp-https://forcats.tidyverse.org/reference/gss_cat</a></code>.)</p>
<p>When factors are stored in a tibble, you can’t see their levels so easily. One way to view them is with <code><a href="#chp-https://dplyr.tidyverse.org/reference/count" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/count</a></code>:</p>
<p>(Remember, since this dataset is provided by a package, you can get more information about the variables with <code><a href="https://forcats.tidyverse.org/reference/gss_cat.html">?gss_cat</a></code>.)</p>
<p>When factors are stored in a tibble, you can’t see their levels so easily. One way to view them is with <code><a href="https://dplyr.tidyverse.org/reference/count.html">count()</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">gss_cat |>
count(race)
@@ -182,7 +182,7 @@ ggplot(relig_summary, aes(tvhours, relig)) +
<p><img src="factors_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid" alt="A scatterplot with tvhours on the x-axis and religion on the y-axis. The y-axis is ordered seemingly arbitrarily making it hard to get any sense of overall pattern." width="576"/></p>
</div>
</div>
<p>It is hard to read this plot because there’s no overall pattern. We can improve it by reordering the levels of <code>relig</code> using <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_reorder" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_reorder</a></code>. <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_reorder" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_reorder</a></code> takes three arguments:</p>
<p>It is hard to read this plot because there’s no overall pattern. We can improve it by reordering the levels of <code>relig</code> using <code><a href="https://forcats.tidyverse.org/reference/fct_reorder.html">fct_reorder()</a></code>. <code><a href="https://forcats.tidyverse.org/reference/fct_reorder.html">fct_reorder()</a></code> takes three arguments:</p>
<ul><li>
<code>f</code>, the factor whose levels you want to modify.</li>
<li>
@@ -196,7 +196,7 @@ ggplot(relig_summary, aes(tvhours, relig)) +
</div>
</div>
<p>Reordering religion makes it much easier to see that people in the “Don’t know” category watch much more TV, and Hinduism & Other Eastern religions watch much less.</p>
<p>As you start making more complicated transformations, we recommend moving them out of <code><a href="#chp-https://ggplot2.tidyverse.org/reference/aes" data-type="xref">#chp-https://ggplot2.tidyverse.org/reference/aes</a></code> and into a separate <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code> step. For example, you could rewrite the plot above as:</p>
<p>As you start making more complicated transformations, we recommend moving them out of <code><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes()</a></code> and into a separate <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> step. For example, you could rewrite the plot above as:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">relig_summary |>
mutate(
@@ -221,8 +221,8 @@ ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) +
<p><img src="factors_files/figure-html/unnamed-chunk-20-1.png" class="img-fluid" alt="A scatterplot with age on the x-axis and income on the y-axis. Income has been reordered in order of average age which doesn't make much sense. One section of the y-axis goes from $6000-6999, then <$1000, then $8000-9999." width="576"/></p>
</div>
</div>
<p>Here, arbitrarily reordering the levels isn’t a good idea! That’s because <code>rincome</code> already has a principled order that we shouldn’t mess with. Reserve <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_reorder" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_reorder</a></code> for factors whose levels are arbitrarily ordered.</p>
<p>However, it does make sense to pull “Not applicable” to the front with the other special levels. You can use <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_relevel" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_relevel</a></code>. It takes a factor, <code>f</code>, and then any number of levels that you want to move to the front of the line.</p>
<p>Here, arbitrarily reordering the levels isn’t a good idea! That’s because <code>rincome</code> already has a principled order that we shouldn’t mess with. Reserve <code><a href="https://forcats.tidyverse.org/reference/fct_reorder.html">fct_reorder()</a></code> for factors whose levels are arbitrarily ordered.</p>
<p>However, it does make sense to pull “Not applicable” to the front with the other special levels. You can use <code><a href="https://forcats.tidyverse.org/reference/fct_relevel.html">fct_relevel()</a></code>. It takes a factor, <code>f</code>, and then any number of levels that you want to move to the front of the line.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">ggplot(rincome_summary, aes(age, fct_relevel(rincome, "Not applicable"))) +
geom_point()</pre>
@@ -265,7 +265,7 @@ ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
</div>
</div>
</div>
<p>Finally, for bar plots, you can use <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_inorder" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_inorder</a></code> to order levels in decreasing frequency: this is the simplest type of reordering because it doesn’t need any extra variables. Combine it with <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_rev" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_rev</a></code> if you want them in increasing frequency so that in the bar plot largest values are on the right, not the left.</p>
<p>Finally, for bar plots, you can use <code><a href="https://forcats.tidyverse.org/reference/fct_inorder.html">fct_infreq()</a></code> to order levels in decreasing frequency: this is the simplest type of reordering because it doesn’t need any extra variables. Combine it with <code><a href="https://forcats.tidyverse.org/reference/fct_rev.html">fct_rev()</a></code> if you want them in increasing frequency so that in the bar plot largest values are on the right, not the left.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">gss_cat |>
mutate(marital = marital |> fct_infreq() |> fct_rev()) |>
@@ -288,7 +288,7 @@ Exercises</h2>
<section id="modifying-factor-levels" data-type="sect1">
<h1>
Modifying factor levels</h1>
<p>More powerful than changing the orders of the levels is changing their values. This allows you to clarify labels for publication, and collapse levels for high-level displays. The most general and powerful tool is <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_recode" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_recode</a></code>. It allows you to recode, or change, the value of each level. For example, take the <code>gss_cat$partyid</code>:</p>
<p>More powerful than changing the orders of the levels is changing their values. This allows you to clarify labels for publication, and collapse levels for high-level displays. The most general and powerful tool is <code><a href="https://forcats.tidyverse.org/reference/fct_recode.html">fct_recode()</a></code>. It allows you to recode, or change, the value of each level. For example, take the <code>gss_cat$partyid</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">gss_cat |> count(partyid)
#> # A tibble: 10 × 2
@@ -327,7 +327,7 @@ Modifying factor levels</h1>
#> 6 Independent, near rep 1791
#> # … with 4 more rows</pre>
</div>
<p><code><a href="#chp-https://forcats.tidyverse.org/reference/fct_recode" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_recode</a></code> will leave the levels that aren’t explicitly mentioned as is, and will warn you if you accidentally refer to a level that doesn’t exist.</p>
<p><code><a href="https://forcats.tidyverse.org/reference/fct_recode.html">fct_recode()</a></code> will leave the levels that aren’t explicitly mentioned as is, and will warn you if you accidentally refer to a level that doesn’t exist.</p>
<p>To combine groups, you can assign multiple old levels to the same new level:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">gss_cat |>
@@ -357,7 +357,7 @@ Modifying factor levels</h1>
#> # … with 2 more rows</pre>
</div>
<p>Use this technique with care: if you group together categories that are truly different you will end up with misleading results.</p>
<p>If you want to collapse a lot of levels, <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_collapse" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_collapse</a></code> is a useful variant of <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_recode" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_recode</a></code>. For each new variable, you can provide a vector of old levels:</p>
<p>If you want to collapse a lot of levels, <code><a href="https://forcats.tidyverse.org/reference/fct_collapse.html">fct_collapse()</a></code> is a useful variant of <code><a href="https://forcats.tidyverse.org/reference/fct_recode.html">fct_recode()</a></code>. For each new variable, you can provide a vector of old levels:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">gss_cat |>
mutate(
@@ -377,7 +377,7 @@ Modifying factor levels</h1>
#> 3 ind 8409
#> 4 dem 7180</pre>
</div>
<p>Sometimes you just want to lump together the small groups to make a plot or table simpler. That’s the job of the <code>fct_lump_*()</code> family of functions. <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_lump" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_lump</a></code> is a simple starting point that progressively lumps the smallest categories into “Other”, always keeping “Other” as the smallest category.</p>
<p>Sometimes you just want to lump together the small groups to make a plot or table simpler. That’s the job of the <code>fct_lump_*()</code> family of functions. <code><a href="https://forcats.tidyverse.org/reference/fct_lump.html">fct_lump_lowfreq()</a></code> is a simple starting point that progressively lumps the smallest categories into “Other”, always keeping “Other” as the smallest category.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">gss_cat |>
mutate(relig = fct_lump_lowfreq(relig)) |>
@@ -388,7 +388,7 @@ Modifying factor levels</h1>
#> 1 Protestant 10846
#> 2 Other 10637</pre>
</div>
<p>In this case it’s not very helpful: it is true that the majority of Americans in this survey are Protestant, but we’d probably like to see some more details! Instead, we can use <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_lump" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_lump</a></code> to specify that we want exactly 10 groups:</p>
<p>In this case it’s not very helpful: it is true that the majority of Americans in this survey are Protestant, but we’d probably like to see some more details! Instead, we can use <code><a href="https://forcats.tidyverse.org/reference/fct_lump.html">fct_lump_n()</a></code> to specify that we want exactly 10 groups:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">gss_cat |>
mutate(relig = fct_lump_n(relig, n = 10)) |>
@@ -408,27 +408,27 @@ Modifying factor levels</h1>
#> 9 Moslem/islam 104
#> 10 Orthodox-christian 95</pre>
</div>
<p>Read the documentation to learn about <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_lump" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_lump</a></code> and <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_lump" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_lump</a></code> which are useful in other cases.</p>
<p>Read the documentation to learn about <code><a href="https://forcats.tidyverse.org/reference/fct_lump.html">fct_lump_min()</a></code> and <code><a href="https://forcats.tidyverse.org/reference/fct_lump.html">fct_lump_prop()</a></code> which are useful in other cases.</p>
<section id="exercises-1" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>How have the proportions of people identifying as Democrat, Republican, and Independent changed over time?</p></li>
<li><p>How could you collapse <code>rincome</code> into a small set of categories?</p></li>
<li><p>Notice there are 9 groups (excluding other) in the <code>fct_lump</code> example above. Why not 10? (Hint: type <code><a href="#chp-https://forcats.tidyverse.org/reference/fct_lump" data-type="xref">#chp-https://forcats.tidyverse.org/reference/fct_lump</a></code>, and find the default for the argument <code>other_level</code> is “Other”.)</p></li>
<li><p>Notice there are 9 groups (excluding other) in the <code>fct_lump</code> example above. Why not 10? (Hint: type <code><a href="https://forcats.tidyverse.org/reference/fct_lump.html">?fct_lump</a></code>, and find the default for the argument <code>other_level</code> is “Other”.)</p></li>
</ol></section>
</section>
<section id="ordered-factors" data-type="sect1">
<h1>
Ordered factors</h1>
<p>Before we go on, there’s a special type of factor that needs to be mentioned briefly: ordered factors. Ordered factors, created with <code><a href="#chp-https://rdrr.io/r/base/factor" data-type="xref">#chp-https://rdrr.io/r/base/factor</a></code>, imply a strict ordering and equal distance between levels: the first level is “less than” the second level by the same amount that the second level is “less than” the third level, and so on. You can recognize them when printing because they use <code><</code> between the factor levels:</p>
<p>Before we go on, there’s a special type of factor that needs to be mentioned briefly: ordered factors. Ordered factors, created with <code><a href="https://rdrr.io/r/base/factor.html">ordered()</a></code>, imply a strict ordering and equal distance between levels: the first level is “less than” the second level by the same amount that the second level is “less than” the third level, and so on. You can recognize them when printing because they use <code><</code> between the factor levels:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">ordered(c("a", "b", "c"))
#> [1] a b c
#> Levels: a < b < c</pre>
</div>
<p>In practice, <code><a href="#chp-https://rdrr.io/r/base/factor" data-type="xref">#chp-https://rdrr.io/r/base/factor</a></code> factors behave very similarly to regular factors. There are only two places where you might notice different behavior:</p>
<p>In practice, <code><a href="https://rdrr.io/r/base/factor.html">ordered()</a></code> factors behave very similarly to regular factors. There are only two places where you might notice different behavior:</p>
<ul><li>If you map an ordered factor to color or fill in ggplot2, it will default to <code>scale_color_viridis()</code>/<code>scale_fill_viridis()</code>, a color scale that implies a ranking.</li>
<li>If you use an ordered factor in a linear model, it will use “polynomial contrasts”. These are mildly useful, but you are unlikely to have heard of them unless you have a PhD in Statistics, and even then you probably don’t routinely interpret them. If you want to learn more, we recommend <code>vignette("contrasts", package = "faux")</code> by Lisa DeBruine.</li>
</ul><p>Given the arguable utility of these differences, we don’t generally recommend using ordered factors.</p>
@@ -437,8 +437,8 @@ Ordered factors</h1>
<section id="summary" data-type="sect1">
<h1>
Summary</h1>
<p>This chapter introduced you to the handy forcats package for working with factors, introducing you to the most commonly used functions. forcats contains a wide range of other helpers that we didn’t have space to discuss here, so whenever you’re facing a factor analysis challenge that you haven’t encountered before, I highly recommend skimming the <a href="#chp-https://forcats.tidyverse.org/reference/index" data-type="xref">#chp-https://forcats.tidyverse.org/reference/index</a> to see if there’s a canned function that can help solve your problem.</p>
<p>If you want to learn more about factors after reading this chapter, we recommend reading Amelia McNamara and Nicholas Horton’s paper, <a href="#chp-https://peerj.com/preprints/3163/" data-type="xref">#chp-https://peerj.com/preprints/3163/</a>. This paper lays out some of the history discussed in <a href="#chp-https://simplystatistics.org/posts/2015-07-24-stringsasfactors-an-unauthorized-biography/" data-type="xref">#chp-https://simplystatistics.org/posts/2015-07-24-stringsasfactors-an-unauthorized-biography/</a> and <a href="#chp-https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh" data-type="xref">#chp-https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh</a>, and compares the tidy approaches to categorical data outlined in this book with base R methods. An early version of the paper helped motivate and scope the forcats package; thanks Amelia & Nick!</p>
<p>This chapter introduced you to the handy forcats package for working with factors, introducing you to the most commonly used functions. forcats contains a wide range of other helpers that we didn’t have space to discuss here, so whenever you’re facing a factor analysis challenge that you haven’t encountered before, I highly recommend skimming the <a href="https://forcats.tidyverse.org/reference/index.html">reference index</a> to see if there’s a canned function that can help solve your problem.</p>
<p>If you want to learn more about factors after reading this chapter, we recommend reading Amelia McNamara and Nicholas Horton’s paper, <a href="https://peerj.com/preprints/3163/"><em>Wrangling categorical data in R</em></a>. This paper lays out some of the history discussed in <a href="https://simplystatistics.org/posts/2015-07-24-stringsasfactors-an-unauthorized-biography/"><em>stringsAsFactors: An unauthorized biography</em></a> and <a href="https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh"><em>stringsAsFactors = <sigh></em></a>, and compares the tidy approaches to categorical data outlined in this book with base R methods. An early version of the paper helped motivate and scope the forcats package; thanks Amelia & Nick!</p>
<p>In the next chapter we’ll switch gears to start learning about dates and times in R. Dates and times seem deceptively simple, but as you’ll soon see, the more you learn about them, the more complex they seem to get!</p>