More minor page count tweaks & fixes

And re-convert with latest htmlbook
Hadley Wickham
2023-01-26 10:36:07 -06:00
parent d9afa135fc
commit aa9d72a7c6
38 changed files with 838 additions and 1093 deletions

@@ -1,23 +1,14 @@
<section data-type="chapter" id="chp-regexps">
<h1><span id="sec-regular-expressions" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Regular expressions</span></span></h1>
<section id="introduction" data-type="sect1">
<section id="regexps-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>In <a href="#chp-strings" data-type="xref">#chp-strings</a>, you learned a whole bunch of useful functions for working with strings. This chapter will focus on functions that use <strong>regular expressions</strong>, a concise and powerful language for describing patterns within strings. The term “regular expression” is a bit of a mouthful, so most people abbreviate it to “regex”<span data-type="footnote">You can pronounce it with either a hard-g (reg-x) or a soft-g (rej-x).</span> or “regexp”.</p>
<p>The chapter starts with the basics of regular expressions and the most useful stringr functions for data analysis. We’ll then expand your knowledge of patterns and cover seven important new topics (escaping, anchoring, character classes, shorthand classes, quantifiers, precedence, and grouping). Next, we’ll talk about some of the other types of patterns that stringr functions can work with and the various “flags” that allow you to tweak the operation of regular expressions. We’ll finish with a survey of other places in the tidyverse and base R where you might use regexes.</p>
<section id="prerequisites" data-type="sect2">
<section id="regexps-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<div data-type="important"><div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon"/>
</div>
</div>
<p>This chapter relies on features only found in tidyr 1.3.0, which is still in development. If you want to live on the edge, you can get the dev version with <code>devtools::install_github("tidyverse/tidyr")</code>.</p></div>
<p>In this chapter, we’ll use regular expression functions from stringr and tidyr, both core members of the tidyverse, as well as data from the babynames package.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)
@@ -46,11 +37,7 @@ Pattern basics</h1>
#&gt; [11] │ boysen&lt;berry&gt;
#&gt; [19] │ cloud&lt;berry&gt;
#&gt; [21] │ cran&lt;berry&gt;
#&gt; [29] │ elder&lt;berry&gt;
#&gt; [32] │ goji &lt;berry&gt;
#&gt; [33] │ goose&lt;berry&gt;
#&gt; [38] │ huckle&lt;berry&gt;
#&gt; ... and 4 more
#&gt; ... and 8 more
str_view(fruit, "BERRY")</pre>
</div>
@@ -70,8 +57,7 @@ str_view(fruit, "BERRY")</pre>
#&gt; [51] │ nect&lt;arine&gt;
#&gt; [62] │ pine&lt;apple&gt;
#&gt; [64] │ pomegr&lt;anate&gt;
#&gt; [70] │ r&lt;aspbe&gt;rry
#&gt; [73] │ sal&lt;al be&gt;rry</pre>
#&gt; ... and 2 more</pre>
</div>
<p><strong>Quantifiers</strong> control how many times a pattern can match:</p>
<ul><li>
@@ -123,11 +109,7 @@ str_view(words, "[^aeiou][^aeiou][^aeiou][^aeiou]")
#&gt; [34] │ alth&lt;ough&gt;
#&gt; [37] │ am&lt;ount&gt;
#&gt; [46] │ app&lt;oint&gt;
#&gt; [47] │ appr&lt;oach&gt;
#&gt; [52] │ ar&lt;ound&gt;
#&gt; [61] │ &lt;auth&gt;ority
#&gt; [79] │ be&lt;auty&gt;
#&gt; ... and 62 more</pre>
#&gt; ... and 66 more</pre>
</div>
<p>(We’ll learn more elegant ways to express these ideas in <a href="#sec-quantifiers" data-type="xref">#sec-quantifiers</a>.)</p>
<p>You can use <strong>alternation</strong>, <code>|</code>, to pick between one or more alternative patterns. For example, the following patterns look for fruits containing “apple”, “pear”, or “banana”, or a repeated vowel.</p>
@@ -144,11 +126,6 @@ str_view(fruit, "aa|ee|ii|oo|uu")
#&gt; [66] │ purple mangost&lt;ee&gt;n</pre>
</div>
<p>Regular expressions are very compact and use a lot of punctuation characters, so they can seem overwhelming and hard to read at first. Don’t worry; you’ll get better with practice, and simple patterns will soon become second nature. Let’s kick off that process by practicing with some useful stringr functions.</p>
<section id="exercises" data-type="sect2">
<h2>
Exercises</h2>
</section>
</section>
<section id="sec-stringr-regex-funs" data-type="sect1">
@@ -286,7 +263,7 @@ str_remove_all(x, "[aeiou]")
<section id="sec-extract-variables" data-type="sect2">
<h2>
Extract variables</h2>
<p>The last function we’ll discuss uses regular expressions to extract data out of one column into one or more new columns: <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_regex()</a></code>. It’s a peer of the <code>separate_wider_location()</code> and <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_delim()</a></code> functions that you learned about in <a href="#sec-string-columns" data-type="xref">#sec-string-columns</a>. These functions live in tidyr because the operates on (columns of) data frames, rather than individual vectors.</p>
<p>The last function we’ll discuss uses regular expressions to extract data out of one column into one or more new columns: <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_regex()</a></code>. It’s a peer of the <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_position()</a></code> and <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_delim()</a></code> functions that you learned about in <a href="#sec-string-columns" data-type="xref">#sec-string-columns</a>. These functions live in tidyr because they operate on (columns of) data frames, rather than individual vectors.</p>
<p>Let’s create a simple dataset to show how it works. Here we have some data derived from <code>babynames</code> where we have the name, gender, and age of a bunch of people in a rather weird format<span data-type="footnote">We wish we could reassure you that you’d never see something this weird in real life, but unfortunately over the course of your career you’re likely to see much weirder!</span>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">df &lt;- tribble(
@@ -325,7 +302,7 @@ Extract variables</h2>
<p>If the match fails, you can use <code>too_few = "debug"</code> to figure out what went wrong, just like <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_delim()</a></code> and <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_position()</a></code>.</p>
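<p>As a minimal sketch of that debugging workflow, assuming tidyr 1.3.0 or later (where the argument is named <code>too_few</code>) and a small made-up data frame, <code>df_bad</code>, whose third row is missing its opening <code>&lt;</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyr)

# Hypothetical data: the last row lacks the leading "&lt;", so the pattern fails
df_bad &lt;- tibble::tibble(
  x = c("&lt;Sheryl&gt;-F_34", "&lt;Kisha&gt;-F_45", "Brandon-N_33")
)

# too_few = "debug" keeps every row and adds diagnostic columns
# (with a warning) instead of stopping with an error
debugged &lt;- df_bad |&gt;
  separate_wider_regex(
    x,
    patterns = c(
      "&lt;",
      name = "[A-Za-z]+",
      "&gt;-",
      gender = ".",
      "_",
      age = "[0-9]+"
    ),
    too_few = "debug"
  )
debugged</pre>
</div>
<p>The extra columns should indicate which rows failed and how much of the pattern matched, which helps you decide whether to fix the data or loosen the pattern.</p>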
</section>
<section id="exercises-1" data-type="sect2">
<section id="regexps-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>What baby name has the most vowels? What name has the highest proportion of vowels? (Hint: what is the denominator?)</p></li>
@@ -398,8 +375,8 @@ str_view(fruit, "a$")
#&gt; [56] │ papay&lt;a&gt;
#&gt; [74] │ satsum&lt;a&gt;</pre>
</div>
<p>It’s tempting to think that <code>$</code> should matches the start of a string, because that’s how we write dollar amounts, but it’s not what regular expressions want.</p>
<p>To force a regular expression to only the full string, anchor it with both <code>^</code> and <code>$</code>:</p>
<p>It’s tempting to think that <code>$</code> should match the start of a string, because that’s how we write dollar amounts, but it’s not what regular expressions want.</p>
<p>To force a regular expression to match only the full string, anchor it with both <code>^</code> and <code>$</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">str_view(fruit, "apple")
#&gt; [1] │ &lt;apple&gt;
@@ -407,7 +384,7 @@ str_view(fruit, "a$")
str_view(fruit, "^apple$")
#&gt; [1] │ &lt;apple&gt;</pre>
</div>
<p>You can also match the boundary between words (i.e. the start or end of a word) with <code>\b</code>. This can be particularly when using RStudio’s find and replace tool. For example, to find all uses of <code><a href="https://rdrr.io/r/base/sum.html">sum()</a></code>, you can search for <code>\bsum\b</code> to avoid matching <code>summarize</code>, <code>summary</code>, <code>rowsum</code> and so on:</p>
<p>You can also match the boundary between words (i.e. the start or end of a word) with <code>\b</code>. This can be particularly useful when using RStudio’s find and replace tool. For example, to find all uses of <code><a href="https://rdrr.io/r/base/sum.html">sum()</a></code>, you can search for <code>\bsum\b</code> to avoid matching <code>summarize</code>, <code>summary</code>, <code>rowsum</code> and so on:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">x &lt;- c("summary(x)", "summarize(df)", "rowsum(x)", "sum(x)")
str_view(x, "sum")
@@ -523,7 +500,7 @@ Operator precedence and parentheses</h2>
<section id="grouping-and-capturing" data-type="sect2">
<h2>
Grouping and capturing</h2>
<p>As well overriding operator precedence, parentheses have another important effect: they create <strong>capturing groups</strong> that allow you to use sub-components of the match.</p>
<p>As well as overriding operator precedence, parentheses have another important effect: they create <strong>capturing groups</strong> that allow you to use sub-components of the match.</p>
<p>The first way to use a capturing group is to refer back to it within a match with a <strong>back reference</strong>: <code>\1</code> refers to the match contained in the first parenthesis, <code>\2</code> in the second parenthesis, and so on. For example, the following pattern finds all fruits that have a repeated pair of letters:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">str_view(fruit, "(..)\\1")
@@ -548,17 +525,13 @@ Grouping and capturing</h2>
<pre data-type="programlisting" data-code-language="r">sentences |&gt;
str_replace("(\\w+) (\\w+) (\\w+)", "\\1 \\3 \\2") |&gt;
str_view()
#&gt; [1] │ The canoe birch slid on the smooth planks.
#&gt; [2] │ Glue sheet the to the dark blue background.
#&gt; [3] │ It's to easy tell the depth of a well.
#&gt; [4] │ These a days chicken leg is a rare dish.
#&gt; [5] │ Rice often is served in round bowls.
#&gt; [6] │ The of juice lemons makes fine punch.
#&gt; [7] │ The was box thrown beside the parked truck.
#&gt; [8] │ The were hogs fed chopped corn and garbage.
#&gt; [9] │ Four of hours steady work faced us.
#&gt; [10] │ A size large in stockings is hard to sell.
#&gt; ... and 710 more</pre>
#&gt; [1] │ The canoe birch slid on the smooth planks.
#&gt; [2] │ Glue sheet the to the dark blue background.
#&gt; [3] │ It's to easy tell the depth of a well.
#&gt; [4] │ These a days chicken leg is a rare dish.
#&gt; [5] │ Rice often is served in round bowls.
#&gt; [6] │ The of juice lemons makes fine punch.
#&gt; ... and 714 more</pre>
</div>
<p>If you want to extract the matches for each group you can use <code><a href="https://stringr.tidyverse.org/reference/str_match.html">str_match()</a></code>. But <code><a href="https://stringr.tidyverse.org/reference/str_match.html">str_match()</a></code> returns a matrix, so it’s not particularly easy to work with<span data-type="footnote">Mostly because we never discuss matrices in this book!</span>:</p>
<div class="cell">
@@ -605,7 +578,7 @@ str_match(x, "gr(?:e|a)y")
</div>
</section>
<section id="exercises-2" data-type="sect2">
<section id="regexps-exercises-1" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>How would you match the literal string <code>"'\</code>? How about <code>"$^$"</code>?</p></li>
@@ -645,7 +618,7 @@ Pattern control</h1>
<section id="sec-flags" data-type="sect2">
<h2>
Regex flags</h2>
<p>There are a number of settings that can use to control the details of the regexp. These settings are often called <strong>flags</strong> in other programming languages. In stringr, you can use these by wrapping the pattern in a call to <code><a href="https://stringr.tidyverse.org/reference/modifiers.html">regex()</a></code>. The most useful flag is probably <code>ignore_case = TRUE</code> because it allows characters to match either their uppercase or lowercase forms:</p>
<p>There are a number of settings that can be used to control the details of the regexp. These settings are often called <strong>flags</strong> in other programming languages. In stringr, you can use these by wrapping the pattern in a call to <code><a href="https://stringr.tidyverse.org/reference/modifiers.html">regex()</a></code>. The most useful flag is probably <code>ignore_case = TRUE</code> because it allows characters to match either their uppercase or lowercase forms:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">bananas &lt;- c("banana", "Banana", "BANANA")
str_view(bananas, "banana")
@@ -737,7 +710,7 @@ str_view("i İ ı I", coll("İ", ignore_case = TRUE, locale = "tr"))
<h1>
Practice</h1>
<p>To put these ideas into practice we’ll solve a few semi-authentic problems next. We’ll discuss three general techniques:</p>
<ol type="1"><li>checking you work by creating simple positive and negative controls</li>
<ol type="1"><li>checking your work by creating simple positive and negative controls</li>
<li>combining regular expressions with Boolean algebra</li>
<li>creating complex patterns using string manipulation</li>
</ol>
@@ -753,11 +726,7 @@ Check your work</h2>
#&gt; [7] │ &lt;The&gt; box was thrown beside the parked truck.
#&gt; [8] │ &lt;The&gt; hogs were fed chopped corn and garbage.
#&gt; [11] │ &lt;The&gt; boy was there when the sun rose.
#&gt; [13] │ &lt;The&gt; source of the huge river is the clear spring.
#&gt; [18] │ &lt;The&gt; soft cushion broke the man's fall.
#&gt; [19] │ &lt;The&gt; salt breeze came across from the sea.
#&gt; [20] │ &lt;The&gt; girl at the booth sold fifty bonds.
#&gt; ... and 267 more</pre>
#&gt; ... and 271 more</pre>
</div>
<p>That pattern also matches sentences starting with words like <code>They</code> or <code>These</code>, so we need to make sure that the “e” is the last letter in the word, which we can do by adding a word boundary:</p>
<div class="cell">
@@ -768,26 +737,18 @@ Check your work</h2>
#&gt; [8] │ &lt;The&gt; hogs were fed chopped corn and garbage.
#&gt; [11] │ &lt;The&gt; boy was there when the sun rose.
#&gt; [13] │ &lt;The&gt; source of the huge river is the clear spring.
#&gt; [18] │ &lt;The&gt; soft cushion broke the man's fall.
#&gt; [19] │ &lt;The&gt; salt breeze came across from the sea.
#&gt; [20] │ &lt;The&gt; girl at the booth sold fifty bonds.
#&gt; [21] │ &lt;The&gt; small pup gnawed a hole in the sock.
#&gt; ... and 246 more</pre>
#&gt; ... and 250 more</pre>
</div>
<p>What about finding all sentences that begin with a pronoun?</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">str_view(sentences, "^She|He|It|They\\b")
#&gt; [3] │ &lt;It&gt;'s easy to tell the depth of a well.
#&gt; [15] │ &lt;He&gt;lp the woman get back to her feet.
#&gt; [27] │ &lt;He&gt;r purse was full of useless trash.
#&gt; [29] │ &lt;It&gt; snowed, rained, and hailed the same morning.
#&gt; [63] │ &lt;He&gt; ran half way to the hardware store.
#&gt; [90] │ &lt;He&gt; lay prone and hardly moved a limb.
#&gt; [116] │ &lt;He&gt; ordered peach pie with ice cream.
#&gt; [118] │ &lt;He&gt;mp is a weed found in parts of the tropics.
#&gt; [127] │ &lt;It&gt; caught its hind paw in a rusty trap.
#&gt; [132] │ &lt;He&gt; said the same phrase thirty times.
#&gt; ... and 53 more</pre>
#&gt; [3] │ &lt;It&gt;'s easy to tell the depth of a well.
#&gt; [15] │ &lt;He&gt;lp the woman get back to her feet.
#&gt; [27] │ &lt;He&gt;r purse was full of useless trash.
#&gt; [29] │ &lt;It&gt; snowed, rained, and hailed the same morning.
#&gt; [63] │ &lt;He&gt; ran half way to the hardware store.
#&gt; [90] │ &lt;He&gt; lay prone and hardly moved a limb.
#&gt; ... and 57 more</pre>
</div>
<p>A quick inspection of the results shows that we’re getting some spurious matches. That’s because we’ve forgotten to use parentheses:</p>
<div class="cell">
@@ -798,11 +759,7 @@ Check your work</h2>
#&gt; [90] │ &lt;He&gt; lay prone and hardly moved a limb.
#&gt; [116] │ &lt;He&gt; ordered peach pie with ice cream.
#&gt; [127] │ &lt;It&gt; caught its hind paw in a rusty trap.
#&gt; [132] │ &lt;He&gt; said the same phrase thirty times.
#&gt; [153] │ &lt;He&gt; broke a new shoelace that day.
#&gt; [159] │ &lt;She&gt; sewed the torn coat quite neatly.
#&gt; [168] │ &lt;He&gt; knew the skill of the great young actress.
#&gt; ... and 47 more</pre>
#&gt; ... and 51 more</pre>
</div>
<p>You might wonder how you would spot such a mistake if it didn’t occur in the first few matches. A good technique is to create a few positive and negative matches and use them to test that your pattern works as expected:</p>
<div class="cell">
@@ -850,11 +807,7 @@ Boolean operations</h2>
#&gt; [62] │ &lt;availab&gt;le
#&gt; [66] │ &lt;ba&gt;by
#&gt; [67] │ &lt;ba&gt;ck
#&gt; [68] │ &lt;ba&gt;d
#&gt; [69] │ &lt;ba&gt;g
#&gt; [70] │ &lt;bala&gt;nce
#&gt; [71] │ &lt;ba&gt;ll
#&gt; ... and 20 more</pre>
#&gt; ... and 24 more</pre>
</div>
<p>It’s simpler to combine the results of two calls to <code><a href="https://stringr.tidyverse.org/reference/str_detect.html">str_detect()</a></code>:</p>
<div class="cell">
@@ -897,11 +850,7 @@ Creating a pattern with code</h2>
#&gt; [148] │ The spot on the blotter was made by &lt;green&gt; ink.
#&gt; [160] │ The sofa cushion is &lt;red&gt; and of light weight.
#&gt; [174] │ The sky that morning was clear and bright &lt;blue&gt;.
#&gt; [204] │ A &lt;blue&gt; crane is a tall wading bird.
#&gt; [217] │ It is hard to erase &lt;blue&gt; or &lt;red&gt; ink.
#&gt; [224] │ The lamp shone with a steady &lt;green&gt; flame.
#&gt; [247] │ The box is held by a bright &lt;red&gt; snapper.
#&gt; ... and 16 more</pre>
#&gt; ... and 20 more</pre>
</div>
<p>But as the number of colors grows, it would quickly get tedious to construct this pattern by hand. Wouldn’t it be nice if we could store the colors in a vector?</p>
<div class="cell">
@@ -915,34 +864,26 @@ Creating a pattern with code</h2>
<p>We could make this pattern more comprehensive if we had a good list of colors. One place we could start from is the list of built-in colors that R can use for plots:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">str_view(colors())
#&gt; [1] │ white
#&gt; [2] │ aliceblue
#&gt; [3] │ antiquewhite
#&gt; [4] │ antiquewhite1
#&gt; [5] │ antiquewhite2
#&gt; [6] │ antiquewhite3
#&gt; [7] │ antiquewhite4
#&gt; [8] │ aquamarine
#&gt; [9] │ aquamarine1
#&gt; [10] │ aquamarine2
#&gt; ... and 647 more</pre>
#&gt; [1] │ white
#&gt; [2] │ aliceblue
#&gt; [3] │ antiquewhite
#&gt; [4] │ antiquewhite1
#&gt; [5] │ antiquewhite2
#&gt; [6] │ antiquewhite3
#&gt; ... and 651 more</pre>
</div>
<p>But let’s first eliminate the numbered variants:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">cols &lt;- colors()
cols &lt;- cols[!str_detect(cols, "\\d")]
str_view(cols)
#&gt; [1] │ white
#&gt; [2] │ aliceblue
#&gt; [3] │ antiquewhite
#&gt; [4] │ aquamarine
#&gt; [5] │ azure
#&gt; [6] │ beige
#&gt; [7] │ bisque
#&gt; [8] │ black
#&gt; [9] │ blanchedalmond
#&gt; [10] │ blue
#&gt; ... and 133 more</pre>
#&gt; [1] │ white
#&gt; [2] │ aliceblue
#&gt; [3] │ antiquewhite
#&gt; [4] │ aquamarine
#&gt; [5] │ azure
#&gt; [6] │ beige
#&gt; ... and 137 more</pre>
</div>
<p>Then we can turn this into one giant pattern. We won’t show the pattern here because it’s huge, but you can see it working:</p>
<div class="cell">
@@ -954,16 +895,12 @@ str_view(sentences, pattern)
#&gt; [66] │ Cars and busses stalled in &lt;snow&gt; drifts.
#&gt; [92] │ A wisp of cloud hung in the &lt;blue&gt; air.
#&gt; [112] │ Leaves turn &lt;brown&gt; and &lt;yellow&gt; in the fall.
#&gt; [148] │ The spot on the blotter was made by &lt;green&gt; ink.
#&gt; [149] │ Mud was spattered on the front of his &lt;white&gt; shirt.
#&gt; [160] │ The sofa cushion is &lt;red&gt; and of light weight.
#&gt; [167] │ The office paint was a dull, sad &lt;tan&gt;.
#&gt; ... and 53 more</pre>
#&gt; ... and 57 more</pre>
</div>
<p>In this example, <code>cols</code> only contains numbers and letters so you don’t need to worry about metacharacters. But in general, whenever you create create patterns from existing strings it’s wise to run them through <code><a href="https://stringr.tidyverse.org/reference/str_escape.html">str_escape()</a></code> to ensure they match literally.</p>
<p>In this example, <code>cols</code> only contains numbers and letters so you don’t need to worry about metacharacters. But in general, whenever you create patterns from existing strings it’s wise to run them through <code><a href="https://stringr.tidyverse.org/reference/str_escape.html">str_escape()</a></code> to ensure they match literally.</p>
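<p>To illustrate why escaping matters, here is a small sketch; the <code>parts</code> vector is made up for this example and deliberately contains metacharacters:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(stringr)

# Hypothetical strings that contain regex metacharacters
parts &lt;- c("a.b", "1+1")
test &lt;- c("a.b", "axb", "1+1", "111")

# Unescaped: "." matches any character and "+" acts as a quantifier
naive &lt;- str_flatten(parts, "|")
str_detect(test, naive)
#&gt; [1]  TRUE  TRUE FALSE  TRUE

# Escaped: every string is matched literally
escaped &lt;- str_flatten(str_escape(parts), "|")
str_detect(test, escaped)
#&gt; [1]  TRUE FALSE  TRUE FALSE</pre>
</div>
<p>Without escaping, the pattern quietly accepts <code>"axb"</code> and <code>"111"</code> and misses the literal <code>"1+1"</code>; with escaping, only the original strings match.</p>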
</section>
<section id="exercises-3" data-type="sect2">
<section id="regexps-exercises-2" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
@@ -988,8 +925,8 @@ Regular expressions in other places</h1>
tidyverse</h2>
<p>There are three other particularly useful places where you might want to use regular expressions:</p>
<ul><li><p><code>matches(pattern)</code> will select all variables whose name matches the supplied pattern. It’s a “tidyselect” function that you can use in any tidyverse function that selects variables (e.g. <code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename_with()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/across.html">across()</a></code>).</p></li>
<li><p><code>pivot_longer()</code>’s <code>names_pattern</code> argument takes a vector of regular expressions, just like <code>separate_with_regex()</code>. It’s useful when extracting data out of variable names with a complex structure.</p></li>
<li><p>The <code>delim</code> argument in <code>separate_delim_longer()</code> and <code>separate_delim_wider()</code> usually matches a fixed string, but you can use <code><a href="https://stringr.tidyverse.org/reference/modifiers.html">regex()</a></code> to make it match a pattern. This is useful, for example, if you want to match a comma that is optionally followed by a space, i.e. <code>regex(", ?")</code>.</p></li>
<li><p><code>pivot_longer()</code>’s <code>names_pattern</code> argument takes a vector of regular expressions, just like <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_regex()</a></code>. It’s useful when extracting data out of variable names with a complex structure.</p></li>
<li><p>The <code>delim</code> argument in <code><a href="https://tidyr.tidyverse.org/reference/separate_longer_delim.html">separate_longer_delim()</a></code> and <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_delim()</a></code> usually matches a fixed string, but you can use <code><a href="https://stringr.tidyverse.org/reference/modifiers.html">regex()</a></code> to make it match a pattern. This is useful, for example, if you want to match a comma that is optionally followed by a space, i.e. <code>regex(", ?")</code>.</p></li>
</ul></section>
<section id="base-r" data-type="sect2">
@@ -1011,7 +948,7 @@ Base R</h2>
</section>
</section>
<section id="summary" data-type="sect1">
<section id="regexps-summary" data-type="sect1">
<h1>
Summary</h1>
<p>With every punctuation character potentially overloaded with meaning, regular expressions are one of the most compact languages out there. They’re definitely confusing at first, but as you train your eyes to read them and your brain to understand them, you unlock a powerful skill that you can use in R and in many other places.</p>