Fix code language

2022-11-18 11:26:25 -06:00
parent 69b4597f3b
commit 868a35ca71
29 changed files with 912 additions and 907 deletions
--- a/oreilly/regexps.html
+++ b/oreilly/regexps.html
@@ -20,7 +20,7 @@ Prerequisites</h2>

 <p>In this chapter, we’ll use regular expression functions from stringr and tidyr, both core members of the tidyverse, as well as data from the babynames package.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">library(tidyverse)
+<pre data-type="programlisting" data-code-language="r">library(tidyverse)
 library(babynames)</pre>
 </div>
 <p>Throughout this chapter, we’ll use a mix of very simple inline examples so you can get the basic idea, the baby names data, and three character vectors from stringr:</p>
@@ -39,7 +39,7 @@ Pattern basics</h1>
 <p>We’ll use <code><a href="https://stringr.tidyverse.org/reference/str_view.html">str_view()</a></code> to learn how regex patterns work. We used <code><a href="https://stringr.tidyverse.org/reference/str_view.html">str_view()</a></code> in the last chapter to better understand a string vs its printed representation, and now we’ll use it with its second argument, a regular expression. When this is supplied, <code><a href="https://stringr.tidyverse.org/reference/str_view.html">str_view()</a></code> will show only the elements of the string vector that match, surrounding each match with <code>&lt;&gt;</code>, and, where possible, highlighting the match in blue.</p>
 <p>The simplest patterns consist of letters and numbers which match those characters exactly:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(fruit, "berry")
+<pre data-type="programlisting" data-code-language="r">str_view(fruit, "berry")
 #&gt;  [6] │ bil&lt;berry&gt;
 #&gt;  [7] │ black&lt;berry&gt;
 #&gt; [10] │ blue&lt;berry&gt;
@@ -56,14 +56,14 @@ str_view(fruit, "BERRY")</pre>
 </div>
 <p>Letters and numbers match exactly and are called <strong>literal characters</strong>. Punctuation characters like <code>.</code>, <code>+</code>, <code>*</code>, <code>[</code>, <code>]</code>, <code>?</code> have special meanings<span data-type="footnote">You’ll learn how to escape these special meanings in <a href="#sec-regexp-escaping" data-type="xref">#sec-regexp-escaping</a>.</span> and are called <strong>meta-characters</strong>. For example, <code>.</code> will match any character<span data-type="footnote">Well, any character apart from <code>\n</code>.</span>, so <code>"a."</code> will match any string that contains an “a” followed by another character:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(c("a", "ab", "ae", "bd", "ea", "eab"), "a.")
+<pre data-type="programlisting" data-code-language="r">str_view(c("a", "ab", "ae", "bd", "ea", "eab"), "a.")
 #&gt; [2] │ &lt;ab&gt;
 #&gt; [3] │ &lt;ae&gt;
 #&gt; [6] │ e&lt;ab&gt;</pre>
 </div>
 <p>Or we could find all the fruits that contain an “a”, followed by three letters, followed by an “e”:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(fruit, "a...e")
+<pre data-type="programlisting" data-code-language="r">str_view(fruit, "a...e")
 #&gt;  [1] │ &lt;apple&gt;
 #&gt;  [7] │ bl&lt;ackbe&gt;rry
 #&gt; [48] │ mand&lt;arine&gt;
@@ -81,7 +81,7 @@ str_view(fruit, "BERRY")</pre>
 <li>
 <code>*</code> lets a pattern be optional or repeat (i.e. it matches any number of times, including 0).</li>
 </ul><div class="cell">
-<pre data-type="programlisting" data-code-language="downlit"># ab? matches an "a", optionally followed by a "b".
+<pre data-type="programlisting" data-code-language="r"># ab? matches an "a", optionally followed by a "b".
 str_view(c("a", "ab", "abb"), "ab?")
 #&gt; [1] │ &lt;a&gt;
 #&gt; [2] │ &lt;ab&gt;
@@ -100,7 +100,7 @@ str_view(c("a", "ab", "abb"), "ab*")
 </div>
 <p><strong>Character classes</strong> are defined by <code>[]</code> and let you match a set of characters, e.g. <code>[abcd]</code> matches “a”, “b”, “c”, or “d”. You can also invert the match by starting with <code>^</code>: <code>[^abcd]</code> matches anything <strong>except</strong> “a”, “b”, “c”, or “d”. We can use this idea to find the words with three vowels or four consonants in a row:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(words, "[aeiou][aeiou][aeiou]")
+<pre data-type="programlisting" data-code-language="r">str_view(words, "[aeiou][aeiou][aeiou]")
 #&gt;  [79] │ b&lt;eau&gt;ty
 #&gt; [565] │ obv&lt;iou&gt;s
 #&gt; [644] │ prev&lt;iou&gt;s
@@ -116,7 +116,7 @@ str_view(words, "[^aeiou][^aeiou][^aeiou][^aeiou]")
 </div>
 <p>You can combine character classes and quantifiers. For example, the following regexp looks for two vowels followed by two or more consonants:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(words, "[aeiou][aeiou][^aeiou][^aeiou]+")
+<pre data-type="programlisting" data-code-language="r">str_view(words, "[aeiou][aeiou][^aeiou][^aeiou]+")
 #&gt;  [6] │ acc&lt;ount&gt;
 #&gt; [21] │ ag&lt;ainst&gt;
 #&gt; [31] │ alr&lt;eady&gt;
@@ -132,7 +132,7 @@ str_view(words, "[^aeiou][^aeiou][^aeiou][^aeiou]")
 <p>(We’ll learn some more elegant ways to express these ideas in <a href="#sec-quantifiers" data-type="xref">#sec-quantifiers</a>.)</p>
 <p>You can use <strong>alternation</strong>, <code>|</code>, to pick between one or more alternative patterns. For example, the following patterns look for fruits containing “apple”, “pear”, or “banana”, or a repeated vowel.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(fruit, "apple|pear|banana")
+<pre data-type="programlisting" data-code-language="r">str_view(fruit, "apple|pear|banana")
 #&gt;  [1] │ &lt;apple&gt;
 #&gt;  [4] │ &lt;banana&gt;
 #&gt; [59] │ &lt;pear&gt;
@@ -161,12 +161,12 @@ Key functions</h1>
 Detect matches</h2>
 <p><code><a href="https://stringr.tidyverse.org/reference/str_detect.html">str_detect()</a></code> returns a logical vector that is <code>TRUE</code> if the pattern matched an element of the character vector and <code>FALSE</code> otherwise:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_detect(c("a", "b", "c"), "[aeiou]")
+<pre data-type="programlisting" data-code-language="r">str_detect(c("a", "b", "c"), "[aeiou]")
 #&gt; [1]  TRUE FALSE FALSE</pre>
 </div>
 <p>Since <code><a href="https://stringr.tidyverse.org/reference/str_detect.html">str_detect()</a></code> returns a logical vector of the same length as the initial vector, it pairs well with <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code>. For example, this code finds all the most popular names containing a lower-case “x”:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">babynames |&gt; 
+<pre data-type="programlisting" data-code-language="r">babynames |&gt; 
  filter(str_detect(name, "x")) |&gt; 
  count(name, wt = n, sort = TRUE)
 #&gt; # A tibble: 974 × 2
@@ -182,7 +182,7 @@ Detect matches</h2>
 </div>
 <p>We can also use <code><a href="https://stringr.tidyverse.org/reference/str_detect.html">str_detect()</a></code> with <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> by pairing it with <code><a href="https://rdrr.io/r/base/sum.html">sum()</a></code> or <code><a href="https://rdrr.io/r/base/mean.html">mean()</a></code>: <code>sum(str_detect(x, pattern))</code> tells you the number of observations that match and <code>mean(str_detect(x, pattern))</code> tells you the proportion that match. For example, the following snippet computes and visualizes the proportion of baby names<span data-type="footnote">This gives us the proportion of <strong>names</strong> that contain an “x”; if you wanted the proportion of babies with a name containing an x, you’d need to perform a weighted mean.</span> that contain “x”, broken down by year. It looks like they’ve radically increased in popularity lately!</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">babynames |&gt; 
+<pre data-type="programlisting" data-code-language="r">babynames |&gt; 
  group_by(year) |&gt; 
  summarise(prop_x = mean(str_detect(name, "x"))) |&gt; 
  ggplot(aes(year, prop_x)) + 
@@ -196,7 +196,7 @@ Detect matches</h2>
 </div>
 <p>There are two functions that are closely related to <code><a href="https://stringr.tidyverse.org/reference/str_detect.html">str_detect()</a></code>, namely <code><a href="https://stringr.tidyverse.org/reference/str_subset.html">str_subset()</a></code> which returns just the strings that contain a match and <code><a href="https://stringr.tidyverse.org/reference/str_which.html">str_which()</a></code> which returns the indexes of strings that have a match:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_subset(c("a", "b", "c"), "[aeiou]")
+<pre data-type="programlisting" data-code-language="r">str_subset(c("a", "b", "c"), "[aeiou]")
 #&gt; [1] "a"
 str_which(c("a", "b", "c"), "[aeiou]")
 #&gt; [1] 1</pre>
@@ -208,20 +208,20 @@ str_which(c("a", "b", "c"), "[aeiou]")
 Count matches</h2>
 <p>The next step up in complexity from <code><a href="https://stringr.tidyverse.org/reference/str_detect.html">str_detect()</a></code> is <code><a href="https://stringr.tidyverse.org/reference/str_count.html">str_count()</a></code>: rather than a simple true or false, it tells you how many matches there are in each string.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- c("apple", "banana", "pear")
+<pre data-type="programlisting" data-code-language="r">x &lt;- c("apple", "banana", "pear")
 str_count(x, "p")
 #&gt; [1] 2 0 1</pre>
 </div>
 <p>Note that each match starts at the end of the previous match; i.e. regex matches never overlap. For example, in <code>"abababa"</code>, how many times will the pattern <code>"aba"</code> match? Regular expressions say two, not three:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_count("abababa", "aba")
+<pre data-type="programlisting" data-code-language="r">str_count("abababa", "aba")
 #&gt; [1] 2
 str_view("abababa", "aba")
 #&gt; [1] │ &lt;aba&gt;b&lt;aba&gt;</pre>
 </div>
 <p>It’s natural to use <code><a href="https://stringr.tidyverse.org/reference/str_count.html">str_count()</a></code> with <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>. The following example uses <code><a href="https://stringr.tidyverse.org/reference/str_count.html">str_count()</a></code> with character classes to count the number of vowels and consonants in each name.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">babynames |&gt; 
+<pre data-type="programlisting" data-code-language="r">babynames |&gt; 
  count(name) |&gt; 
  mutate(
    vowels = str_count(name, "[aeiou]"),
@@ -245,7 +245,7 @@ str_view("abababa", "aba")
 </ul><p>This variety of approaches is pretty typical when working with strings — there are often multiple ways to reach your goal, either by making your pattern more complicated or by doing some preprocessing on your string. If you get stuck trying one approach, it can often be useful to switch gears and tackle the problem from a different perspective.</p>
 <p>In this case, since we’re applying two functions to the name, I think it’s easier to transform it first:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">babynames |&gt; 
+<pre data-type="programlisting" data-code-language="r">babynames |&gt; 
  count(name) |&gt; 
  mutate(
    name = str_to_lower(name),
@@ -270,13 +270,13 @@ str_view("abababa", "aba")
 Replace values</h2>
 <p>As well as detecting and counting matches, we can also modify them with <code><a href="https://stringr.tidyverse.org/reference/str_replace.html">str_replace()</a></code> and <code><a href="https://stringr.tidyverse.org/reference/str_replace.html">str_replace_all()</a></code>. <code><a href="https://stringr.tidyverse.org/reference/str_replace.html">str_replace()</a></code> replaces the first match, and as the name suggests, <code><a href="https://stringr.tidyverse.org/reference/str_replace.html">str_replace_all()</a></code> replaces all matches.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- c("apple", "pear", "banana")
+<pre data-type="programlisting" data-code-language="r">x &lt;- c("apple", "pear", "banana")
 str_replace_all(x, "[aeiou]", "-")
 #&gt; [1] "-ppl-"  "p--r"   "b-n-n-"</pre>
 </div>
 <p><code><a href="https://stringr.tidyverse.org/reference/str_remove.html">str_remove()</a></code> and <code><a href="https://stringr.tidyverse.org/reference/str_remove.html">str_remove_all()</a></code> are handy shortcuts for <code>str_replace(x, pattern, "")</code>.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- c("apple", "pear", "banana")
+<pre data-type="programlisting" data-code-language="r">x &lt;- c("apple", "pear", "banana")
 str_remove_all(x, "[aeiou]")
 #&gt; [1] "ppl" "pr"  "bnn"</pre>
 </div>
@@ -289,7 +289,7 @@ Extract variables</h2>
 <p>The last function we’ll discuss uses regular expressions to extract data out of one column into one or more new columns: <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_regex()</a></code>. It’s a peer of the <code>separate_wider_position()</code> and <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_delim()</a></code> functions that you learned about in <a href="#sec-string-columns" data-type="xref">#sec-string-columns</a>. These functions live in tidyr because they operate on (columns of) data frames, rather than individual vectors.</p>
 <p>Let’s create a simple dataset to show how it works. Here we have some data derived from <code>babynames</code> where we have the name, gender, and age of a bunch of people in a rather weird format<span data-type="footnote">We wish we could reassure you that you’d never see something this weird in real life, but unfortunately over the course of your career you’re likely to see much weirder!</span>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">df &lt;- tribble(
+<pre data-type="programlisting" data-code-language="r">df &lt;- tribble(
  ~str,
  "&lt;Sheryl&gt;-F_34",
  "&lt;Kisha&gt;-F_45", 
@@ -302,7 +302,7 @@ Extract variables</h2>
 </div>
 <p>To extract this data using <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_regex()</a></code> we just need to construct a sequence of regular expressions that match each piece. If we want the contents of that piece to appear in the output, we give it a name:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">df |&gt; 
+<pre data-type="programlisting" data-code-language="r">df |&gt; 
  separate_wider_regex(
    str,
    patterns = c(
@@ -346,7 +346,7 @@ Pattern details</h1>
 Escaping</h2>
 <p>In order to match a literal <code>.</code>, you need an <strong>escape</strong> which tells the regular expression to match metacharacters literally. Like strings, regexps use the backslash for escaping. So, to match a <code>.</code>, you need the regexp <code>\.</code>. Unfortunately this creates a problem. We use strings to represent regular expressions, and <code>\</code> is also used as an escape symbol in strings. So to create the regular expression <code>\.</code> we need the string <code>"\\."</code>, as the following example shows.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit"># To create the regular expression \., we need to use \\.
+<pre data-type="programlisting" data-code-language="r"># To create the regular expression \., we need to use \\.
 dot &lt;- "\\."

 # But the expression itself only contains one \
@@ -360,7 +360,7 @@ str_view(c("abc", "a.c", "bef"), "a\\.c")
 <p>In this book, we’ll usually write regular expressions without quotes, like <code>\.</code>. If we need to emphasize what you’ll actually type, we’ll surround it with quotes and add extra escapes, like <code>"\\."</code>.</p>
 <p>If <code>\</code> is used as an escape character in regular expressions, how do you match a literal <code>\</code>? Well, you need to escape it, creating the regular expression <code>\\</code>. To create that regular expression, you need to use a string, which also needs to escape <code>\</code>. That means to match a literal <code>\</code> you need to write <code>"\\\\"</code> — you need four backslashes to match one!</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- "a\\b"
+<pre data-type="programlisting" data-code-language="r">x &lt;- "a\\b"
 str_view(x)
 #&gt; [1] │ a\b
 str_view(x, "\\\\")
@@ -368,12 +368,12 @@ str_view(x, "\\\\")
 </div>
 <p>Alternatively, you might find it easier to use the raw strings you learned about in <a href="#sec-raw-strings" data-type="xref">#sec-raw-strings</a>. That lets you avoid one layer of escaping:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(x, r"{\\}")
+<pre data-type="programlisting" data-code-language="r">str_view(x, r"{\\}")
 #&gt; [1] │ a&lt;\&gt;b</pre>
 </div>
 <p>If you’re trying to match a literal <code>.</code>, <code>$</code>, <code>|</code>, <code>*</code>, <code>+</code>, <code>?</code>, <code>{</code>, <code>}</code>, <code>(</code>, <code>)</code>, there’s an alternative to using a backslash escape: you can use a character class: <code>[.]</code>, <code>[$]</code>, <code>[|]</code>, ... all match the literal values.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(c("abc", "a.c", "a*c", "a c"), "a[.]c")
+<pre data-type="programlisting" data-code-language="r">str_view(c("abc", "a.c", "a*c", "a c"), "a[.]c")
 #&gt; [2] │ &lt;a.c&gt;
 str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c")
 #&gt; [3] │ &lt;a*c&gt;</pre>
@@ -386,7 +386,7 @@ str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c")
 Anchors</h2>
 <p>By default, regular expressions will match any part of a string. If you want to match at the start or end, you need to <strong>anchor</strong> the regular expression using <code>^</code> to match the start of the string or <code>$</code> to match the end of the string:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(fruit, "^a")
+<pre data-type="programlisting" data-code-language="r">str_view(fruit, "^a")
 #&gt; [1] │ &lt;a&gt;pple
 #&gt; [2] │ &lt;a&gt;pricot
 #&gt; [3] │ &lt;a&gt;vocado
@@ -401,7 +401,7 @@ str_view(fruit, "a$")
 <p>It’s tempting to think that <code>$</code> should match the start of a string, because that’s how we write dollar amounts, but it’s not what regular expressions want.</p>
 <p>To force a regular expression to match only the full string, anchor it with both <code>^</code> and <code>$</code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(fruit, "apple")
+<pre data-type="programlisting" data-code-language="r">str_view(fruit, "apple")
 #&gt;  [1] │ &lt;apple&gt;
 #&gt; [62] │ pine&lt;apple&gt;
 str_view(fruit, "^apple$")
@@ -409,7 +409,7 @@ str_view(fruit, "^apple$")
 </div>
 <p>You can also match the boundary between words (i.e. the start or end of a word) with <code>\b</code>. This can be particularly useful when using RStudio’s find and replace tool. For example, to find all uses of <code><a href="https://rdrr.io/r/base/sum.html">sum()</a></code>, you can search for <code>\bsum\b</code> to avoid matching <code>summarise</code>, <code>summary</code>, <code>rowsum</code> and so on:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- c("summary(x)", "summarise(df)", "rowsum(x)", "sum(x)")
+<pre data-type="programlisting" data-code-language="r">x &lt;- c("summary(x)", "summarise(df)", "rowsum(x)", "sum(x)")
 str_view(x, "sum")
 #&gt; [1] │ &lt;sum&gt;mary(x)
 #&gt; [2] │ &lt;sum&gt;marise(df)
@@ -420,14 +420,14 @@ str_view(x, "\\bsum\\b")
 </div>
 <p>When used alone, anchors will produce a zero-width match:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view("abc", c("$", "^", "\\b"))
+<pre data-type="programlisting" data-code-language="r">str_view("abc", c("$", "^", "\\b"))
 #&gt; [1] │ abc&lt;&gt;
 #&gt; [2] │ &lt;&gt;abc
 #&gt; [3] │ &lt;&gt;abc&lt;&gt;</pre>
 </div>
 <p>This helps you understand what happens when you replace a standalone anchor:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_replace_all("abc", c("$", "^", "\\b"), "--")
+<pre data-type="programlisting" data-code-language="r">str_replace_all("abc", c("$", "^", "\\b"), "--")
 #&gt; [1] "abc--"   "--abc"   "--abc--"</pre>
 </div>
 </section>
@@ -444,7 +444,7 @@ Character classes</h2>
 <code>\</code> escapes special characters, so <code>[\^\-\]]</code> matches <code>^</code>, <code>-</code>, or <code>]</code>.</li>
 </ul><p>Here are a few examples:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- "abcd ABCD 12345 -!@#%."
+<pre data-type="programlisting" data-code-language="r">x &lt;- "abcd ABCD 12345 -!@#%."
 str_view(x, "[abc]+")
 #&gt; [1] │ &lt;abc&gt;d ABCD 12345 -!@#%.
 str_view(x, "[a-z]+")
@@ -468,7 +468,7 @@ str_view("a-b-c", "[a\\-c]")
 <code>\w</code> matches any “word” character, i.e. letters and numbers;<br/><code>\W</code> matches any “non-word” character.</li>
 </ul><p>The following code demonstrates the six shortcuts with a selection of letters, numbers, and punctuation characters.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- "abcd ABCD 12345 -!@#%."
+<pre data-type="programlisting" data-code-language="r">x &lt;- "abcd ABCD 12345 -!@#%."
 str_view(x, "\\d+")
 #&gt; [1] │ abcd ABCD &lt;12345&gt; -!@#%.
 str_view(x, "\\D+")
@@ -496,7 +496,7 @@ Quantifiers</h2>
 <code>{n,m}</code> matches between n and m times.</li>
 </ul><p>The following code shows how this works for a few simple examples:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- "-- -x- -xx- -xxx- -xxxx- -xxxxx-"
+<pre data-type="programlisting" data-code-language="r">x &lt;- "-- -x- -xx- -xxx- -xxxx- -xxxxx-"
 str_view(x, "-x?-")      # [0, 1]
 #&gt; [1] │ &lt;--&gt; &lt;-x-&gt; -xx- -xxx- -xxxx- -xxxxx-
 str_view(x, "-x+-")      # [1, Inf)
@@ -526,7 +526,7 @@ Grouping and capturing</h2>
 <p>As well as overriding operator precedence, parentheses have another important effect: they create <strong>capturing groups</strong> that allow you to use sub-components of the match.</p>
 <p>The first way to use a capturing group is to refer back to it within a match with a <strong>back reference</strong>: <code>\1</code> refers to the match contained in the first parenthesis, <code>\2</code> in the second parenthesis, and so on. For example, the following pattern finds all fruits that have a repeated pair of letters:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(fruit, "(..)\\1")
+<pre data-type="programlisting" data-code-language="r">str_view(fruit, "(..)\\1")
 #&gt;  [4] │ b&lt;anan&gt;a
 #&gt; [20] │ &lt;coco&gt;nut
 #&gt; [22] │ &lt;cucu&gt;mber
@@ -536,7 +536,7 @@ Grouping and capturing</h2>
 </div>
 <p>And this one finds all words that start and end with the same pair of letters:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(words, "^(..).*\\1$")
+<pre data-type="programlisting" data-code-language="r">str_view(words, "^(..).*\\1$")
 #&gt; [152] │ &lt;church&gt;
 #&gt; [217] │ &lt;decide&gt;
 #&gt; [617] │ &lt;photograph&gt;
@@ -545,7 +545,7 @@ Grouping and capturing</h2>
 </div>
 <p>You can also use back references in <code><a href="https://stringr.tidyverse.org/reference/str_replace.html">str_replace()</a></code>. For example, this code switches the order of the second and third words in <code>sentences</code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">sentences |&gt; 
+<pre data-type="programlisting" data-code-language="r">sentences |&gt; 
  str_replace("(\\w+) (\\w+) (\\w+)", "\\1 \\3 \\2") |&gt; 
  str_view()
 #&gt;  [1] │ The canoe birch slid on the smooth planks.
@@ -562,7 +562,7 @@ Grouping and capturing</h2>
 </div>
 <p>If you want to extract the matches for each group you can use <code><a href="https://stringr.tidyverse.org/reference/str_match.html">str_match()</a></code>. But <code><a href="https://stringr.tidyverse.org/reference/str_match.html">str_match()</a></code> returns a matrix, so it’s not particularly easy to work with<span data-type="footnote">Mostly because we never discuss matrices in this book!</span>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">sentences |&gt; 
+<pre data-type="programlisting" data-code-language="r">sentences |&gt; 
  str_match("the (\\w+) (\\w+)") |&gt; 
  head()
 #&gt;      [,1]                [,2]     [,3]    
@@ -575,7 +575,7 @@ Grouping and capturing</h2>
 </div>
 <p>You could convert to a tibble and name the columns:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">sentences |&gt; 
+<pre data-type="programlisting" data-code-language="r">sentences |&gt; 
  str_match("the (\\w+) (\\w+)") |&gt; 
  as_tibble(.name_repair = "minimal") |&gt; 
  set_names("match", "word1", "word2")
@@ -593,7 +593,7 @@ Grouping and capturing</h2>
 <p>But then you’ve basically recreated your own version of <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_regex()</a></code>. Indeed, behind the scenes, <code><a href="https://tidyr.tidyverse.org/reference/separate_wider_delim.html">separate_wider_regex()</a></code> converts your vector of patterns to a single regex that uses grouping to capture the named components.</p>
 <p>Occasionally, you’ll want to use parentheses without creating matching groups. You can create a non-capturing group with <code>(?:)</code>.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- c("a gray cat", "a grey dog")
+<pre data-type="programlisting" data-code-language="r">x &lt;- c("a gray cat", "a grey dog")
 str_match(x, "gr(e|a)y")
 #&gt;      [,1]   [,2]
 #&gt; [1,] "gray" "a" 
@@ -647,7 +647,7 @@ Pattern control</h1>
 Regex flags</h2>
 <p>There are a number of settings that you can use to control the details of the regexp. These settings are often called <strong>flags</strong> in other programming languages. In stringr, you can use these by wrapping the pattern in a call to <code><a href="https://stringr.tidyverse.org/reference/modifiers.html">regex()</a></code>. The most useful flag is probably <code>ignore_case = TRUE</code> because it allows characters to match either their uppercase or lowercase forms:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">bananas &lt;- c("banana", "Banana", "BANANA")
+<pre data-type="programlisting" data-code-language="r">bananas &lt;- c("banana", "Banana", "BANANA")
 str_view(bananas, "banana")
 #&gt; [1] │ &lt;banana&gt;
 str_view(bananas, regex("banana", ignore_case = TRUE))
@@ -659,7 +659,7 @@ str_view(bananas, regex("banana", ignore_case = TRUE))
 <ul><li>
 <p><code>dotall = TRUE</code> lets <code>.</code> match everything, including <code>\n</code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- "Line 1\nLine 2\nLine 3"
+<pre data-type="programlisting" data-code-language="r">x &lt;- "Line 1\nLine 2\nLine 3"
 str_view(x, ".Line")
 str_view(x, regex(".Line", dotall = TRUE))
 #&gt; [1] │ Line 1&lt;
@@ -670,7 +670,7 @@ str_view(x, regex(".Line", dotall = TRUE))
 <li>
 <p><code>multiline = TRUE</code> makes <code>^</code> and <code>$</code> match the start and end of each line rather than the start and end of the complete string:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">x &lt;- "Line 1\nLine 2\nLine 3"
+<pre data-type="programlisting" data-code-language="r">x &lt;- "Line 1\nLine 2\nLine 3"
 str_view(x, "^Line")
 #&gt; [1] │ &lt;Line&gt; 1
 #&gt;     │ Line 2
@@ -683,7 +683,7 @@ str_view(x, regex("^Line", multiline = TRUE))
 </li>
 </ul><p>Finally, if you’re writing a complicated regular expression and you’re worried you might not understand it in the future, you might try <code>comments = TRUE</code>. It tweaks the pattern language to ignore spaces and new lines, as well as everything after <code>#</code>. This allows you to use comments and whitespace to make complex regular expressions more understandable<span data-type="footnote"><code>comments = TRUE</code> is particularly effective in combination with a raw string, as we use here.</span>, as in the following example:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">phone &lt;- regex(
+<pre data-type="programlisting" data-code-language="r">phone &lt;- regex(
  r"(
    \(?     # optional opening parens
    (\d{3}) # area code
@@ -701,7 +701,7 @@ str_match("514-791-8141", phone)
 </div>
 <p>If you’re using comments and want to match a space, newline, or <code>#</code>, you’ll need to escape it:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view("x x #", regex(r"(x #)", comments = TRUE))
+<pre data-type="programlisting" data-code-language="r">str_view("x x #", regex(r"(x #)", comments = TRUE))
 #&gt; [1] │ &lt;x&gt; &lt;x&gt; #
 str_view("x x #", regex(r"(x\ \#)", comments = TRUE))
 #&gt; [1] │ x &lt;x #&gt;</pre>
@@ -713,19 +713,19 @@ str_view("x x #", regex(r"(x\ \#)", comments = TRUE))
 Fixed matches</h2>
 <p>You can opt-out of the regular expression rules by using <code><a href="https://stringr.tidyverse.org/reference/modifiers.html">fixed()</a></code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(c("", "a", "."), fixed("."))
+<pre data-type="programlisting" data-code-language="r">str_view(c("", "a", "."), fixed("."))
 #&gt; [3] │ &lt;.&gt;</pre>
 </div>
 <p><code><a href="https://stringr.tidyverse.org/reference/modifiers.html">fixed()</a></code> also gives you the ability to ignore case:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view("x X", "X")
+<pre data-type="programlisting" data-code-language="r">str_view("x X", "X")
 #&gt; [1] │ x &lt;X&gt;
 str_view("x X", fixed("X", ignore_case = TRUE))
 #&gt; [1] │ &lt;x&gt; &lt;X&gt;</pre>
 </div>
 <p>If you’re working with non-English text, you will probably want <code><a href="https://stringr.tidyverse.org/reference/modifiers.html">coll()</a></code> instead of <code><a href="https://stringr.tidyverse.org/reference/modifiers.html">fixed()</a></code>, as it implements the full rules for capitalization as used by the <code>locale</code> you specify. See <a href="#sec-other-languages" data-type="xref">#sec-other-languages</a> for more details on locales.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view("i İ ı I", fixed("İ", ignore_case = TRUE))
+<pre data-type="programlisting" data-code-language="r">str_view("i İ ı I", fixed("İ", ignore_case = TRUE))
 #&gt; [1] │ i &lt;İ&gt; ı I
 str_view("i İ ı I", coll("İ", ignore_case = TRUE, locale = "tr"))
 #&gt; [1] │ &lt;i&gt; &lt;İ&gt; ı I</pre>
@@ -746,7 +746,7 @@ Practice</h1>
 Check your work</h2>
 <p>First, let’s find all sentences that start with “The”. Using the <code>^</code> anchor alone is not enough:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(sentences, "^The")
+<pre data-type="programlisting" data-code-language="r">str_view(sentences, "^The")
 #&gt;  [1] │ &lt;The&gt; birch canoe slid on the smooth planks.
 #&gt;  [4] │ &lt;The&gt;se days a chicken leg is a rare dish.
 #&gt;  [6] │ &lt;The&gt; juice of lemons makes fine punch.
@@ -761,7 +761,7 @@ Check your work</h2>
 </div>
 <p>That’s because the pattern also matches sentences starting with words like <code>They</code> or <code>These</code>. We need to make sure that the “e” is the last letter in the word, which we can do by adding a word boundary:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(sentences, "^The\\b")
+<pre data-type="programlisting" data-code-language="r">str_view(sentences, "^The\\b")
 #&gt;  [1] │ &lt;The&gt; birch canoe slid on the smooth planks.
 #&gt;  [6] │ &lt;The&gt; juice of lemons makes fine punch.
 #&gt;  [7] │ &lt;The&gt; box was thrown beside the parked truck.
@@ -776,7 +776,7 @@ Check your work</h2>
 </div>
 <p>What about finding all sentences that begin with a pronoun?</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(sentences, "^She|He|It|They\\b")
+<pre data-type="programlisting" data-code-language="r">str_view(sentences, "^She|He|It|They\\b")
 #&gt;   [3] │ &lt;It&gt;'s easy to tell the depth of a well.
 #&gt;  [15] │ &lt;He&gt;lp the woman get back to her feet.
 #&gt;  [27] │ &lt;He&gt;r purse was full of useless trash.
@@ -791,7 +791,7 @@ Check your work</h2>
 </div>
 <p>A quick inspection of the results shows that we’re getting some spurious matches. That’s because we’ve forgotten to use parentheses:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(sentences, "^(She|He|It|They)\\b")
+<pre data-type="programlisting" data-code-language="r">str_view(sentences, "^(She|He|It|They)\\b")
 #&gt;   [3] │ &lt;It&gt;'s easy to tell the depth of a well.
 #&gt;  [29] │ &lt;It&gt; snowed, rained, and hailed the same morning.
 #&gt;  [63] │ &lt;He&gt; ran half way to the hardware store.
@@ -806,7 +806,7 @@ Check your work</h2>
 </div>
 <p>You might wonder how you could spot such a mistake if it didn’t occur in the first few matches. A good technique is to create a few positive and negative matches and use them to test that your pattern works as expected:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">pos &lt;- c("He is a boy", "She had a good time")
+<pre data-type="programlisting" data-code-language="r">pos &lt;- c("He is a boy", "She had a good time")
 neg &lt;- c("Shells come from the sea", "Hadley said 'It's a great day'")

 pattern &lt;- "^(She|He|It|They)\\b"
@@ -823,7 +823,7 @@ str_detect(neg, pattern)
 Boolean operations</h2>
 <p>Imagine we want to find words that only contain consonants. One technique is to create a character class that contains all letters except for the vowels (<code>[^aeiou]</code>), then allow that to match any number of letters (<code>[^aeiou]+</code>), then force it to match the whole string by anchoring to the beginning and the end (<code>^[^aeiou]+$</code>):</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(words, "^[^aeiou]+$")
+<pre data-type="programlisting" data-code-language="r">str_view(words, "^[^aeiou]+$")
 #&gt; [123] │ &lt;by&gt;
 #&gt; [249] │ &lt;dry&gt;
 #&gt; [328] │ &lt;fly&gt;
@@ -833,7 +833,7 @@ Boolean operations</h2>
 </div>
 <p>But you can make this problem a bit easier by flipping the problem around. Instead of looking for words that contain only consonants, we could look for words that don’t contain any vowels:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(words[!str_detect(words, "[aeiou]")])
+<pre data-type="programlisting" data-code-language="r">str_view(words[!str_detect(words, "[aeiou]")])
 #&gt; [1] │ by
 #&gt; [2] │ dry
 #&gt; [3] │ fly
@@ -843,7 +843,7 @@ Boolean operations</h2>
 </div>
 <p>This is a useful technique whenever you’re dealing with logical combinations, particularly those involving “and” or “not”. For example, imagine you want to find all words that contain both “a” and “b”. There’s no “and” operator built into regular expressions, so we have to tackle it by looking for all words that contain an “a” followed by a “b”, or a “b” followed by an “a”:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(words, "a.*b|b.*a")
+<pre data-type="programlisting" data-code-language="r">str_view(words, "a.*b|b.*a")
 #&gt;  [2] │ &lt;ab&gt;le
 #&gt;  [3] │ &lt;ab&gt;out
 #&gt;  [4] │ &lt;ab&gt;solute
@@ -858,7 +858,7 @@ Boolean operations</h2>
 </div>
 <p>It’s simpler to combine the results of two calls to <code><a href="https://stringr.tidyverse.org/reference/str_detect.html">str_detect()</a></code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">words[str_detect(words, "a") &amp; str_detect(words, "b")]
+<pre data-type="programlisting" data-code-language="r">words[str_detect(words, "a") &amp; str_detect(words, "b")]
 #&gt;  [1] "able"      "about"     "absolute"  "available" "baby"      "back"     
 #&gt;  [7] "bad"       "bag"       "balance"   "ball"      "bank"      "bar"      
 #&gt; [13] "base"      "basis"     "bear"      "beat"      "beauty"    "because"  
@@ -867,13 +867,13 @@ Boolean operations</h2>
 </div>
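 <p>The same idea extends to any number of required pieces. The following is a minimal sketch, not part of stringr: <code>contains_all()</code> is a hypothetical helper name, and it assumes each piece should be matched literally, which is why it wraps the pieces in <code>fixed()</code>.</p>

```r
library(stringr)

# Hypothetical helper: TRUE for each string that contains every piece in `pieces`.
contains_all <- function(string, pieces) {
  # One logical vector per piece; fixed() treats each piece as literal text.
  hits <- lapply(pieces, function(p) str_detect(string, fixed(p)))
  # Combine the logical vectors element-wise with "and".
  Reduce(`&`, hits)
}

x <- c("able", "baby", "apple", "bit")
x[contains_all(x, c("a", "b"))]
#> [1] "able" "baby"
```

 <p>Because each piece is checked separately, the order in which the pieces appear in a string doesn’t matter, so there is no need to enumerate every ordering in a single pattern.</p>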
 <p>What if we wanted to see if there was a word that contains all vowels? If we did it with patterns we’d need to generate 5! (120) different patterns:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">words[str_detect(words, "a.*e.*i.*o.*u")]
+<pre data-type="programlisting" data-code-language="r">words[str_detect(words, "a.*e.*i.*o.*u")]
 # ...
 words[str_detect(words, "u.*o.*i.*e.*a")]</pre>
 </div>
 <p>It’s much simpler to combine five calls to <code><a href="https://stringr.tidyverse.org/reference/str_detect.html">str_detect()</a></code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">words[
+<pre data-type="programlisting" data-code-language="r">words[
  str_detect(words, "a") &amp;
  str_detect(words, "e") &amp;
  str_detect(words, "i") &amp;
@@ -890,7 +890,7 @@ words[str_detect(words, "u.*o.*i.*e.*a")]</pre>
 Creating a pattern with code</h2>
 <p>What if we wanted to find all <code>sentences</code> that mention a color? The basic idea is simple: we just combine alternation with word boundaries.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(sentences, "\\b(red|green|blue)\\b")
+<pre data-type="programlisting" data-code-language="r">str_view(sentences, "\\b(red|green|blue)\\b")
 #&gt;   [2] │ Glue the sheet to the dark &lt;blue&gt; background.
 #&gt;  [26] │ Two &lt;blue&gt; fish swam in the tank.
 #&gt;  [92] │ A wisp of cloud hung in the &lt;blue&gt; air.
@@ -905,16 +905,16 @@ Creating a pattern with code</h2>
 </div>
 <p>But as the number of colors grows, it would quickly get tedious to construct this pattern by hand. Wouldn’t it be nice if we could store the colors in a vector?</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">rgb &lt;- c("red", "green", "blue")</pre>
+<pre data-type="programlisting" data-code-language="r">rgb &lt;- c("red", "green", "blue")</pre>
 </div>
 <p>Well, we can! We’d just need to create the pattern from the vector using <code><a href="https://stringr.tidyverse.org/reference/str_c.html">str_c()</a></code> and <code><a href="https://stringr.tidyverse.org/reference/str_flatten.html">str_flatten()</a></code>:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_c("\\b(", str_flatten(rgb, "|"), ")\\b")
+<pre data-type="programlisting" data-code-language="r">str_c("\\b(", str_flatten(rgb, "|"), ")\\b")
 #&gt; [1] "\\b(red|green|blue)\\b"</pre>
 </div>
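 <p>If the vector might contain characters that are special in regular expressions, it is probably safer to escape each element before joining. The sketch below wraps the steps above in a hypothetical helper, <code>words_to_pattern()</code>, and assumes stringr 1.5.0 or later, which provides <code>str_escape()</code>.</p>

```r
library(stringr)

# Hypothetical helper: build a whole-word alternation pattern from a vector.
words_to_pattern <- function(words) {
  # Escape metacharacters such as "." so each word is matched literally.
  escaped <- str_escape(words)
  # Join the alternatives with "|" and surround them with word boundaries.
  str_c("\\b(", str_flatten(escaped, "|"), ")\\b")
}

rgb <- c("red", "green", "blue")
words_to_pattern(rgb)
#> [1] "\\b(red|green|blue)\\b"
```

 <p>For plain words like these the result is the same as before; the escaping only matters once an element contains something like <code>.</code> or <code>+</code>.</p>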
 <p>We could make this pattern more comprehensive if we had a good list of colors. One place we could start from is the list of built-in colors that R can use for plots:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">str_view(colors())
+<pre data-type="programlisting" data-code-language="r">str_view(colors())
 #&gt;  [1] │ white
 #&gt;  [2] │ aliceblue
 #&gt;  [3] │ antiquewhite
@@ -929,7 +929,7 @@ Creating a pattern with code</h2>
 </div>
 <p>But let’s first eliminate the numbered variants:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">cols &lt;- colors()
+<pre data-type="programlisting" data-code-language="r">cols &lt;- colors()
 cols &lt;- cols[!str_detect(cols, "\\d")]
 str_view(cols)
 #&gt;  [1] │ white
@@ -946,7 +946,7 @@ str_view(cols)
 </div>
 <p>Then we can turn this into one giant pattern. We won’t show the pattern here because it’s huge, but you can see it working:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">pattern &lt;- str_c("\\b(", str_flatten(cols, "|"), ")\\b")
+<pre data-type="programlisting" data-code-language="r">pattern &lt;- str_c("\\b(", str_flatten(cols, "|"), ")\\b")
 str_view(sentences, pattern)
 #&gt;   [2] │ Glue the sheet to the dark &lt;blue&gt; background.
 #&gt;  [12] │ A rod is used to catch &lt;pink&gt; &lt;salmon&gt;.
@@ -997,14 +997,14 @@ tidyverse</h2>
 Base R</h2>
 <p><code>apropos(pattern)</code> searches the names of all objects available from the global environment and returns those that match the given pattern. This is useful if you can’t quite remember the name of a function:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">apropos("replace")
+<pre data-type="programlisting" data-code-language="r">apropos("replace")
 #&gt; [1] "%+replace%"       "replace"          "replace_na"      
 #&gt; [4] "setReplaceMethod" "str_replace"      "str_replace_all" 
 #&gt; [7] "str_replace_na"   "theme_replace"</pre>
 </div>
 <p><code>list.files(path, pattern)</code> lists all files in <code>path</code> that match a regular expression <code>pattern</code>. For example, you can find all the R Markdown files in the current directory with:</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">head(list.files(pattern = "\\.Rmd$"))
+<pre data-type="programlisting" data-code-language="r">head(list.files(pattern = "\\.Rmd$"))
 #&gt; character(0)</pre>
 </div>
 <p>It’s worth noting that the pattern language used by base R is very slightly different to that used by stringr. That’s because stringr is built on top of the <a href="https://stringi.gagolewski.com">stringi package</a>, which is in turn built on top of the <a href="https://unicode-org.github.io/icu/userguide/strings/regexp.html">ICU engine</a>, whereas base R functions use either the <a href="https://github.com/laurikari/tre">TRE engine</a> or the <a href="https://www.pcre.org">PCRE engine</a>, depending on whether or not you’ve set <code>perl = TRUE</code>. Fortunately, the basics of regular expressions are so well established that you’ll encounter few variations when working with the patterns you’ll learn in this book. You only need to be aware of the difference when you start to rely on advanced features like complex Unicode character ranges or special features that use the <code>(?…)</code> syntax.</p>
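 <p>As a rough illustration of how the engine is chosen in base R, the sketch below uses <code>grepl()</code> on a small made-up vector. The example strings are ours, and the comments describe the behavior we would expect rather than a full comparison of the engines.</p>

```r
x <- c("The birch canoe", "These days", "They left")

# Default engine (TRE): anchors and word boundaries work as in stringr.
grepl("^The\\b", x)
#> [1]  TRUE FALSE FALSE

# perl = TRUE switches to PCRE, which base R needs for (?...) features
# such as this lookahead.
grepl("^The(?= birch)", x, perl = TRUE)
#> [1]  TRUE FALSE FALSE

# fixed = TRUE bypasses regular expressions, much like stringr's fixed().
grepl(".", c("", "a", "."), fixed = TRUE)
#> [1] FALSE FALSE  TRUE
```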