Hammering out intent of regexps chapter

2022-01-08 13:59:59 -06:00
parent 3c97cfed3f
commit e1375dfb18
2 changed files with 109 additions and 193 deletions
--- a/strings.Rmd
+++ b/strings.Rmd
@@ -243,6 +243,9 @@ df %>%

 Before we can discuss the opposite problem of extracting data out of strings, we need to take a quick digression to talk about **regular expressions**.
 Regular expressions are a very concise language for describing patterns in strings.
+Regular expressions can look overwhelming at first, as if a cat had walked across your keyboard.
+Fortunately, as your understanding improves, they'll soon start to make sense.
+
 We'll start by using `str_detect()` which answers a simple question: "does this pattern occur anywhere in my vector?".
 We'll then ask progressively more complex questions by learning more about regular expressions and the functions that use them.

@@ -607,3 +610,28 @@ The are a bunch of other places you can use regular expressions outside of strin
    ```

    (If you're more comfortable with "globs" like `*.Rmd`, you can convert them to regular expressions with `glob2rx()`):
+
+## Strategies
+
+Don't forget that you're in a programming language and you have other tools at your disposal.
+Instead of creating one complex regular expression, it's often easier to write a series of simpler regexps.
+If you get stuck trying to create a single regexp that solves your problem, take a step back and consider whether you could break the problem down into smaller pieces, solving each challenge before moving on to the next one.
+
+### Using multiple regular expressions
+
+When you have complex logical conditions (e.g. match `a` or `b` but not `c` unless `d`) it's often easier to combine multiple `str_detect()` calls with logical operators instead of trying to create a single regular expression.
+For example, here are two ways to find all words that don't contain any vowels:
+
+```{r}
+# Find all words containing at least one vowel, and negate
+no_vowels_1 <- !str_detect(words, "[aeiou]")
+# Find all words consisting only of consonants (non-vowels)
+no_vowels_2 <- str_detect(words, "^[^aeiou]+$")
+identical(no_vowels_1, no_vowels_2)
+```
+
+The results are identical, but I think the first approach is significantly easier to understand.
+If your regular expression gets overly complicated, try breaking it up into smaller pieces, giving each piece a name, and then combining the pieces with logical operations.
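+
+As a minimal sketch of that advice, the chunk below names three simple conditions and combines them with `&` and `!`.
+The vector `x` and the condition names are made up for illustration, and the chunk assumes stringr is installed.
+
+```{r}
+library(stringr)
+
+# Hypothetical sample vector, used only for this illustration
+x <- c("apple", "sky", "rhythm", "banana", "xylophone")
+
+# Give each simple condition a name
+starts_with_consonant <- str_detect(x, "^[^aeiou]")
+has_vowel <- str_detect(x, "[aeiou]")
+has_y <- str_detect(x, "y")
+
+# Combine the pieces: starts with a consonant and contains "y", but no vowel
+matches <- x[starts_with_consonant & has_y & !has_vowel]
+matches
+```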
+
+### Repeated `str_replace()`
+