Use metacharacter + literal character terms
parent 45353e0a58
commit f4f739bccb
regexps.qmd (18 changed lines)
@@ -57,8 +57,8 @@ str_view(fruit, "berry")
str_view(fruit, "BERRY")
```

While letters and numbers match exactly, punctuation characters like `.`, `+`, `*`, `[`, `]`, `?` have special meanings[^regexps-2].
For example, `.`
Letters and numbers match exactly and so are called **literal characters**.
Punctuation characters like `.`, `+`, `*`, `[`, `]`, `?` have special meanings[^regexps-2] and are called **metacharacters**. For example, `.`
will match any character[^regexps-3], so `"a."` will match any string that contains an "a" followed by another character:
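A minimal sketch of the literal character / metacharacter distinction, assuming stringr is installed as in the rest of regexps.qmd. The input strings are made up for illustration; `str_detect()` returns `TRUE` for each string the pattern matches anywhere.

```r
library(stringr)

# Made-up inputs for illustration
x <- c("apple", "a", "ab", "a.")

# "ap" contains only literal characters: it needs an "a" followed by a "p"
str_detect(x, "ap")
#> [1]  TRUE FALSE FALSE FALSE

# "." is a metacharacter: "a." needs an "a" followed by any one character
str_detect(x, "a.")
#> [1]  TRUE FALSE  TRUE  TRUE
```

The lone `"a"` fails the second pattern because there is no character after the `a` for `.` to match.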
@@ -300,7 +300,7 @@ If the match fails, you can use `too_short = "debug"` to figure out what went wr
## Pattern details

Now that you understand the basics of the pattern language and how to use it with some stringr and tidyr functions, it's time to dig into more of the details.
First, we'll start with **escaping**, which allows you to match characters that the pattern language otherwise treats specially.
First, we'll start with **escaping**, which allows you to match metacharacters that would otherwise be treated specially.
Next you'll learn about **anchors**, which allow you to match the start or end of the string.
Then you'll learn more about **character classes** and their shortcuts, which allow you to match any character from a set.
Next you'll learn the final details of **quantifiers**, which control how many times a pattern can match.
@@ -312,11 +312,11 @@ They're not always the most evocative of their purpose, but it's very helpful to

### Escaping {#sec-regexp-escaping}

In order to match a literal `.`, you need an **escape**, which tells the regular expression to ignore the special behavior and match exactly.
In order to match a literal `.`, you need an **escape**, which tells the regular expression to match metacharacters literally.
Like strings, regexps use the backslash for escaping, so to match a `.`, you need the regexp `\.`.
Unfortunately this creates a problem.
We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings.
So, as the following example shows, to create the regular expression `\.` we need the string `"\\."`.
So to create the regular expression `\.` we need the string `"\\."`, as the following example shows.

```{r}
# To create the regular expression \., we need to use \\.
@@ -350,7 +350,7 @@ That lets you avoid one layer of escaping:
str_view(x, r"{\\}")
```

The full set of characters with special meanings that need to be escaped is `.^$\|*+?{}[]()`.
The full set of metacharacters is `.^$\|*+?{}[]()`.
In general, look at punctuation characters with suspicion; if your regular expression isn't matching what you think it should, check if you've used any of these characters.
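A small sketch of escaping in practice, assuming stringr is installed; the strings are invented for the example. It contrasts an unescaped `.` with the string `"a\\.c"`, uses base R's `writeLines()` to print the regular expression that string actually holds, and shows that `$`, another member of the metacharacter set, needs the same treatment.

```r
library(stringr)

x <- c("abc", "a.c", "bef")

# Unescaped, `.` matches any character, so "abc" matches as well as "a.c"
str_detect(x, "a.c")
#> [1]  TRUE  TRUE FALSE

# The string "a\\.c" holds the regex a\.c, which requires a literal dot
str_detect(x, "a\\.c")
#> [1] FALSE  TRUE FALSE

# writeLines() prints the string's contents, i.e. the regex itself
writeLines("a\\.c")
#> a\.c

# `$` is also a metacharacter (it anchors the end of the string),
# so it must be escaped to match a literal dollar sign
str_detect("costs $5", "$5")
#> [1] FALSE
str_detect("costs $5", "\\$5")
#> [1] TRUE
```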
### Anchors
@@ -574,7 +574,7 @@ str_match(x, "gr(?:e|a)y")

## Pattern control

It's possible to exercise extra control over the details of the match by using a special pattern object instead of just a string.
It's possible to exercise extra control over the details of the match by using a pattern object instead of just a string.
This allows you to control the so-called regex flags and match various types of fixed strings, as described below.
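A sketch of the two kinds of pattern object this paragraph refers to, using stringr's `fixed()` and `regex()` helpers; the strings are invented for the example. `fixed()` matches the pattern as an exact sequence of bytes, so metacharacters lose their special meaning, while `regex()` keeps regular-expression semantics and accepts flags such as `ignore_case`.

```r
library(stringr)

x <- c("a.c", "abc")

# As a plain regular expression, `.` matches any character
str_detect(x, "a.c")
#> [1] TRUE TRUE

# fixed() compares literally, so `.` only matches a dot
str_detect(x, fixed("a.c"))
#> [1]  TRUE FALSE

# regex() lets you set flags; ignore_case = TRUE makes matching case-insensitive
str_detect("BERRY", regex("berry", ignore_case = TRUE))
#> [1] TRUE
```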
### Regex flags {#sec-flags}
@@ -809,8 +809,8 @@ pattern <- str_c("\\b(", str_flatten(cols, "|"), ")\\b")
str_view(sentences, pattern)
```

In this example `cols` only contains numbers and letters so you don't need to worry about special characters.
But in general, whenever you create patterns from existing strings it's wise to run them through `str_escape()` to escape any special behavior.
In this example `cols` only contains numbers and letters so you don't need to worry about metacharacters.
But in general, whenever you create patterns from existing strings it's wise to run them through `str_escape()` to ensure they match literally.
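A minimal sketch of why escaping matters when a pattern is built from data, mirroring the `str_c("\\b(", str_flatten(cols, "|"), ")\\b")` construction used for `pattern` in this file. The values in `cols` and the strings in `x` are hypothetical, chosen so that one name contains a `.`; `str_escape()` is assumed to be available in the installed stringr version.

```r
library(stringr)

# Hypothetical names: "x.1" contains the metacharacter `.`
cols <- c("x.1", "y")
x <- c("x.1 is here", "xa1 is here", "y too")

# Unescaped: `.` in "x.1" matches any character, so "xa1" is a false positive
raw_pattern <- str_c("\\b(", str_flatten(cols, "|"), ")\\b")
str_detect(x, raw_pattern)
#> [1] TRUE TRUE TRUE

# str_escape() backslash-escapes each metacharacter, so "x.1" matches literally
safe_pattern <- str_c("\\b(", str_flatten(str_escape(cols), "|"), ")\\b")
str_detect(x, safe_pattern)
#> [1]  TRUE FALSE  TRUE
```

For names made only of letters and numbers, `str_escape()` leaves the strings unchanged, so applying it unconditionally has no downside.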
### Exercises