More string stuff
94
strings.Rmd
@@ -376,6 +376,33 @@ There are three ways we could fix this:
This is pretty typical when working with strings: there are often multiple ways to reach your goal, either by making your pattern more complicated or by doing some preprocessing on your string.
If you get stuck trying one approach, it can often be useful to switch gears and tackle the problem from a different perspective.

### Replace matches

Sometimes there are inconsistencies in the formatting that are easier to fix before you start extracting; it is easier to make the data more regular and check your work than to come up with a more complicated regular expression in `str_*` and friends.

`str_replace_all()` allows you to replace matches with new strings.
The simplest use is to replace a pattern with a fixed string:

```{r}
x <- c("apple", "pear", "banana")
str_replace_all(x, "[aeiou]", "-")
```

With `str_replace_all()` you can perform multiple replacements by supplying a named vector.
The name gives a regular expression to match, and the value gives the replacement.

```{r}
x <- c("1 house", "1 person has 2 cars", "3 people")
str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))
```

`str_remove_all()` is a shortcut for `str_replace_all(x, pattern, "")`: it removes matching patterns from a string.

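As a minimal sketch of that equivalence (reusing the fruit vector from the earlier example, and assuming stringr is attached), the two calls below should give the same result:

```r
library(stringr)

x <- c("apple", "pear", "banana")

# Remove every vowel with the shortcut...
removed <- str_remove_all(x, "[aeiou]")

# ...and with the longer form that replaces each match with an empty string.
replaced <- str_replace_all(x, "[aeiou]", "")

removed
#> [1] "ppl" "pr"  "bnn"
identical(removed, replaced)
#> [1] TRUE
```
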
Use in `mutate()`

Using pipe inside mutate.
Recommendation: make a function and think about testing it. You don't need formal tests, but it's useful to build up a set of positive and negative test cases as you go.

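One way the notes above might look in practice is sketched below. The helper name `clean_counts()`, the column names `raw` and `clean`, and the choice to strip punctuation are all hypothetical, chosen only for illustration; the sketch assumes dplyr and stringr are attached.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: spell out the digits 1-3, then drop punctuation.
clean_counts <- function(x) {
  x %>%
    str_replace_all(c("1" = "one", "2" = "two", "3" = "three")) %>%
    str_remove_all("[[:punct:]]")
}

# Informal test cases: a positive case (something should change)
# and a negative case (nothing should change).
stopifnot(
  clean_counts("1 house!") == "one house",
  clean_counts("no digits") == "no digits"
)

# Use the helper inside mutate().
df <- tibble(raw = c("1 house", "1 person has 2 cars", "3 people"))
df <- df %>% mutate(clean = clean_counts(raw))

df$clean
#> [1] "one house"               "one person has two cars"
#> [3] "three people"
```

Wrapping the steps in a function keeps the `mutate()` call short and gives the test cases a single place to live as new edge cases turn up.
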
### Pattern control

Now that you've learned about regular expressions, you might be worried about them working when you don't want them to.
@@ -420,33 +447,6 @@ In this section you'll learn how to use various functions tidyr to extract them.
Waiting on: <https://github.com/tidyverse/tidyups/pull/15>

### Replace matches

Sometimes there are inconsistencies in the formatting that are easier to fix before you start extracting; it is easier to make the data more regular and check your work than to come up with a more complicated regular expression in `str_*` and friends.

`str_replace_all()` allows you to replace matches with new strings.
The simplest use is to replace a pattern with a fixed string:

```{r}
x <- c("apple", "pear", "banana")
str_replace_all(x, "[aeiou]", "-")
```

With `str_replace_all()` you can perform multiple replacements by supplying a named vector.
The name gives a regular expression to match, and the value gives the replacement.

```{r}
x <- c("1 house", "1 person has 2 cars", "3 people")
str_replace_all(x, c("1" = "one", "2" = "two", "3" = "three"))
```

`str_remove_all()` is a shortcut for `str_replace_all(x, pattern, "")`: it removes matching patterns from a string.

Use in `mutate()`

Using pipe inside mutate.
Recommendation: make a function and think about testing it. You don't need formal tests, but it's useful to build up a set of positive and negative test cases as you go.

## Locale dependent operations {#other-languages}

So far all of our examples have been using English.
@@ -499,9 +499,9 @@ Fortunately there are three sets of functions where the locale matters:
[^strings-8]: Sorting in languages that don't have an alphabet (like Chinese) is more complicated still.

## Handy functions
## Letters

Before we study three useful families of string functions, I want to
Functions that work with the letters inside of the string.

### Length

@@ -527,24 +527,6 @@ babynames %>%
  count(name, wt = n, sort = TRUE)
```

### Long strings

Sometimes the reason you care about the length of a string is that you're trying to fit it into a label on a plot or in a table.
stringr provides two useful tools for cases where your string is too long:

- `str_trunc(x, 20)` ensures that no string is longer than 20 characters, replacing anything too long with `…`.

- `str_wrap(x, 20)` wraps a string, introducing new lines so that each line is at most 20 characters (it doesn't hyphenate, however, so any word longer than 20 characters will make a longer line).

```{r}
x <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."

str_trunc(x, 30)
str_view(str_wrap(x, 30))
```

TODO: add example with a plot.

### Subsetting

You can extract parts of a string using `str_sub(string, start, end)`.
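
A small sketch of `str_sub()` follows, assuming stringr is attached; the fruit names are made up for illustration. Positive positions count from the start of the string and negative positions count back from the end, and both `start` and `end` are inclusive.

```r
library(stringr)

x <- c("Apple", "Banana", "Pear")

# First three characters of each string.
first_three <- str_sub(x, 1, 3)
first_three
#> [1] "App" "Ban" "Pea"

# Last three characters: negative positions count from the end.
last_three <- str_sub(x, -3, -1)
last_three
#> [1] "ple" "ana" "ear"
```
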
@@ -577,6 +559,24 @@ babynames %>%
)
```

### Long strings

Sometimes the reason you care about the length of a string is that you're trying to fit it into a label on a plot or in a table.
stringr provides two useful tools for cases where your string is too long:

- `str_trunc(x, 20)` ensures that no string is longer than 20 characters, replacing anything too long with `…`.

- `str_wrap(x, 20)` wraps a string, introducing new lines so that each line is at most 20 characters (it doesn't hyphenate, however, so any word longer than 20 characters will make a longer line).

```{r}
x <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."

str_trunc(x, 30)
str_view(str_wrap(x, 30))
```

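The label use case could be sketched roughly as follows. The labels and bar heights are invented for illustration, and base R's `barplot()` is used here only as a stand-in to keep the sketch free of extra packages; it is an assumption, not the plotting approach the chapter intends.

```r
library(stringr)

# Hypothetical axis labels: one short, one too long for a narrow plot.
labels <- c(
  "A short label",
  "A considerably longer label that would overflow a narrow axis"
)

# Truncating keeps every label within 20 characters.
truncated <- str_trunc(labels, width = 20)
str_length(truncated)
#> [1] 13 20

# Wrapping keeps the full text but breaks it across lines.
wrapped <- str_wrap(labels, width = 20)

# The wrapped labels can then be passed to a plotting function.
barplot(c(3, 5), names.arg = wrapped)
```

Truncating loses information but keeps labels on one line, while wrapping keeps the full text at the cost of vertical space.
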
TODO: add example with a plot.

### Exercises

1. Use `str_length()` and `str_sub()` to extract the middle letter from each baby name. What will you do if the string has an even number of characters?