Work around encoding issue (#1465)

Hadley Wickham 2023-05-10 15:05:13 -05:00 committed by GitHub
parent 057966a4a9
commit 870e706026
1 changed files with 10 additions and 6 deletions

@@ -550,22 +550,26 @@ For example here are two inline CSVs with unusual encodings[^strings-7]:
 [^strings-7]: Here I'm using the special `\x` to encode binary data directly into a string.
 ```{r}
-#| message: false
+#| eval: false
 x1 <- "text\nEl Ni\xf1o was particularly bad this year"
-read_csv(x1)
+read_csv(x1)$text
+#> [1] "El Ni\xf1o was particularly bad this year"
 x2 <- "text\n\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd"
-read_csv(x2)
+read_csv(x2)$text
+#> [1] "\x82\xb1\x82\xf1\x82ɂ\xbf\x82\xcd"
 ```
 To read these correctly, you specify the encoding via the `locale` argument:
 ```{r}
-#| message: false
-read_csv(x1, locale = locale(encoding = "Latin1"))
+#| eval: false
+read_csv(x1, locale = locale(encoding = "Latin1"))$text
+#> [1] "El Niño was particularly bad this year"
-read_csv(x2, locale = locale(encoding = "Shift-JIS"))
+read_csv(x2, locale = locale(encoding = "Shift-JIS"))$text
+#> [1] "こんにちは"
 ```
 How do you find the correct encoding?
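One possible way to approach that question, offered as a sketch and not as part of this commit: readr's `guess_encoding()` accepts a raw vector and returns candidate encodings with a confidence score. The example assumes readr (and its stringi dependency) is installed, and the variable name `guesses` is only illustrative. The guesses are heuristic, so a short input like this one may not rank the true encoding first.

```r
library(readr)

# Same bytes as x1 in the hunk above; \xf1 is "ñ" in Latin1
x1 <- "text\nEl Ni\xf1o was particularly bad this year"

# Convert to a raw vector so no encoding is assumed, then ask readr
# for candidate encodings (a tibble with encoding and confidence columns)
guesses <- guess_encoding(charToRaw(x1))
guesses
```

Each candidate can then be tried through `locale(encoding = ...)` as in the chunk above until the text reads correctly.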