parent e8a9f8da6b
commit 5ea91ca7ef
16 import.Rmd
@@ -190,6 +190,9 @@ Using parsers is mostly a matter of understanding what's available and how they
 1. `parse_character()` seems so simple that it shouldn't be necessary. But
 one complication makes it quite important: character encodings.
 
+1. `parse_factor()` creates factors, the data structure that R uses to represent
+categorical variables with fixed and known values.
+
 1. `parse_datetime()`, `parse_date()`, and `parse_time()` allow you to
 parse various date & time specifications. These are the most complicated
 because there are so many different ways of writing dates.
@@ -240,7 +243,7 @@ parse_number("123.456.789", locale = locale(grouping_mark = "."))
 parse_number("123'456'789", locale = locale(grouping_mark = "'"))
 ```
 
-### Character
+### Strings {#readr-strings}
 
 It seems like `parse_character()` should be really simple --- it could just return its input. Unfortunately life isn't so simple, as there are multiple ways to represent the same string. To understand what's going on, we need to dive into the details of how computers represent strings. In R, we can get at the underlying representation of a string using `charToRaw()`:
 
@@ -280,6 +283,17 @@ The first argument to `guess_encoding()` can either be a path to a file, or, as
 
 Encodings are a rich and complex topic, and I've only scratched the surface here. If you'd like to learn more I'd recommend reading the detailed explanation at <http://kunststube.net/encoding/>.
 
+### Factors {#readr-factors}
+
+R uses factors to represent categorical variables that have a known set of possible values. Give `parse_factor()` a vector of known `levels` to generate a warning whenever an unexpected value is present:
+
+```{r}
+fruit <- c("apple", "banana")
+parse_factor(c("apple", "banana", "bananana"), levels = fruit)
+```
+
+If you have problematic entries, it's often easier to read in as strings and then use the tools you'll learn about in [strings] and [factors] to clean them up.
+
 ### Dates, date-times, and times {#readr-datetimes}
 
 You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date-time (the number of seconds since midnight 1970-01-01), or a time (the number of seconds since midnight). When called without any additional arguments:
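Reviewer's sketch, not part of the commit: the added Factors section advises reading problematic entries as strings, cleaning them, and only then converting to a factor. A minimal illustration of that workflow might look like the following. The vector `raw` and the hand-written typo repair are assumptions invented for this example; only `parse_character()` and `parse_factor()` come from readr.

```r
library(readr)

fruit <- c("apple", "banana")

# Hypothetical input with one misspelled entry (an assumption for this sketch).
raw <- c("apple", "banana", "bananana")

# Step 1: parse as strings, so the unexpected value is kept instead of
# being turned into NA as it would be by parse_factor().
x <- parse_character(raw)

# Step 2: clean up. This is a one-off fix for the single known typo;
# real data would likely need the string tools the text refers to.
x[x == "bananana"] <- "banana"

# Step 3: every value is now a known level, so parsing raises no warning.
f <- parse_factor(x, levels = fruit)
f
```

Keeping the cleanup as a separate step means the misspelled values stay visible and fixable, rather than being silently replaced by `NA` at parse time.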