Complete column parsing
parent
785018073b
commit
9d62e1c23e
54
import.Rmd
|
@ -256,16 +256,16 @@ guess_encoding(charToRaw(x2))
|
||||||
|
|
||||||
The first argument to `guess_encoding()` can either be a path to a file, or, as in this case, a raw vector (useful if the strings are already in R).
|
The first argument to `guess_encoding()` can either be a path to a file, or, as in this case, a raw vector (useful if the strings are already in R).
|
||||||
|
|
||||||
If you'd like to learn more, I'd recommend <http://kunststube.net/encoding/>.
|
Encodings are a rich and complex topic, and I've only scratched the surface here. We'll come back to encodings again in [[Encoding]], but if you'd like to learn more I'd recommend reading the detailed explanation at <http://kunststube.net/encoding/>.
|
||||||
|
|
||||||
### Dates, date times, and times
|
### Dates, date times, and times
|
||||||
|
|
||||||
You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date time (the number of seconds since midnight 1970-01-01), or a time (i.e. the number of seconds since midnight). The defaults read:
|
You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date time (the number of seconds since midnight 1970-01-01), or a time (i.e. the number of seconds since midnight). The defaults read:
|
||||||
|
|
||||||
* `parse_datetime()`: an
|
* `parse_datetime()` expects an ISO8601 date time. ISO8601 is an
|
||||||
[ISO8601](https://en.wikipedia.org/wiki/ISO_8601) date time. This
|
international standard in which the components of a date are
|
||||||
is the most important date/time standard, and I recommend that you get
|
organised from biggest to smallest: year, month, day, hour, minute,
|
||||||
a little familiar with it.
|
second:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
parse_datetime("2010-10-01T2010")
|
parse_datetime("2010-10-01T2010")
|
||||||
|
@ -273,24 +273,29 @@ You pick between three parsers depending on whether you want a date (the number
|
||||||
parse_datetime("20101010")
|
parse_datetime("20101010")
|
||||||
```
|
```
|
||||||
|
|
||||||
* `parse_date()`: a year, optional separator, month, optional separator,
|
This is the most important date/time standard, and if you work with
|
||||||
day.
|
dates and times frequently, I recommend reading
|
||||||
|
<https://en.wikipedia.org/wiki/ISO_8601>
|
||||||
|
|
||||||
|
* `parse_date()` expects a year, an optional separator, a month,
|
||||||
|
an optional separator, and then a day:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
parse_date("2010-10-01")
|
parse_date("2010-10-01")
|
||||||
```
|
```
|
||||||
|
|
||||||
* `parse_time()`: an hour, optional colon, hour, optional colon, minute,
|
* `parse_time()` expects an hour, an optional colon, a minute,
|
||||||
optional colon, optional seconds, optional am/pm. Base R doesn't have
|
an optional colon, optional seconds, and optional am/pm specifier:
|
||||||
a great built in class for time data, so we use the one provided in the
|
|
||||||
hms package.
|
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
library(hms)
|
library(hms)
|
||||||
parse_time("20:10:01")
|
parse_time("20:10:01")
|
||||||
```
|
```
|
||||||
|
|
||||||
If these defaults don't work for your data you can supply your own date time formats, built up of the following pieces:
|
Base R doesn't have a great built-in class for time data, so we use
|
||||||
|
the one provided in the hms package.
|
||||||
|
|
||||||
|
If these defaults don't work for your data you can supply your own datetime formats, built up of the following pieces:
|
||||||
|
|
||||||
Year
|
Year
|
||||||
: `%Y` (4 digits).
|
: `%Y` (4 digits).
|
||||||
|
@ -335,16 +340,16 @@ parse_date("01/02/15", "%y/%m/%d")
|
||||||
If you're using `%b` or `%B` with non-English month names, you'll need to set the `lang` argument to `locale()`. See the list of built-in languages in `date_names_langs()`, or if your language is not already included, create your own with `date_names()`.
|
If you're using `%b` or `%B` with non-English month names, you'll need to set the `lang` argument to `locale()`. See the list of built-in languages in `date_names_langs()`, or if your language is not already included, create your own with `date_names()`.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
locale("fr")
|
|
||||||
|
|
||||||
parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
|
parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
|
||||||
```
|
```
|
||||||
,
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
1. What are the most important options to locale? If you live outside the
|
1. What are the most important arguments to `locale()`? If you live
|
||||||
US, create a new locale object that encapsulates the settings for the
|
outside the US, create a new locale object that encapsulates the
|
||||||
data files you read most commonly.
|
settings for the types of file you read most commonly.
|
||||||
|
|
||||||
|
1. What's the difference between `read_csv()` and `read_csv2()`?
|
||||||
|
|
||||||
1. I didn't discuss the `date_format` and `time_format` options to
|
1. I didn't discuss the `date_format` and `time_format` options to
|
||||||
`locale()`. What do they do? Construct an example that shows when they
|
`locale()`. What do they do? Construct an example that shows when they
|
||||||
|
@ -353,6 +358,19 @@ parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
|
||||||
1. What are the most common encodings used in Europe? What are the
|
1. What are the most common encodings used in Europe? What are the
|
||||||
most common encodings used in Asia?
|
most common encodings used in Asia?
|
||||||
|
|
||||||
|
1. Generate the correct format string to parse each of the following
|
||||||
|
dates and times:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
d1 <- "January 1, 2010"
|
||||||
|
d2 <- "2015-Mar-07"
|
||||||
|
d3 <- "06-Jun-2017"
|
||||||
|
d4 <- "August 19 (2015)"
|
||||||
|
d5 <- "12/30/14" # Dec 30, 2014
|
||||||
|
t1 <- "1705"
|
||||||
|
t2 <- "11:15:10.12 PM"
|
||||||
|
```
|
||||||
|
|
||||||
## Parsing a file
|
## Parsing a file
|
||||||
|
|
||||||
Now that you've learned how to parse an individual vector, it's time to turn back and explore how readr parses a file. There are three new things that you'll learn about in this section:
|
Now that you've learned how to parse an individual vector, it's time to turn back and explore how readr parses a file. There are three new things that you'll learn about in this section:
|
||||||
|
|