Complete column parsing
parent 785018073b
commit 9d62e1c23e
54
import.Rmd
@@ -256,16 +256,16 @@ guess_encoding(charToRaw(x2))

The first argument to `guess_encoding()` can either be a path to a file, or, as in this case, a raw vector (useful if the strings are already in R).

If you'd like to learn more, I'd recommend <http://kunststube.net/encoding/>.

Encodings are a rich and complex topic, and I've only scratched the surface here. We'll come back to encodings again in [[Encoding]], but if you'd like to learn more I'd recommend reading the detailed explanation at <http://kunststube.net/encoding/>.

### Dates, date times, and times

You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date time (the number of seconds since midnight 1970-01-01), or a time (i.e. the number of seconds since midnight). The defaults read:

* `parse_datetime()`: an
[ISO8601](https://en.wikipedia.org/wiki/ISO_8601) date time. This
is the most important date/time standard, and I recommend that you get
a little familiar with it.
* `parse_datetime()` expects an ISO8601 date time. ISO8601 is an
international standard in which the components of a date are
organised from biggest to smallest: year, month, day, hour, minute,
second:

```{r}
parse_datetime("2010-10-01T2010")
@@ -273,24 +273,29 @@ You pick between three parsers depending on whether you want a date (the number
parse_datetime("20101010")
```

* `parse_date()`: a year, optional separator, month, optional separator,
day.

This is the most important date/time standard, and if you work with
dates and times frequently, I recommend reading
<https://en.wikipedia.org/wiki/ISO_8601>

* `parse_date()` expects a year, an optional separator, a month,
an optional separator, and then a day:

```{r}
parse_date("2010-10-01")
```

* `parse_time()`: an hour, optional colon, hour, optional colon, minute,
optional colon, optional seconds, optional am/pm. Base R doesn't have
a great built in class for time data, so we use the one provided in the
hms package.
* `parse_time()` expects an hour, an optional colon, a minute,
an optional colon, optional seconds, and optional am/pm specifier:

```{r}
library(hms)
parse_time("20:10:01")
```

Base R doesn't have a great built in class for time data, so we use
the one provided in the hms package.
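
To make the three representations described above concrete, the sketch below strips each parsed value down to its underlying number with `as.numeric()`. It assumes readr (and the hms package it relies on for times) is installed; the input strings and variable names are made up for illustration.

```{r}
library(readr)

# A date is stored as the number of days since 1970-01-01.
days_since_epoch <- as.numeric(parse_date("1970-01-02"))
days_since_epoch

# A date time is stored as the number of seconds since midnight
# 1970-01-01 (readr's default time zone is UTC).
seconds_since_epoch <- as.numeric(parse_datetime("1970-01-01T00:01:00"))
seconds_since_epoch

# A time is stored as the number of seconds since midnight.
seconds_since_midnight <- as.numeric(parse_time("00:01:00"))
seconds_since_midnight
```
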

If these defaults don't work for your data you can supply your own date time formats, built up of the following pieces:

If these defaults don't work for your data you can supply your own datetime formats, built up of the following pieces:

Year
: `%Y` (4 digits).

@@ -335,16 +340,16 @@ parse_date("01/02/15", "%y/%m/%d")

If you're using `%b` or `%B` with non-English month names, you'll need to set the `date_names` argument of `locale()`. See the list of built-in languages in `date_names_langs()`, or if your language is not already included, create your own with `date_names()`.

```{r}
locale("fr")

parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
```
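
As a small sketch of the lookup described above, `date_names_langs()` can be used to check that a language code is available, and `date_names_lang()` shows the names that `%b` and `%B` will match. The date string and variable names here are assumptions chosen for illustration; note that the French abbreviations include a trailing period.

```{r}
library(readr)

# Check that French is one of the built-in languages.
has_french <- "fr" %in% date_names_langs()
has_french

# Inspect the month and day names that the French locale provides.
date_names_lang("fr")

# %b matches the abbreviated month name, which in French is "oct."
# (including the period).
parsed <- parse_date("14 oct. 1979", "%d %b %Y", locale = locale("fr"))
parsed
```
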
### Exercises

1. What are the most important options to locale? If you live outside the
US, create a new locale object that encapsulates the settings for the
data files you read most commonly.
1. What are the most important arguments to `locale()`? If you live
outside the US, create a new locale object that encapsulates the
settings for the types of file you read most commonly.

1. What's the difference between `read_csv()` and `read_csv2()`?

1. I didn't discuss the `date_format` and `time_format` options to
`locale()`. What do they do? Construct an example that shows when they

@@ -353,6 +358,19 @@ parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))

1. What are the most common encodings used in Europe? What are the
most common encodings used in Asia?

1. Generate the correct format string to parse each of the following
dates and times:

```{r}
d1 <- "January 1, 2010"
d2 <- "2015-Mar-07"
d3 <- "06-Jun-2017"
d4 <- "August 19 (2015)"
d5 <- "12/30/14" # Dec 30, 2014
t1 <- "1705"
t2 <- "11:15:10.12 PM"
```
## Parsing a file
Now that you've learned how to parse an individual vector, it's time to turn back and explore how readr parses a file. There are three new things that you'll learn about in this section: