Complete column parsing

import.Rmd

							@@ -256,16 +256,16 @@ guess_encoding(charToRaw(x2))

The first argument to `guess_encoding()` can either be a path to a file, or, as in this case, a raw vector (useful if the strings are already in R).

Encodings are a rich and complex topic, and I've only scratched the surface here. We'll come back to encodings again in [[Encoding]], but if you'd like to learn more I'd recommend reading the detailed explanation at <http://kunststube.net/encoding/>.

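The two kinds of input that `guess_encoding()` accepts can be compared directly. This is a minimal sketch. It assumes readr is installed along with stringi, since readr's documentation says `guess_encoding()` relies on `stringi::stri_enc_detect()`. The Latin-1 sample bytes are made up for illustration.

```r
library(readr)

# Illustrative bytes (an assumption, not from the text): "El Niño ..." with
# the "ñ" stored as the single Latin-1 (ISO-8859-1) byte 0xf1
x1_raw <- c(charToRaw("El Ni"), as.raw(0xf1),
            charToRaw("o was particularly bad this year"))

# 1. Raw vector input: useful when the text is already in R
res_raw <- guess_encoding(x1_raw)
res_raw

# 2. File path input: write the same bytes to a temporary file first
path <- tempfile(fileext = ".txt")
writeBin(x1_raw, path)
res_file <- guess_encoding(path)
res_file
```

Both calls return a tibble of candidate encodings with a confidence score for each. With only a few dozen bytes to go on, the ranking may be unreliable, so treat it as a hint rather than an answer.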
### Dates, date times, and times

You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date time (the number of seconds since midnight 1970-01-01), or a time (the number of seconds since midnight). By default:

*   `parse_datetime()` expects an ISO8601 date time. ISO8601 is an
    international standard in which the components of a date are
    organised from biggest to smallest: year, month, day, hour, minute,
    second:

    ```{r}
    parse_datetime("2010-10-01T2010")
@@ -273,24 +273,29 @@ You pick between three parsers depending on whether you want a date (the number
    parse_datetime("20101010")
    ```

    This is the most important date/time standard, and if you work with
    dates and times frequently, I recommend reading
    <https://en.wikipedia.org/wiki/ISO_8601>.

*   `parse_date()` expects a year, an optional separator, a month,
    an optional separator, and then a day:

    ```{r}
    parse_date("2010-10-01")
    ```

*   `parse_time()` expects an hour, an optional colon, a minute,
    an optional colon, optional seconds, and an optional am/pm specifier:

    ```{r}
    library(hms)
    parse_time("20:10:01")
    ```

    Base R doesn't have a great built-in class for time data, so we use
    the one provided in the hms package.

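The three default parsers can be tried side by side, and the storage claims at the start of this section can be checked too. Dates count days from 1970-01-01. Date times count seconds from midnight on 1970-01-01, and `parse_datetime()` uses UTC unless told otherwise. Times count seconds since midnight. This is a minimal sketch that assumes readr is installed. The inputs `"1970-01-02"` and `"1970-01-01T00:01:00"` are made up for the check.

```r
library(readr)

# parse_datetime(): ISO8601, components ordered biggest to smallest
dt <- parse_datetime("2010-10-01T2010")
dt  # prints as 2010-10-01 20:10:00 UTC

# parse_date(): year, separator, month, separator, day
d <- parse_date("2010-10-01")
d

# parse_time(): returns an hms object (a difftime measured in seconds)
tm <- parse_time("20:10:01")
class(tm)

# The numbers stored underneath:
as.numeric(parse_date("1970-01-02"))               # 1 (days since 1970-01-01)
as.numeric(parse_datetime("1970-01-01T00:01:00"))  # 60 (seconds since the origin)
as.numeric(tm)                                     # 72601 (seconds since midnight)
```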

If these defaults don't work for your data, you can supply your own datetime formats, built up of the following pieces:

Year
:  `%Y` (4 digits). 
@@ -335,16 +340,16 @@ parse_date("01/02/15", "%y/%m/%d")

If you're using `%b` or `%B` with non-English month names, you'll need to set the `lang` argument to `locale()`. See the list of built-in languages in `date_names_langs()`, or if your language is not already included, create your own with `date_names()`.

```{r}
locale("fr")

parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
```
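If your language isn't among `date_names_langs()`, you build the names yourself with `date_names()`. This is a minimal sketch of that route. The month and weekday names are typed by hand. French is used here only so that the result can be checked against the example above, so substitute your own language. The resulting object is passed to `locale()` through its first argument, which is called `date_names` in current readr.

```r
library(readr)

# Hand-typed names (illustrative; readr already ships "fr" built in)
fr_months <- c("janvier", "février", "mars", "avril", "mai", "juin",
               "juillet", "août", "septembre", "octobre", "novembre", "décembre")
fr_days <- c("dimanche", "lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi")

# Abbreviations default to the full names when not supplied
my_names <- date_names(mon = fr_months, day = fr_days)

# Use the custom names when parsing a full month name with %B
d_fr <- parse_date("1 janvier 2015", "%d %B %Y",
                   locale = locale(date_names = my_names))
d_fr  # 2015-01-01

# The built-in language codes, for comparison
head(date_names_langs())
```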
### Exercises
1.  What are the most important arguments to `locale()`? If you live
    outside the US, create a new locale object that encapsulates the
    settings for the types of files you read most commonly.

1.  What's the difference between `read_csv()` and `read_csv2()`?

1.  I didn't discuss the `date_format` and `time_format` options to
    `locale()`. What do they do? Construct an example that shows when they
@@ -353,6 +358,19 @@ parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))

1.  What are the most common encodings used in Europe? What are the
    most common encodings used in Asia?

1.  Generate the correct format string to parse each of the following
    dates and times:

    ```{r}
    d1 <- "January 1, 2010"
    d2 <- "2015-Mar-07"
    d3 <- "06-Jun-2017"
    d4 <- "August 19 (2015)"
    d5 <- "12/30/14" # Dec 30, 2014
    t1 <- "1705"
    t2 <- "11:15:10.12 PM"
    ```

## Parsing a file

Now that you've learned how to parse an individual vector, it's time to turn back and explore how readr parses a file. There are three new things that you'll learn about in this section: