Merge branch 'master' of github.com:hadley/r4ds

import.Rmd
@@ -18,7 +18,7 @@ library(readr)
 
 Most of readr's functions are concerned with turning flat files into data frames:
 
-* `read_csv()` reads comma delimited files, `read_csv2()` reads semi-colon
+* `read_csv()` reads comma delimited files, `read_csv2()` reads semicolon
   separated files (common in countries where `,` is used as the decimal place),
   `read_tsv()` reads tab delimited files, and `read_delim()` reads in files
   with any delimiter.
@@ -108,7 +108,7 @@ If you've used R before, you might wonder why we're not using `read.csv()`. Ther
   your operating system and environment variables, so import code that works 
   on your computer might not work on someone else's.
 
-### Exericses
+### Exercises
 
 1.  What function would you use to read a file where fields were separated with  
     "|"?
@@ -281,7 +281,7 @@ Encodings are a rich and complex topic, and I've only scratched the surface here
 
 You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date time (the number of seconds since midnight 1970-01-01), or a time (the number of seconds since midnight):
 
-*   `parse_datetime()` expects an ISO8601 date time. ISO8691 is an
+*   `parse_datetime()` expects an ISO8601 date time. ISO8601 is an
     international standard in which the components of a date are
     organised from biggest to smallest: year, month, day, hour, minute, 
     second.
@@ -427,7 +427,7 @@ These defaults don't always work for larger files. There are two basic problems:
     a column of doubles that only contains integers in the first 1000 rows. 
 
 1.  The column might contain a lot of missing values. If the first 1000
-    rows contains on `NA`s, readr will guess that it's a character 
+    rows contains only `NA`s, readr will guess that it's a character 
     vector, whereas you probably want to parse it as something more
     specific.
 
@@ -439,7 +439,7 @@ challenge <- read_csv(readr_example("challenge.csv"))
 
 (Note the use of `readr_example()` which finds the path to one of the files included with the package)
 
-There are two outputs: the column specification generated by looking at the first 1000 rows, and the first five parsing failures. It's always a good idea to explicitly pull out the `problems()` so you can explore them in more depth:
+There are two outputs: the column specification generated by looking at the first 1000 rows, and the first five parsing failures. It's always a good idea to explicitly pull out the `problems()`, so you can explore them in more depth:
 
 ```{r}
 problems(challenge)
@@ -543,7 +543,7 @@ There are a few other general strategies to help you parse files:
 
 readr also comes with two useful functions for writing data back to disk: `write_csv()` and `write_tsv()`. They:
 
-* Are faster than the base R equvalents.
+* Are faster than the base R equivalents.
 
 * Never write rownames, and quote only when needed. 
 
@@ -610,7 +610,7 @@ file.remove("challenge.rds")
 
 To get other types of data into R, we recommend starting with the tidyverse packages listed below. They're certainly not perfect, but they are a good place to start.
 
-For rectanuglar data:
+For rectangular data:
 
 * haven reads SPSS, Stata, and SAS files.
 

@@ -383,7 +383,7 @@ dplyr                        | SQL
 
 Note that "INNER" and "OUTER" are optional, and often omitted.
 
-Joining different variables between the tables, e.g. `inner_join(x, y, by = c("a" = "b"))` uses a slightly different syntax in SQL: `SELECT * FROM x INNER JOIN y ON x.a = y.b`. As this syntax suggests SQL supports a wide range of join types than dplyr because you can connect the tables using constraints other than equality (sometimes called non-equijoins).
+Joining different variables between the tables, e.g. `inner_join(x, y, by = c("a" = "b"))` uses a slightly different syntax in SQL: `SELECT * FROM x INNER JOIN y ON x.a = y.b`. As this syntax suggests SQL supports a wider  range of join types than dplyr because you can connect the tables using constraints other than equality (sometimes called non-equijoins).
 
 ## Filtering joins {#filtering-joins}
 