commit
31e7199d63
22
import.Rmd
@@ -45,19 +45,19 @@ There are many ways to read flat files into R. If you've be using R for a while,
 sometimes need to supply a few more arguments when using them the first
 time, but they'll definitely work on other peoples computers. The base R
 functions take a number of settings from system defaults, which means that
-code that works on your computer might not work on someone elses.
+code that works on your computer might not work on someone else's.

 Make sure you have the readr package (`install.packages("readr")`).

 Most of readr's functions are concerned with turning flat files into data frames:

-* `read_csv()` read comma delimited files, `read_csv2()` reads semi-colon
+* `read_csv()` reads comma delimited files, `read_csv2()` reads semi-colon
 separated files (common in countries where `,` is used as the decimal place),
 `read_tsv()` reads tab delimited files, and `read_delim()` reads in files
 with a user supplied delimiter.

 * `read_fwf()` reads fixed width files. You can specify fields either by their
-widths with `fwf_widths()` or theirs position with `fwf_positions()`.
+widths with `fwf_widths()` or their position with `fwf_positions()`.
 `read_table()` reads a common variation of fixed width files where columns
 are separated by white space.

@@ -73,7 +73,7 @@ readr also provides a number of functions for reading files off disk into simple

 These might be useful for other programming tasks.

-As well as reading data frame disk, readr also provides tools for working with data frames and character vectors in R:
+As well as reading data from disk, readr also provides tools for working with data frames and character vectors in R:

 * `type_convert()` applies the same parsing heuristics to the character columns
 in a data frame. You can override its choices using `col_types`.

@@ -94,7 +94,7 @@ The first two arguments of `read_csv()` are:
 * `TRUE` (the default), which reads column names from the first row
 of the file

-* `FALSE` number columns sequentially from `X1` to `Xn`.
+* `FALSE` numbers columns sequentially from `X1` to `Xn`.

 * A character vector, used as column names. If these don't match up
 with the columns in the data, you'll get a warning message.

@@ -109,7 +109,7 @@ EXAMPLE

 Typically, you'll see a lot of warnings if readr has guessed the column type incorrectly. This most often occurs when the first 1000 rows are different to the rest of the data. Perhaps there are a lot of missing data there, or maybe your data is mostly numeric but a few rows have characters. Fortunately, it's easy to fix these problems using the `col_type` argument.

-(Note that if you have a very large file, you might want to set `n_max` to 10,000 or 100,000. That will speed up iteration while you're finding common problems)
+(Note that if you have a very large file, you might want to set `n_max` to 10,000 or 100,000. That will speed up iterations while you're finding common problems)

 Specifying the `col_type` looks like this:

@@ -128,7 +128,7 @@ You can use the following types of columns

 * `col_number()` (n) is a more flexible parsed for numbers embedded in other
 strings. It will look for the first number in a string, ignoring non-numeric
-prefixes and suffixes. It will also ignoring the grouping mark specified by
+prefixes and suffixes. It will also ignore the grouping mark specified by
 the locale (see below for more details).

 * `col_factor()` (f) allows you to load data directly into a factor if you know

@@ -139,7 +139,7 @@ You can use the following types of columns
 * `col_date()` (D), `col_datetime()` (T) and `col_time()` (t) parse into dates,
 date times, and times as described below.

-You might have noticed that each column parser has a one letter abbreviation, which you can instead of the full function call (assuming you're happy with the default arguments):
+You might have noticed that each column parser has a one letter abbreviation, which you can use instead of the full function call (assuming you're happy with the default arguments):

 ```{r, eval = FALSE}
 read_csv("mypath.csv", col_types = cols(

@@ -203,7 +203,7 @@ If these defaults don't work for your data you can supply your own date time for

 * AM/PM indicator: `%p`.

-* Non-digits: `%.` skips one non-digit charcter, `%*` skips any number of
+* Non-digits: `%.` skips one non-digit character, `%*` skips any number of
 non-digits.

 The best way to figure out the correct string is to create a few examples in a character vector, and test with one of the parsing functions. For example:

@@ -360,11 +360,11 @@ There are three key differences between tbl_dfs and data.frames:

 You can control the default appearance with options:

-* `options(dplyr.print_max = n, dplyr.print_min = m)`: if more than `n`
+* `options(dplyr.print_max = n, dplyr.print_min = m)`: if more than `m`
 rows print `m` rows. Use `options(dplyr.print_max = Inf)` to always
 show all rows.

-* `options(dply.width = Inf)` will always print all columns, regardless
+* `options(dplyr.width = Inf)` will always print all columns, regardless
 of the width of the screen.

