Try simpler code with latest arrow (#1334)
commit 810b9f6a3c
parent c6edfb977e
(The diff for one file in this commit is not shown because one or more of its lines are too long.)
@@ -75,18 +75,12 @@ A good rule of thumb is that you usually want at least twice as much memory as t
 This means we want to avoid `read_csv()` and instead use the `arrow::open_dataset()`:
 
 ```{r open-dataset}
-# partial schema for ISBN column only
-opts <- CsvConvertOptions$create(col_types = schema(ISBN = string()))
-
 seattle_csv <- open_dataset(
   sources = "data/seattle-library-checkouts.csv",
-  format = "csv",
-  convert_options = opts
+  format = "csv"
 )
 ```
 
-(Here we've had to use some relatively advanced code to parse the ISBN variable correctly: this is because the first \~83,000 rows don't contain any data so arrow guesses the wrong types. The arrow team is aware of this problem and there will hopefully be a better approach by the time you read this chapter.)
-
 What happens when this code is run?
 `open_dataset()` will scan a few thousand rows to figure out the structure of the dataset.
 Then it records what it's found and stops; it will only read further rows as you specifically request them.
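As a quick illustration of the lazy behaviour described in the hunk above, here is a minimal sketch of how the simplified call is typically used. The dplyr pipeline and the column names `CheckoutYear` and `Checkouts` are assumptions about this dataset, not part of the commit.

```r
library(arrow)
library(dplyr)

# Same call as in the new version of the chunk above: arrow scans a few
# thousand rows to infer the schema, then stops.
seattle_csv <- open_dataset(
  sources = "data/seattle-library-checkouts.csv",
  format = "csv"
)

# Printing the object shows the inferred schema without loading the data.
seattle_csv

# Rows are only read when a result is requested, e.g. via collect().
# (Assumed column names: CheckoutYear, Checkouts.)
seattle_csv |>
  group_by(CheckoutYear) |>
  summarise(TotalCheckouts = sum(Checkouts)) |>
  collect()
```

Only the final `collect()` pulls rows into memory; everything before it is recorded as a query plan that arrow executes against the file on disk.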