Fix 2 typos in arrow (#1434)

* … the choice is made for you, as *in* the data is already in a database …

delete "in"

* Parquet files are usually smaller **than** the equivalent CSV file.

add "than"
Peter Baumgartner 2023-04-17 15:15:47 +02:00 committed by GitHub
parent 2ded03567a
commit 8f475fd50e
1 changed file with 2 additions and 2 deletions


@@ -23,7 +23,7 @@ We'll use Apache Arrow via the [arrow package](https://arrow.apache.org/docs/r/)
As an additional benefit, arrow is extremely fast: you'll see some examples later in the chapter.
Both arrow and dbplyr provide dplyr backends, so you might wonder when to use each.
- In many cases, the choice is made for you, as in the data is already in a database or in parquet files, and you'll want to work with it as is.
+ In many cases, the choice is made for you, as the data is already in a database or in parquet files, and you'll want to work with it as is.
But if you're starting with your own data (perhaps CSV files), you can either load it into a database or convert it to parquet.
In general, it's hard to know what will work best, so in the early stages of your analysis we'd encourage you to try both and pick the one that works the best for you.
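The conversion the paragraph above mentions is a short operation with the arrow package. Here is a minimal sketch, not part of this commit or the chapter's own code: the file names `flights.csv` and `flights.parquet` and the column `dep_delay` are placeholders chosen for illustration.

``` r
library(arrow)
library(dplyr)

# Read the CSV once and write it back out as parquet.
read_csv_arrow("flights.csv") |>    # placeholder input path
  write_parquet("flights.parquet")  # placeholder output path

# From then on, query the parquet file lazily via arrow's dplyr
# backend; nothing is materialized in memory until collect().
open_dataset("flights.parquet") |>
  filter(!is.na(dep_delay)) |>      # dep_delay is an assumed column
  summarise(mean_delay = mean(dep_delay)) |>
  collect()
```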
@@ -127,7 +127,7 @@ The following sections will first introduce you to parquet and partitioning, and
Like CSV, parquet is used for rectangular data, but instead of being a text format that you can read with any file editor, it's a custom binary format designed specifically for the needs of big data.
This means that:
- - Parquet files are usually smaller the equivalent CSV file.
+ - Parquet files are usually smaller than the equivalent CSV file.
Parquet relies on [efficient encodings](https://parquet.apache.org/docs/file-format/data-pages/encodings/) to keep file size down, and supports file compression.
This helps make parquet files fast because there's less data to move from disk to memory.
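To see the size difference directly, here is a hedged sketch using illustrative data and paths (none of this is from the chapter): write the same data frame both ways and compare the files with base R's `file.size()`.

``` r
library(arrow)

# Illustrative data: 100,000 rows x 10 numeric columns.
df <- as.data.frame(matrix(rnorm(1e6), ncol = 10))

write_csv_arrow(df, "demo.csv")    # every value stored as text
write_parquet(df, "demo.parquet")  # binary columns, encoded and compressed

# The parquet file is typically a fraction of the CSV's size.
file.size("demo.csv")
file.size("demo.parquet")
```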