Minor edits
Parent: eb61248d8c
Commit: 4f32e9afcc
@ -74,7 +74,7 @@ read_csv("a,b,c
```

In both cases `read_csv()` uses the first line of the data for the column names, which is a very common convention.
There are two cases where you might want to tweak this behaviour:
There are two cases where you might want to tweak this behavior:

1. Sometimes there are a few lines of metadata at the top of the file.
You can use `skip = n` to skip the first `n` lines; or use `comment = "#"` to drop all lines that start with (e.g.) `#`.
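
The `skip` and `comment` arguments described in this hunk can be sketched as follows. This is a minimal illustration, assuming readr 2.0 or later so that `I()` can mark a string as literal data; the inline file contents and the object names (`with_metadata`, `with_comment`, `skipped`, `uncommented`) are invented for the example.

```{r}
library(readr)

# Hypothetical file contents: two lines of metadata above the header row.
with_metadata <- I("The first line of metadata
The second line of metadata
x,y,z
1,2,3")

# skip = 2 drops the first two lines before the header is read.
skipped <- read_csv(with_metadata, skip = 2, show_col_types = FALSE)

# Hypothetical file contents: a comment line marked with "#".
with_comment <- I("# A comment I want to skip
x,y,z
1,2,3")

# comment = "#" drops the line that starts with "#".
uncommented <- read_csv(with_comment, comment = "#", show_col_types = FALSE)
```

In both calls the header row `x,y,z` should end up as the column names, leaving a single row of data.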
@ -118,7 +118,7 @@ To read in more challenging files, you'll need to learn more about how readr par
### First steps

Let's take another look at the `students` data.
In the `favourite.food` column, there are a bunch of foot items and then the character string `N/A`, which should have been a real `NA` that R will recognize as "not available".
In the `favourite.food` column, there are a bunch of food items and then the character string `N/A`, which should have been a real `NA` that R will recognize as "not available".
This is something we can address using the `na` argument.

```{r message = FALSE}
@ -127,7 +127,7 @@ students <- read_csv("data/students.csv", na = c("N/A", ""))
students
```

Once you read data in, the first step is usually involve transforming it in some way to make it easier to work with in the rest of your analysis.
Once you read data in, the first step usually involves transforming it in some way to make it easier to work with in the rest of your analysis.
For example, the column names in the `students` file we read in are formatted in non-standard ways.
You might consider renaming them one by one with `dplyr::rename()` or you might use the `janitor::clean_names()` function to turn them all into snake case at once.[^data-import-1]
This function takes in a data frame and returns a data frame with variable names converted to snake case.
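
The one-by-one alternative with `dplyr::rename()` might look like the sketch below. The `students_demo` tibble is a hypothetical stand-in with invented values and only three columns; it is not the contents of `data/students.csv`.

```{r}
library(dplyr)
library(tibble)

# Hypothetical stand-in for the `students` data with non-standard column names.
students_demo <- tibble(
  `Student ID` = c(1, 2),
  `Full Name` = c("Student A", "Student B"),
  favourite.food = c("Food A", "Food B")
)

# rename() takes new_name = old_name pairs; backticks quote names with spaces.
renamed <- students_demo |>
  rename(
    student_id = `Student ID`,
    full_name = `Full Name`,
    favourite_food = favourite.food
  )

names(renamed)
```

Renaming this way is explicit about every column, which becomes tedious as the number of columns grows.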
@ -140,7 +140,7 @@ students |>
clean_names()
```

Another common task after reading in data is to consider the variable types.
Another common task after reading in data is to consider variable types.
For example, `meal_type` is a categorical variable with a known set of possible values.
In R, factors can be used to work with categorical variables.
We can convert this variable to a factor using the `factor()` function.
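
As a small sketch of that conversion on its own, the vector below uses invented values standing in for the `meal_type` column; the object names are hypothetical.

```{r}
# Hypothetical values standing in for the `meal_type` column.
meal_type <- c("Lunch only", "Breakfast and lunch", "Lunch only")

# factor() stores the distinct values as levels, sorted alphabetically by default.
meal_type_fct <- factor(meal_type)

levels(meal_type_fct)
```

The values themselves are unchanged; what changes is that R now knows the full set of categories the variable can take.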
@ -162,23 +162,22 @@ We discuss the details of fixing this issue in Chapter \@ref(import-spreadsheets
### Compared to base R

If you've used R before, you might wonder why we're not using `read.csv()`.
There are a few good reasons to favour readr functions over the base equivalents:
There are a few good reasons to favor readr functions over the base equivalents:

- They are typically much faster (\~10x) than their base equivalents.
Long-running jobs have a progress bar, so you can see what's happening.
If you're looking for raw speed, try `data.table::fread()`.
It doesn't fit quite so well into the tidyverse, but it can be quite a bit faster.

- They produce tibbles, they don't convert character vectors to factors, use row names, or munge the column names.
- They produce tibbles, and they don't use row names or munge the column names.
These are common sources of frustration with the base R functions.

- They are more reproducible.
Base R functions inherit some behaviour from your operating system and environment variables, so import code that works on your computer might not work on someone else's.
Base R functions inherit some behavior from your operating system and environment variables, so import code that works on your computer might not work on someone else's.
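
The point about column names and tibbles can be checked with a short comparison. This is a sketch that assumes readr 2.0 or later (for `I()` and `show_col_types`); the CSV text and object names are invented for illustration.

```{r}
library(readr)

# Hypothetical CSV text with a column name that contains a space.
csv_text <- "Student ID,AGE\n1,4\n2,5"

# Base R: check.names = TRUE (the default) rewrites "Student ID" to "Student.ID".
base_df <- read.csv(text = csv_text)

# readr: the column name is kept as written and the result is a tibble.
readr_df <- read_csv(I(csv_text), show_col_types = FALSE)

names(base_df)
names(readr_df)
```

The base R result has a syntactic but altered name, while the readr result keeps the name from the file, so it has to be quoted with backticks when used in code.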

### Exercises

1. What function would you use to read a file where fields were separated with\
"\|"?
1. What function would you use to read a file where fields were separated with "\|"?

2. Apart from `file`, `skip`, and `comment`, what other arguments do `read_csv()` and `read_tsv()` have in common?

@ -218,7 +217,7 @@ With the additional `id` parameter we have added a new column called `file` to t
This is especially helpful in circumstances where the files you're reading in do not have an identifying column that can help you trace the observations back to their original sources.

If you have many files you want to read in, it can get cumbersome to write out their names as a list.
Instead, you can use the `dir_ls()` function from the fs package to find the files for you by matching a pattern in the file names.
Instead, you can use the `dir_ls()` function from the [fs](https://fs.r-lib.org/) package to find the files for you by matching a pattern in the file names.

```{r}
library(fs)
@ -244,6 +243,7 @@ You can also specify how missing values are written with `na`, and if you want t
write_csv(students, "students.csv")
```

Now let's read that csv file back in.
Note that the type information is lost when you save to csv:

```{r, warning = FALSE, message = FALSE}