Minor edits
@@ -74,7 +74,7 @@ read_csv("a,b,c
 ```
 
 In both cases `read_csv()` uses the first line of the data for the column names, which is a very common convention.
-There are two cases where you might want to tweak this behaviour:
+There are two cases where you might want to tweak this behavior:
 
 1.  Sometimes there are a few lines of metadata at the top of the file.
     You can use `skip = n` to skip the first `n` lines; or use `comment = "#"` to drop all lines that start with (e.g.) `#`.
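Below is a minimal sketch of the `skip` and `comment` options from the first list item. It assumes readr 2.0 or later (for `show_col_types`). The temporary file, its metadata lines, and the column names are all hypothetical:

```r
library(readr)

# Hypothetical file: two metadata lines, a header, a comment line, then data
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Exported from a hypothetical survey tool",
  "Export date: unknown",
  "x,y",
  "# a comment line that should be dropped",
  "1,2",
  "3,4"
), path)

# skip = 2 drops the two metadata lines; comment = "#" drops the line starting with "#"
df <- read_csv(path, skip = 2, comment = "#", show_col_types = FALSE)
df
```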
@@ -118,7 +118,7 @@ To read in more challenging files, you'll need to learn more about how readr par
 ### First steps
 
 Let's take another look at the `students` data.
-In the `favourite.food` column, there are a bunch of foot items and then the character string `N/A`, which should have been an real `NA` that R will recognize as "not available".
+In the `favourite.food` column, there are a bunch of food items and then the character string `N/A`, which should have been an real `NA` that R will recognize as "not available".
 This is something we can address using the `na` argument.
 
 ```{r message = FALSE}
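The next sketch shows how the `na` argument behaves. It reads a small hypothetical file instead of `data/students.csv`, because the contents of that file are not shown here. It assumes readr 2.0 or later:

```r
library(readr)

# Hypothetical stand-in for a file where one missing value is written as "N/A"
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,favourite.food",
  "1,Strawberry yoghurt",
  "2,N/A",
  "3,"
), path)

# Default na = c("", "NA"): the empty field becomes NA, but "N/A" stays a string
default_read <- read_csv(path, show_col_types = FALSE)

# Listing "N/A" in na turns it into a real NA as well
fixed_read <- read_csv(path, na = c("N/A", ""), show_col_types = FALSE)
```

With the default `na = c("", "NA")`, `"N/A"` is treated as an ordinary string. That is why it has to be listed explicitly.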
@@ -127,7 +127,7 @@ students <- read_csv("data/students.csv", na = c("N/A", ""))
 students
 ```
 
-Once you read data in, the first step is usually involve transforming it in some way to make it easier to work with in the rest of your analysis.
+Once you read data in, the first step usually involves transforming it in some way to make it easier to work with in the rest of your analysis.
 For example, the column names in the `students` file we read in are formatted in non-standard ways.
 You might consider renaming them one by one with `dplyr::rename()` or you might use the `janitor::clean_names()` function turn them all into snake case at once.[^data-import-1]
 This function takes in a data frame and returns a data frame with variable names converted to snake case.
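Here is a minimal sketch of `janitor::clean_names()`. It uses a tibble whose column names are hypothetical and written in several non-standard styles: spaces, dots, camelCase, and all caps. It assumes the janitor and tibble packages are installed:

```r
library(tibble)
library(janitor)

# Hypothetical column names in assorted non-standard styles
raw <- tibble(
  `Student ID` = 1:2,
  `Full Name` = c("A", "B"),
  favourite.food = c("x", "y"),
  mealType = c("Lunch only", "Breakfast and lunch"),
  AGE = c(4, 5)
)

# clean_names() returns the same data with snake_case column names
clean <- clean_names(raw)
names(clean)
```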
@@ -140,7 +140,7 @@ students |>
   clean_names()
 ```
 
-Another common task after reading in data is to consider the variable types.
+Another common task after reading in data is to consider variable types.
 For example, `meal_type` is a categorical variable with a known set of possible values.
 In R, factors can be used to work with categorical variables.
 We can convert this variable to a factor using the `factor()` function.
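A minimal sketch of converting a categorical variable with `factor()`. The `meal_type` values, including the unused `"Dinner only"` level, are hypothetical:

```r
# Hypothetical values for a categorical meal_type variable
meal_type <- c("Lunch only", "Breakfast and lunch", "Lunch only", "Breakfast and lunch")

# By default, factor() uses the sorted distinct values as levels
meal_type_f <- factor(meal_type)
levels(meal_type_f)

# Explicit levels fix the order and keep levels with no observations (count 0)
meal_type_f2 <- factor(meal_type, levels = c("Lunch only", "Breakfast and lunch", "Dinner only"))
table(meal_type_f2)
```

For categorical data with a known set of values, passing `levels` explicitly is usually worth it. It fixes the level order and keeps levels that happen not to appear in the data.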
@@ -162,23 +162,22 @@ We discuss the details of fixing this issue in Chapter \@ref(import-spreadsheets
 ### Compared to base R
 
 If you've used R before, you might wonder why we're not using `read.csv()`.
-There are a few good reasons to favour readr functions over the base equivalents:
+There are a few good reasons to favor readr functions over the base equivalents:
 
 -   They are typically much faster (\~10x) than their base equivalents.
     Long running jobs have a progress bar, so you can see what's happening.
     If you're looking for raw speed, try `data.table::fread()`.
     It doesn't fit quite so well into the tidyverse, but it can be quite a bit faster.
 
--   They produce tibbles, they don't convert character vectors to factors, use row names, or munge the column names.
+-   They produce tibbles, and they don't use row names or munge the column names.
     These are common sources of frustration with the base R functions.
 
 -   They are more reproducible.
-    Base R functions inherit some behaviour from your operating system and environment variables, so import code that works on your computer might not work on someone else's.
+    Base R functions inherit some behavior from your operating system and environment variables, so import code that works on your computer might not work on someone else's.
 
 ### Exercises
 
-1.  What function would you use to read a file where fields were separated with\
-    "\|"?
+1.  What function would you use to read a file where fields were separated with "\|"?
 
 2.  Apart from `file`, `skip`, and `comment`, what other arguments do `read_csv()` and `read_tsv()` have in common?
 
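This sketch compares `read.csv()` and `read_csv()` on two of the points above: tibbles and column-name munging. It reads a hypothetical file whose header is not a set of syntactic R names. It assumes readr 2.0 or later:

```r
library(readr)

# Hypothetical file: the header has a space and a leading digit
path <- tempfile(fileext = ".csv")
writeLines(c("Student ID,2nd choice", "1,Pizza", "2,Pasta"), path)

base_df  <- read.csv(path)                          # base R
readr_df <- read_csv(path, show_col_types = FALSE)  # readr

names(base_df)   # check.names = TRUE rewrites these into syntactic names
names(readr_df)  # readr keeps the names exactly as written
class(base_df)   # a plain data.frame
class(readr_df)  # a tibble (tbl_df)
```

Factor conversion is left out of this comparison. Since R 4.0.0, `read.csv()` defaults to `stringsAsFactors = FALSE`.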
@@ -218,7 +217,7 @@ With the additional `id` parameter we have added a new column called `file` to t
 This is especially helpful in circumstances where the files you're reading in do not have an identifying column that can help you trace the observations back to their original sources.
 
 If you have many files you want to read in, it can get cumbersome to write out their names as a list.
-Instead, you can use the `dir_ls()` function from the fs package to find the files for you by matching a pattern in the file names.
+Instead, you can use the `dir_ls()` function from the [fs](https://fs.r-lib.org/) package to find the files for you by matching a pattern in the file names.
 
 ```{r}
 library(fs)
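Below is a minimal sketch of `fs::dir_ls()` used together with the `id` argument of `read_csv()`. The temporary directory, the monthly file names, and their columns are hypothetical. It assumes readr 2.0 or later, where `read_csv()` accepts a vector of paths:

```r
library(fs)
library(readr)

# Hypothetical directory with three monthly CSV files and one non-CSV file
data_dir <- tempfile()
dir_create(data_dir)
for (m in c("01-jan", "02-feb", "03-mar")) {
  write_csv(data.frame(month = m, sales = 1), path(data_dir, paste0(m, ".csv")))
}
writeLines("not data", path(data_dir, "notes.txt"))

# dir_ls() returns only the paths that match the glob pattern
csv_files <- dir_ls(data_dir, glob = "*.csv")

# read_csv() reads every path; id = "file" stores each row's source path
sales <- read_csv(csv_files, id = "file", show_col_types = FALSE)
```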
@@ -244,6 +243,7 @@ You can also specify how missing values are written with `na`, and if you want t
 write_csv(students, "students.csv")
 ```
 
+Now let's read that csv file back in.
 Note that the type information is lost when you save to csv:
 
 ```{r, warning = FALSE, message = FALSE}
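A minimal sketch of the type loss noted above. A factor column written with `write_csv()` comes back as character, because CSV stores only text. The column name and values are hypothetical:

```r
library(readr)

# Hypothetical tibble with a factor column
df <- tibble::tibble(
  meal_type = factor(c("Lunch only", "Breakfast and lunch"))
)

path <- tempfile(fileext = ".csv")
write_csv(df, path)

# Reading it back: the factor levels are gone and the column is parsed as character
df2 <- read_csv(path, show_col_types = FALSE)
class(df$meal_type)
class(df2$meal_type)
```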