Mostly hide msgs to save space (#1356)
parent 0a134cb118
commit 86efe55bc2
|
@ -58,8 +58,7 @@ read_csv("data/students.csv") |>
|
|||
|
||||
We can read this file into R using `read_csv()`.
|
||||
The first argument is the most important: the path to the file.
|
||||
You can think about the path as the address of the file.
|
||||
The following says that the file is called `students.csv` and that it's in the `data` folder.
|
||||
You can think about the path as the address of the file: the file is called `students.csv` and it lives in the `data` folder.
|
||||
|
||||
```{r}
|
||||
#| message: true
|
||||
|
@ -114,7 +113,7 @@ students |>
|
|||
|
||||
An alternative approach is to use `janitor::clean_names()`, which applies some heuristics to turn them all into snake case at once[^data-import-1].
|
||||
|
||||
[^data-import-1]: The [janitor](http://sfirke.github.io/janitor/) package is not part of the tidyverse, but it offers handy functions for data cleaning and works well within data pipelines that uses `|>`.
|
||||
[^data-import-1]: The [janitor](http://sfirke.github.io/janitor/) package is not part of the tidyverse, but it offers handy functions for data cleaning and works well within data pipelines that use `|>`.
|
||||
|
||||
```{r}
|
||||
#| message: false
|
||||
|
@ -128,9 +127,7 @@ For example, `meal_plan` is a categorical variable with a known set of possible
|
|||
```{r}
|
||||
students |>
|
||||
janitor::clean_names() |>
|
||||
mutate(
|
||||
meal_plan = factor(meal_plan)
|
||||
)
|
||||
mutate(meal_plan = factor(meal_plan))
|
||||
```
|
||||
|
||||
Note that the values in the `meal_plan` variable have stayed the same, but the type of variable denoted underneath the variable name has changed from character (`<chr>`) to factor (`<fct>`).
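
As a minimal sketch of the same conversion outside the chapter's data, the following uses a small hypothetical tibble (`students_demo` and its values are assumptions made up for illustration, not the book's `students` dataset). It shows that `factor()` changes the column type while leaving the values themselves untouched.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for the chapter's `students` data
students_demo <- tibble(
  full_name = c("A", "B", "C"),
  meal_plan = c("Lunch only", "Breakfast and lunch", "Lunch only")
)

# Convert the character column to a factor
converted <- students_demo |>
  mutate(meal_plan = factor(meal_plan))

# The type changes from character to factor
class(students_demo$meal_plan)
class(converted$meal_plan)

# The distinct values become the factor's levels
levels(converted$meal_plan)

# The underlying values are unchanged
identical(as.character(converted$meal_plan), students_demo$meal_plan)
```

By default `factor()` takes its levels from the values present in the data; if the set of possible values is known in advance, it can be passed explicitly through the `levels` argument.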
|
||||
|
@ -307,12 +304,14 @@ It then works through the following questions:
|
|||
You can see that behavior in action in this simple example:
|
||||
|
||||
```{r}
|
||||
#| message: false
|
||||
|
||||
read_csv("
|
||||
logical,numeric,date,string
|
||||
TRUE,1,2021-01-15,abc
|
||||
false,4.5,2021-02-15,def
|
||||
T,Inf,2021-02-16,ghi"
|
||||
)
|
||||
T,Inf,2021-02-16,ghi
|
||||
")
|
||||
```
|
||||
|
||||
This heuristic works well if you have a clean dataset, but in real life, you'll encounter a selection of weird and beautiful failures.
|
||||
|
@ -331,13 +330,14 @@ simple_csv <- "
|
|||
.
|
||||
20
|
||||
30"
|
||||
|
||||
```
|
||||
|
||||
If we read it without any additional arguments, `x` becomes a character column:
|
||||
|
||||
```{r}
|
||||
df <- read_csv(simple_csv)
|
||||
#| message: false
|
||||
|
||||
read_csv(simple_csv)
|
||||
```
|
||||
|
||||
In this very small case, you can easily see the missing value `.`.
|
||||
|
@ -363,7 +363,9 @@ That suggests this dataset uses `.` for missing values.
|
|||
When we then set `na = "."`, the automatic guessing succeeds, giving us the numeric column that we want:
|
||||
|
||||
```{r}
|
||||
df <- read_csv(simple_csv, na = ".")
|
||||
#| message: false
|
||||
|
||||
read_csv(simple_csv, na = ".")
|
||||
```
|
||||
|
||||
### Column types
|
||||
|
@ -407,6 +409,8 @@ For example, you might have sales data for multiple months, with each month's da
|
|||
With `read_csv()` you can read these data in at once and stack them on top of each other in a single data frame.
|
||||
|
||||
```{r}
|
||||
#| message: false
|
||||
|
||||
sales_files <- c("data/01-sales.csv", "data/02-sales.csv", "data/03-sales.csv")
|
||||
read_csv(sales_files, id = "file")
|
||||
```
|
||||
|
@ -425,7 +429,7 @@ sales_files <- c(
|
|||
read_csv(sales_files, id = "file")
|
||||
```
|
||||
|
||||
With the additional `id` parameter we have added a new column called `file` to the resulting data frame that identifies the file the data come from.
|
||||
The `id` argument adds a new column called `file` to the resulting data frame that identifies the file the data come from.
|
||||
This is especially helpful in circumstances where the files you're reading in do not have an identifying column that can help you trace the observations back to their original sources.
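
For readers without the book's `data/` folder at hand, here is a rough self-contained sketch of the same idea. The two temporary files and their contents are hypothetical, invented only for this illustration; `show_col_types = FALSE` is used to silence the column specification message.

```r
library(readr)

# Hypothetical monthly files written to a temporary directory
dir <- tempfile("sales-")
dir.create(dir)
jan <- file.path(dir, "01-sales.csv")
feb <- file.path(dir, "02-sales.csv")
writeLines(c("month,n", "January,3", "January,5"), jan)
writeLines(c("month,n", "February,7"), feb)

# Read both files at once; `id = "file"` records the source path of each row
sales <- read_csv(c(jan, feb), id = "file", show_col_types = FALSE)
sales

# Count how many rows came from each file
table(basename(sales$file))
```

The `file` column holds the path exactly as it was passed to `read_csv()`, so `basename()` is a convenient way to shorten it when only the file name matters.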
|
||||
|
||||
If you have many files you want to read in, it can get cumbersome to write out their names as a list.
|
||||
|
@ -515,18 +519,6 @@ tibble(
|
|||
)
|
||||
```
|
||||
|
||||
Note that every column in a tibble must be the same size, so you'll get an error if they're not:
|
||||
|
||||
```{r}
|
||||
#| error: true
|
||||
|
||||
tibble(
|
||||
x = c(1, 2),
|
||||
y = c("h", "m", "g"),
|
||||
z = c(0.08, 0.83, 0.6)
|
||||
)
|
||||
```
|
||||
|
||||
Laying out the data by column can make it hard to see how the rows are related, so an alternative is `tribble()`, short for **tr**ansposed t**ibble**, which lets you lay out your data row by row.
|
||||
`tribble()` is customized for data entry in code: column headings start with `~` and entries are separated by commas.
|
||||
This makes it possible to lay out small amounts of data in an easy-to-read form:
|
||||