base-R typos and comments (#1123)
parent 4f88cf741f
commit cf823b61fb
44 base-R.qmd
@@ -70,11 +70,9 @@ There are five main types of things that you can subset a vector with, i.e. that
 x <- c(10, 3, NA, 5, 8, 1, NA)

 # All non-missing values of x
-!is.na(x)
 x[!is.na(x)]

 # All even (or missing!) values of x
-x %% 2 == 0
 x[x %% 2 == 0]
 ```

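A note on the example in this hunk: the comment says "(or missing!)" because a logical index that is `NA` produces an `NA` in the result. The sketch below reuses `x` from the hunk; the `which()` variant is an addition for comparison and is not part of this commit.

```r
x <- c(10, 3, NA, 5, 8, 1, NA)

# The condition is NA wherever x is NA, and subsetting with NA yields NA,
# so the missing values come along with the even ones.
evens_or_missing <- x[x %% 2 == 0]

# which() keeps only the positions where the condition is TRUE,
# so the missing values are dropped.
evens_only <- x[which(x %% 2 == 0)]

print(evens_or_missing)
print(evens_only)
```

Whether the `NA`s should be kept depends on the analysis, so neither form is "the" correct one.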
@@ -96,7 +94,7 @@ There are five main types of things that you can subset a vector with, i.e. that

 ### Subsetting data frames

-There are quite a few different ways[^base-r-1] that you can use `[` with a data frame, but the most important way is to selecting rows and columns independently with `df[rows, cols]`. Here `rows` and `cols` are vectors as described above.
+There are quite a few different ways[^base-r-1] that you can use `[` with a data frame, but the most important way is to select rows and columns independently with `df[rows, cols]`. Here `rows` and `cols` are vectors as described above.
 For example, `df[rows, ]` and `df[, cols]` select just rows or just columns, using the empty subset to preserve the other dimension.

 [^base-r-1]: Read <https://adv-r.hadley.nz/subsetting.html#subset-multiple> to see how you can also subset a data frame like it is a 1d object and how you can subset it with a matrix.

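For readers of this hunk, a minimal sketch of the `df[rows, cols]` pattern the sentence describes. The data frame `df` and its column names are made up for illustration and do not come from the chapter.

```r
# Hypothetical data frame, for illustration only
df <- data.frame(x = 1:3, y = c(0.5, 0.1, 0.9), z = c(10, 20, 30))

# Rows and columns selected independently
one_cell <- df[1, "y"]

# Empty column subset: keep every column, filter rows with a logical vector
big_x <- df[df$x > 1, ]

# Empty row subset: keep every row, pick columns with a character vector
xy <- df[, c("x", "y")]

print(one_cell)
print(big_x)
print(xy)
```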
@@ -125,8 +123,8 @@ We need to use it here because `[` doesn't use tidy evaluation, so you need to b

 There's an important difference between tibbles and data frames when it comes to `[`.
 In this book we've mostly used tibbles, which *are* data frames, but they tweak some older behaviors to make your life a little easier.
-In most places, you can use tibbles and data frame interchangeably, so when we want to draw particular attention to R's built-in data frame, we'll write `data.frame`s.
-So if `df` is a `data.frame`, then `df[, cols]` will return a vector if `col` selects a single column and a data frame if it selects more than one column.
+In most places, you can use "tibble" and "data frame" interchangeably, so when we want to draw particular attention to R's built-in data frame, we'll write `data.frame`.
+If `df` is a `data.frame`, then `df[, cols]` will return a vector if `col` selects a single column and a data frame if it selects more than one column.
 If `df` is a tibble, then `[` will always return a tibble.

 ```{r}
@@ -140,7 +138,7 @@ df2[, "x"]
 One way to avoid this ambiguity with `data.frame`s is to explicitly specify `drop = FALSE`:

 ```{r}
-df1[, "x", drop = FALSE]
+df1[, "x" , drop = FALSE]
 ```

 ### dplyr equivalents
@@ -174,7 +172,7 @@ A number of dplyr verbs are special cases of `[`:
 df[order(df$x, df$y), ]
 ```

-You can use `order(decreasing = TRUE)` to sort all columns in descending order or `-rank(col)` to individual sort columns in decreasing order.
+You can use `order(decreasing = TRUE)` to sort all columns in descending order or `-rank(col)` to individually sort columns in decreasing order.

 - Both `select()` and `relocate()` are similar to subsetting the columns with a character vector:

@@ -215,11 +213,11 @@ This function was the inspiration for much of dplyr's syntax.
 ## Selecting a single element `$` and `[[` {#sec-subset-one}

 `[`, which selects many elements, is paired with `[[` and `$`, which extract a single element.
-In this section, we'll show you how to use `[[` and `$` to pull columns out of a data frames, discuss a couple more differences between `data.frames` and tibbles, and emphasize some important differences between `[` and `[[` when used with lists.
+In this section, we'll show you how to use `[[` and `$` to pull columns out of data frames, discuss a couple more differences between `data.frames` and tibbles, and emphasize some important differences between `[` and `[[` when used with lists.

 ### Data frames

-`[[` and `$` can be used extract columns out of a data frame.
+`[[` and `$` can be used to extract columns out of a data frame.
 `[[` can access by position or by name, and `$` is specialized for access by name:

 ```{r}
@@ -243,11 +241,11 @@ tb$z <- tb$x + tb$y
 tb
 ```

-There are a number other base approaches to creating new columns including with `transform()`, `with()`, and `within()`.
+There are a number of other base R approaches to creating new columns including with `transform()`, `with()`, and `within()`.
 Hadley collected a few examples at <https://gist.github.com/hadley/1986a273e384fb2d4d752c18ed71bedf>.

 Using `$` directly is convenient when performing quick summaries.
-For example, if you just want find the size of the biggest diamond or the possible values of `cut`, there's no need to use `summarize()`:
+For example, if you just want to find the size of the biggest diamond or the possible values of `cut`, there's no need to use `summarize()`:

 ```{r}
 max(diamonds$carat)
@@ -289,7 +287,7 @@ For this reason we sometimes joke that tibbles are lazy and surly: they do less

 ### Lists

-`[[` and `$` are also really important for working with lists, and it's important to understand how they differ to `[`.
+`[[` and `$` are also really important for working with lists, and it's important to understand how they differ from `[`.
 Lets illustrate the differences with a list named `l`:

 ```{r}
@@ -306,6 +304,7 @@ l <- list(

 ```{r}
 str(l[1:2])
+str(l[1])
 str(l[4])
 ```

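To make the point of this hunk concrete, here is a small sketch of how `[` and `[[` differ on a list. The definition of `l` is not visible in this diff, so the list below is a stand-in chosen for illustration.

```r
# Stand-in list; the chapter's own definition of `l` is outside this hunk
l <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

# `[` always returns a list, even when only one element is selected
sub_one <- l[1]
sub_nested <- l[4]

# `[[` removes one level of hierarchy and returns the element itself
elem_one <- l[[1]]
elem_nested <- l[[4]]

str(sub_one)
str(elem_one)
str(sub_nested)
str(elem_nested)
```

Comparing `str(l[1])` with `str(l[[1]])` side by side is usually the quickest way to see which level of the list you are holding.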
@@ -390,7 +389,7 @@ df[["x"]]

 In @sec-iteration, you learned tidyverse techniques for iteration like `dplyr::across()` and the map family of functions.
 In this section, you'll learn about their base equivalents, the **apply family**.
-In this context apply and maps are synonyms because another way of saying "map a function over each element of a vector" is "apply a function over each element of a vector".
+In this context apply and map are synonyms because another way of saying "map a function over each element of a vector" is "apply a function over each element of a vector".
 Here we'll give you a quick overview of this family so you can recognize them in the wild.

 The most important member of this family is `lapply()`, which is very similar to `purrr::map()`[^base-r-3].
@@ -442,13 +441,13 @@ Unfortunately `tapply()` returns its results in a named vector which requires so
 If you want to see how you might use `tapply()` or other base techniques to perform other grouped summaries, Hadley has collected a few techniques [in a gist](https://gist.github.com/hadley/c430501804349d382ce90754936ab8ec).

 The final member of the apply family is the titular `apply()`, which works with matrices and arrays.
-In particular, watch out of `apply(df, 2, something)` which is a slow and potentially dangerous way of doing `lapply(df, something)`.
+In particular, watch out for `apply(df, 2, something)`, which is a slow and potentially dangerous way of doing `lapply(df, something)`.
 This rarely comes up in data science because we usually work with data frames and not matrices.

 ## For loops

 For loops are the fundamental building block of iteration that both the apply and map families use under the hood.
-For loops are powerful and general tool that are important to learn as you become a more experienced R programmer.
+For loops are powerful and general tools that are important to learn as you become a more experienced R programmer.
 The basic structure of a for loop looks like this:

 ```{r}
@@ -458,7 +457,7 @@ for (element in vector) {
 }
 ```

-The most straightforward use of `for()` loops is achieve the same affect as `walk()`: call some function with a side-effect on each element of a list.
+The most straightforward use of `for()` loops is to achieve the same affect as `walk()`: call some function with a side-effect on each element of a list.
 For example, in @sec-save-database instead of using walk:

 ```{r}
@@ -519,12 +518,12 @@ for (path in paths) {
 ```

 We recommend avoiding this pattern because it can become very slow when the vector is very long.
-This the source of the persistent canard that `for` loops are slow: they're not, but iteratively growing a vector is.
+This is the source of the persistent canard that `for` loops are slow: they're not, but iteratively growing a vector is.

 ## Plots

-Many R users who don't otherwise use the tidyverse prefer ggplot2 for plotting due to helpful features like sensible defaults, automatic legends, modern look.
-However, base R plotting functions can still be useful because they're so concise --- it's very little typing to do a basic exploratory plot.
+Many R users who don't otherwise use the tidyverse prefer ggplot2 for plotting due to helpful features like sensible defaults, automatic legends, and a modern look.
+However, base R plotting functions can still be useful because they're so concise --- it takes very little typing to do a basic exploratory plot.

 There are two main types of base plot you'll see in the wild: scatterplots and histograms, produced with `plot()` and `hist()` respectively.
 Here's a quick example from the diamonds dataset:
@@ -540,11 +539,10 @@ Note that base plotting functions work with vectors, so you need to pull columns

 ## Summary

-In this chapter, we've shown you selection of base R functions useful for subsetting and iteration.
-Compared to approaches discussed elsewhere in the book, these functions tend have more of a "vector" flavor than a "data frame" flavor because base R functions tend to take individual vectors, rather than a data frame and some column specification.
+In this chapter, we've shown you a selection of base R functions useful for subsetting and iteration.
+Compared to approaches discussed elsewhere in the book, these functions tend to have more of a "vector" flavor than a "data frame" flavor because base R functions tend to take individual vectors, rather than a data frame and some column specification.
 This often makes life easier for programming and so becomes more important as you write more functions and begin to write your own packages.

 This chapter concludes the programming section of the book.
 You've made a solid start on your journey to becoming not just a data scientist who uses R, but a data scientist who can *program* in R.
-We hope these chapters have sparked your interested in programming and that you're are looking forward to learning more outside of this book.
-
+We hope these chapters have sparked your interested in programming and that you're looking forward to learning more outside of this book.