parent
b543ddc9f6
commit
2f85ba936b
|
@ -24,7 +24,7 @@ When working with data you must:
|
|||
1. Figure out what you want to do.
|
||||
|
||||
1. Precisely describe what you want to do in such a way that the
|
||||
compute can understand it (i.e. program it).
|
||||
computer can understand it (i.e. program it).
|
||||
|
||||
1. Execute the program.
|
||||
|
||||
|
@ -67,7 +67,7 @@ It prints differently because it has a different "class" to usual data frames:
|
|||
class(flights)
|
||||
```
|
||||
|
||||
This is called a `tbl_df` (pronounced tibble diff) or a `data_frame` (pronounced "data underscore frame"; cf. `data dot frame`). Generally, however, we want worry about this relatively minor difference and will refer to everything as data frames.
|
||||
This is called a `tbl_df` (pronounced "tibble diff") or a `data_frame` (pronounced "data underscore frame"; cf. `data dot frame`). Generally, however, we won't worry about this relatively minor difference and will refer to everything as data frames.
|
||||
|
||||
You'll learn more about how that works in data structures. If you want to convert your own data frames to this special case, use `as.data_frame()`. I recommend it for large data frames as it makes interactive exploration much less painful.
|
||||
|
||||
|
@ -83,7 +83,7 @@ There are two other important differences between tbl_dfs and data.frames:
|
|||
|
||||
* When you subset a tbl\_df with `[`, it always returns another tbl\_df.
|
||||
Contrast this with a data frame: sometimes `[` returns a data frame and
|
||||
sometimes it just returns a single column:
|
||||
sometimes it just returns a single column (i.e. a vector):
|
||||
|
||||
```{r}
|
||||
df1 <- data.frame(x = 1:3, y = 3:1)
|
||||
|
@ -95,7 +95,7 @@ There are two other important differences between tbl_dfs and data.frames:
|
|||
class(df2[, 1])
|
||||
```
|
||||
|
||||
To extract a single column use `[[` or `$`:
|
||||
To extract a single column from a tbl\_df use `[[` or `$`:
|
||||
|
||||
```{r}
|
||||
class(df2[[1]])
|
||||
|
@ -211,7 +211,7 @@ Multiple arguments to `filter()` are combined with "and". To get more complicate
|
|||
filter(flights, month == 11 | month == 12)
|
||||
```
|
||||
|
||||
Note the order isn't like English. This expression doesn't find on months that equal 11 or 12. Instead it finds all months that equal `11 | 12`, which is `TRUE`. In a numeric context (like here), `TRUE` becomes one, so this finds all flights in January, not November or December.
|
||||
Note the order isn't like English. The following expression doesn't find months that equal 11 or 12. Instead it finds all months that equal `11 | 12`, which is `TRUE`. In a numeric context (like here), `TRUE` becomes one, so this finds all flights in January, not November or December.
|
||||
|
||||
```{r, eval = FALSE}
|
||||
filter(flights, month == 11 | 12)
|
||||
|
@ -393,7 +393,7 @@ rename(flights, tail_num = tailnum)
|
|||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
This function works similarly to the `select` argument in `base::subset()`. Because the dplyr philosophy is to have small functions that do one thing well, it is its own function in dplyr.
|
||||
The `select()` function works similarly to the `select` argument in `base::subset()`. Because the dplyr philosophy is to have small functions that do one thing well, it is its own function in dplyr.
|
||||
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
|
@ -566,7 +566,7 @@ by_day <- group_by(flights, year, month, day)
|
|||
summarise(by_day, delay = mean(dep_delay, na.rm = TRUE))
|
||||
```
|
||||
|
||||
Together `group_by()` and `summarise()` provide one of tools that you'll use most commonly when working with dplyr: grouped summaries. But before we go any further with this idea, we need to introduce a powerful new idea: the pipe.
|
||||
Together `group_by()` and `summarise()` provide one of the tools that you'll use most commonly when working with dplyr: grouped summaries. But before we go any further with this, we need to introduce a powerful new idea: the pipe.
|
||||
|
||||
### Combining multiple operations with the pipe
|
||||
|
||||
|
@ -774,7 +774,7 @@ Just using means, counts, and sum can get you a long way, but R provides many ot
|
|||
```
|
||||
|
||||
* By position: `first(x)`, `nth(x, 2)`, `last(x)`. These work similarly to
|
||||
`x[1]`, `x[length(x)]`, and `x[n]` but let you set a default value if that
|
||||
`x[1]`, `x[n]`, and `x[length(x)]` but let you set a default value if that
|
||||
position does not exist (i.e. you're trying to get the 3rd element from a
|
||||
group that only has two elements).
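
As a minimal sketch of the default-value behaviour described above (the vector `x` and the fallback value `0L` are made up for illustration, and dplyr is assumed to be installed), compare base indexing with the positional helpers on a "group" that has only two elements:

```{r}
library(dplyr)

# A hypothetical group with only two elements
x <- c(10L, 20L)

# Base indexing past the end of the vector silently returns NA
x[3]

# nth() also returns a missing value by default ...
nth(x, 3)

# ... but lets you choose the fallback yourself
nth(x, 3, default = 0L)

# first() and last() behave like x[1] and x[length(x)]
first(x)
last(x)
```

The fallback is given as an integer (`0L`) so that it has the same type as `x`.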
|
||||
|
||||
|
|