# Data structures

```{r, include = FALSE}
library(purrr)
```

Might be quite brief.

Atomic vectors and lists + data frames.

Most important data types:

* logical
* integer & double
* character
* date
* date time
* factor

<http://adv-r.had.co.nz/OO-essentials.html>

## Vectors

Every vector has three key properties:

1. Type: e.g. integer, double, list. Retrieve with `typeof()`.
2. Length. Retrieve with `length()`.
3. Attributes. A named list of additional metadata. The `class` attribute
   is used to build more complex data structures (like factors and dates)
   up from simpler components. Retrieve with `attributes()`.
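
As a rough illustration of these three properties, here is a small base R sketch. The vector `v` and the `"note"` attribute are made up for the example.

```{r}
# A named double vector: the names are stored as an attribute.
v <- c(a = 1, b = 2, c = 3)

typeof(v)       # "double"
length(v)       # 3
attributes(v)   # a list with a single element, `names`

# Individual attributes can be set and read with attr().
attr(v, "note") <- "an arbitrary piece of metadata"
names(attributes(v))  # "names" "note"
```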

(Need function to show these? `vector_str()`?)

### Predicates

|                  | lgl | int | dbl | chr | list | null |
|------------------|-----|-----|-----|-----|------|------|
| `is_logical()`   |  x  |     |     |     |      |      |
| `is_integer()`   |     |  x  |     |     |      |      |
| `is_double()`    |     |     |  x  |     |      |      |
| `is_numeric()`   |     |  x  |  x  |     |      |      |
| `is_character()` |     |     |     |  x  |      |      |
| `is_atomic()`    |  x  |  x  |  x  |  x  |      |      |
| `is_list()`      |     |     |     |     |  x   |      |
| `is_vector()`    |  x  |  x  |  x  |  x  |  x   |      |
| `is_null()`      |     |     |     |     |      |  x   |

Unlike the base R functions, these predicates inspect only the type of the object, not its attributes. This means they tend to be less surprising:

```{r}
is.atomic(NULL)
is_atomic(NULL)

is.vector(factor("a"))
is_vector(factor("a"))
```

I recommend using these instead of the base functions.

Each predicate also comes with "scalar" and "bare" versions. The scalar version additionally checks that the length is 1, and the bare version additionally checks that the object has no S3 class.

```{r}
y <- factor(c("a", "b", "c"))
is_integer(y)
is_scalar_integer(y)
is_bare_integer(y)
```
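
To make the distinction concrete, the three kinds of check can be approximated in base R. The helper names below are hypothetical and written only for illustration; they are not the package's actual implementation.

```{r}
# Hypothetical helpers (not part of any package), base R only.
type_is_integer   <- function(x) typeof(x) == "integer"
scalar_is_integer <- function(x) type_is_integer(x) && length(x) == 1
bare_is_integer   <- function(x) type_is_integer(x) && !is.object(x)

fct <- factor(c("a", "b", "c"))
type_is_integer(fct)    # TRUE: factors are stored as integers
scalar_is_integer(fct)  # FALSE: length is 3
bare_is_integer(fct)    # FALSE: it has an S3 class
bare_is_integer(1:3)    # TRUE
```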


### Exercises

1.  Carefully read the documentation of `is.vector()`. What does it actually
    test for?

## Atomic vectors

### Numbers

```{r}
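sqrt(2) ^ 2 - 2   # not exactly 0: doubles are floating point approximations

0/0               # NaN
1/0               # Inf
-1/0              # -Inf

mean(numeric())   # NaN: the mean of a zero-length vector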
```
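
The special values above can be detected with base R helpers, and approximate equality can be checked with `all.equal()`. This is a minimal sketch; the vector `vals` is made up for the example.

```{r}
vals <- c(0/0, 1/0, -1/0, NA, 1)

is.finite(vals)    # FALSE FALSE FALSE FALSE  TRUE
is.infinite(vals)  # FALSE  TRUE  TRUE FALSE FALSE
is.nan(vals)       #  TRUE FALSE FALSE FALSE FALSE
is.na(vals)        #  TRUE FALSE FALSE  TRUE FALSE

# Exact comparison fails because of floating point error,
# but comparison within a tolerance succeeds.
sqrt(2) ^ 2 == 2                   # FALSE
isTRUE(all.equal(sqrt(2) ^ 2, 2))  # TRUE
```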

## Elemental vectors

All built on top of atomic vectors.

`class()`
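
A small base R sketch of what "built on top of atomic vectors" might look like in practice: each object below has an ordinary atomic type, plus a `class` attribute that changes how it behaves. The example values are arbitrary.

```{r}
f <- factor(c("a", "b", "a"))
typeof(f)    # "integer"
class(f)     # "factor"
unclass(f)   # integer codes 1 2 1, plus a `levels` attribute

day <- as.Date("1970-01-10")
typeof(day)   # "double"
class(day)    # "Date"
unclass(day)  # 9: days since 1970-01-01

moment <- as.POSIXct("1970-01-01 00:00:10", tz = "UTC")
typeof(moment)      # "double"
class(moment)       # "POSIXct" "POSIXt"
as.numeric(moment)  # 10: seconds since 1970-01-01 00:00:00 UTC
```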

### Factors

(Since won't get a chapter of their own)

### Dates

### Date times

## Recursive vectors (lists)

Lists are the data structure R uses for hierarchical objects. You're already familiar with atomic vectors, R's data structure for 1d objects. Lists extend these ideas to model objects that are like trees. You can create a hierarchical structure with a list because, unlike an atomic vector, a list can contain other lists.

You create a list with `list()`:

```{r}
x <- list(1, 2, 3)
str(x)

x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
```

Unlike atomic vectors, lists can contain a mix of objects:

```{r}
y <- list("a", 1L, 1.5, TRUE)
str(y)
```
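
For contrast, here is a sketch of what happens when the same values are combined with `c()` instead of `list()`. The variable names are made up for the example.

```{r}
# c() builds an atomic vector, so everything is coerced to one type.
mixed_vector <- c("a", 1L, 1.5, TRUE)
typeof(mixed_vector)  # "character"
mixed_vector          # "a" "1" "1.5" "TRUE"

# list() keeps each element's own type.
mixed_list <- list("a", 1L, 1.5, TRUE)
vapply(mixed_list, typeof, character(1))
# "character" "integer" "double" "logical"
```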

Lists can even contain other lists!

```{r}
z <- list(list(1, 2), list(3, 4))
str(z)
```

`str()` is very helpful when looking at lists because it focusses on the structure, not the contents.

### Visualising lists

To explain more complicated list manipulation functions, it's helpful to have a visual representation of lists. For example, take these three lists:

```{r}
x1 <- list(c(1, 2), c(3, 4))
x2 <- list(list(1, 2), list(3, 4))
x3 <- list(1, list(2, list(3)))
```

I draw them as follows:

```{r, echo = FALSE, out.width = "75%"}
knitr::include_graphics("diagrams/lists-structure.png")
```

* Lists are rounded rectangles that contain their children.

* I draw each child a little darker than its parent to make it easier to see
  the hierarchy.

* The orientation of the children (i.e. rows or columns) isn't important,
  so I'll pick a row or column orientation to either save space or illustrate
  an important property in the example.
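
The nesting shown in the diagram can also be checked in code. The helper `list_depth()` below is hypothetical, written only for this illustration: it counts how many levels of lists are nested inside one another.

```{r}
# Hypothetical helper: depth of list nesting (0 for a non-list).
list_depth <- function(x) {
  if (!is.list(x)) return(0L)
  if (length(x) == 0) return(1L)
  1L + max(vapply(x, list_depth, integer(1)))
}

x1 <- list(c(1, 2), c(3, 4))
x2 <- list(list(1, 2), list(3, 4))
x3 <- list(1, list(2, list(3)))

list_depth(x1)  # 1: a list of atomic vectors
list_depth(x2)  # 2: a list of lists
list_depth(x3)  # 3: lists nested three deep
```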

### Subsetting

There are three ways to subset a list, which I'll illustrate with `a`:

```{r}
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
```

*   `[` extracts a sub-list. The result will always be a list.

    ```{r}
    str(a[1:2])
    str(a[4])
    ```

    Like subsetting vectors, you can use an integer vector to select by
    position, or a character vector to select by name.

*   `[[` extracts a single component from a list. It removes a level of
    hierarchy from the list.

    ```{r}
    str(a[[1]])
    str(a[[4]])
    ```

*   `$` is a shorthand for extracting named elements of a list. It works
    similarly to `[[` except that you don't need to use quotes.

    ```{r}
    a$a
    a[["a"]]
    ```

Or visually:

```{r, echo = FALSE, out.width = "75%"}
knitr::include_graphics("diagrams/lists-subsetting.png")
```
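
The difference between `[` and `[[` can also be seen by checking the type and length of each result. This sketch redefines `a` with the same value so that the chunk stands on its own.

```{r}
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))

typeof(a[1])      # "list": still a list, of length 1
typeof(a[[1]])    # "integer": the element itself

length(a["d"])    # 1: a list that contains the nested list
length(a[["d"]])  # 2: the nested list itself

# [[ and $ can be chained to reach into nested lists.
a[["d"]][[1]]     # -1
a$d[[2]]          # -5
```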

### Lists of condiments

It's easy to get confused between `[` and `[[`, but it's important to understand the difference. A few months ago I stayed at a hotel with a pretty interesting pepper shaker that I hope will help you remember these differences:

```{r, echo = FALSE, out.width = "25%"}
knitr::include_graphics("images/pepper.jpg")
```

If this pepper shaker is your list `x`, then `x[1]` is a pepper shaker containing a single pepper packet:

```{r, echo = FALSE, out.width = "25%"}
knitr::include_graphics("images/pepper-1.jpg")
```

`x[2]` would look the same, but would contain the second packet. `x[1:2]` would be a pepper shaker containing two pepper packets.

`x[[1]]` is:

```{r, echo = FALSE, out.width = "25%"}
knitr::include_graphics("images/pepper-2.jpg")
```

If you wanted to get the content of the pepper packet, you'd need `x[[1]][[1]]`:

```{r, echo = FALSE, out.width = "25%"}
knitr::include_graphics("images/pepper-3.jpg")
```
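
The analogy can be written out as code. The contents of `x` below are an assumption made for illustration: the shaker is a list, and each packet is itself a list holding some pepper.

```{r}
# Hypothetical pepper shaker: a list of three packets.
x <- list(list("pepper"), list("pepper"), list("pepper"))

length(x[1])     # 1: a shaker containing a single packet
length(x[1:2])   # 2: a shaker containing two packets
is.list(x[[1]])  # TRUE: the packet itself, out of the shaker
x[[1]][[1]]      # "pepper": the contents of the packet
```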

### Exercises

1.  Draw the following lists as nested sets.

1.  Generate the lists corresponding to these nested set diagrams.

1.  What happens if you subset a data frame as if you're subsetting a list?
    What are the key differences between a list and a data frame?


## Data frames

## Subsetting

Not sure where else this should be covered.