Fix weird `r chunk`s introduced at some point

mine-cetinkaya-rundel 2023-05-26 13:30:22 -04:00
parent 4ce20cd465
commit fa76f8154d
1 changed file with 5 additions and 11 deletions

@@ -601,28 +601,25 @@ On subsequent runs, knitr will check to see if the code has changed, and if it h
The caching system must be used with care, because by default it is based on the code only, not its dependencies.
For example, here the `processed_data` chunk depends on the `raw-data` chunk:
```
`r chunk`{r}
```{r}
#| label: raw-data
rawdata <- readr::read_csv("a_very_large_file.csv")
`r chunk`
```
`r chunk`{r}
```{r}
#| label: processed_data
#| cache: true
processed_data <- rawdata |>
  filter(!is.na(import_var)) |>
  mutate(new_variable = complicated_transformation(x, y, z))
`r chunk`
```
Caching the `processed_data` chunk means that it will get re-run if the dplyr pipeline is changed, but it won't get rerun if the `read_csv()` call changes.
You can avoid that problem with the `dependson` chunk option:
```
`r chunk`{r}
```{r}
#| label: processed-data
#| cache: true
#| dependson: "raw-data"
@@ -630,7 +627,6 @@ You can avoid that problem with the `dependson` chunk option:
processed_data <- rawdata |>
  filter(!is.na(import_var)) |>
  mutate(new_variable = complicated_transformation(x, y, z))
`r chunk`
```
`dependson` should contain a character vector of *every* chunk that the cached chunk depends on.
@@ -642,13 +638,11 @@ This is an arbitrary R expression that will invalidate the cache whenever it cha
A good function to use is `file.info()`: it returns a bunch of information about the file including when it was last modified.
Then you can write:
```
`r chunk`{r}
```{r}
#| label: raw-data
#| cache.extra: file.info("a_very_large_file.csv")
rawdata <- readr::read_csv("a_very_large_file.csv")
`r chunk`
```
We've followed the advice of [David Robinson](https://twitter.com/drob/status/738786604731490304) to name these chunks: each chunk is named after the primary object that it creates.