Merge branch 'master' of github.com:hadley/r4ds

hadley 2016-08-25 08:20:37 -05:00
commit 5caf82d215
2 changed files with 35 additions and 33 deletions

@ -225,6 +225,8 @@ This book isn't just the product of Hadley and Garrett, but is the result of man
* The \#rstats twitter community who reviewed all of the draft chapters * The \#rstats twitter community who reviewed all of the draft chapters
and provided tons of useful feedback. and provided tons of useful feedback.
* Tal Galili for augmenting his dendextend package to support a section on clustering that did not make it into the final draft.
This book was written in the open, and many people contributed pull requests to fix minor problems. Special thanks goes to everyone who contributed via GitHub: This book was written in the open, and many people contributed pull requests to fix minor problems. Special thanks goes to everyone who contributed via GitHub:
```{r, results = "asis", echo = FALSE, message = FALSE} ```{r, results = "asis", echo = FALSE, message = FALSE}

@ -1,11 +1,11 @@
# R Markdown workflow # R Markdown workflow
Earlier we discussed a basic workflow for capturing your R code where you work work interactively in the _console_, then capture what works in the _script editor_. R Markdown effectively puts the console and the script editor in same place, blurring the lines between interactive exploration and long-term code capture. You can rapidly iterate within a chunk, editing and re-executing with Cmd/Ctrl + Shift + Enter. When you're happy, you move on and start a new chunk. Earlier, we discussed a basic workflow for capturing your R code where you work interactively in the _console_, then capture what works in the _script editor_. R Markdown effectively puts the console and the script editor in same place, blurring the lines between interactive exploration and long-term code capture. You can rapidly iterate within a chunk, editing and re-executing with Cmd/Ctrl + Shift + Enter. When you're happy, you move on and start a new chunk.
R Markdown is also important because it so tightly integrates prose and code. This makes it a great __analysis notebook__ because it provides it lets you develop code and record your thoughts. An analysis notebook shares many of the same goals as classic lab notebook in the physical sciences: R Markdown is also important because it so tightly integrates prose and code. This makes it a great __analysis notebook__ because it lets you develop code and record your thoughts. An analysis notebook shares many of the same goals as a classic lab notebook in the physical sciences:
* Record what you what did and why you did it. Regardless of how great your * Record what you did and why you did it. Regardless of how great your
memory is, if you don't record what you do, there will come some time where memory is, if you don't record what you do, there will come a time when
you have forgotten important details. Write them down so you don't forget! you have forgotten important details. Write them down so you don't forget!
* To support rigorous thinking. You are more likely to come up with a strong * To support rigorous thinking. You are more likely to come up with a strong
@ -15,12 +15,12 @@ R Markdown is also important because it so tightly integrates prose and code. Th
* To help others understand your work. It is rare to do data analysis by * To help others understand your work. It is rare to do data analysis by
yourself, and you'll often be working as part of a team. A lab notebook yourself, and you'll often be working as part of a team. A lab notebook
helps you to share not only what you've done, but why you did it helps you share not only what you've done, but why you did it with your
with your colleagues or lab mates. colleagues or lab mates.
And much of the good advice about using lab notebooks effectively can also be translated to analysis notebooks. I've drawn on my on experiences and the Colin Purrington's advice on lab notebooks <http://colinpurrington.com/tips/lab-notebooks> to come up with the following list of tips: Much of the good advice about using lab notebooks effectively can also be translated to analysis notebooks. I've drawn on my own experiences and Colin Purrington's advice on lab notebooks <http://colinpurrington.com/tips/lab-notebooks> to come up with the following list of tips:
* Ensure each notebook has an descriptive title, evocative filename, and a * Ensure each notebook has a descriptive title, an evocative filename, and a
first paragraph that briefly describes the aims of the analysis. first paragraph that briefly describes the aims of the analysis.
* Use the YAML header date field to record the date you started working on the * Use the YAML header date field to record the date you started working on the
@ -33,30 +33,30 @@ And much of the good advice about using lab notebooks effectively can also be tr
Use ISO8601 YYYY-MM-DD format so that there's no ambiguity. Use it Use ISO8601 YYYY-MM-DD format so that there's no ambiguity. Use it
even if you don't normally write dates that way! even if you don't normally write dates that way!
* If you spend a lot of time on analysis idea and it turns out to be a dead * If you spend a lot of time on an analysis idea and it turns out to be a
end, don't delete it! Write up a brief note about why it failed and leave dead end, don't delete it! Write up a brief note about why it failed and
it in the notebook. That will help you avoid going down the same dead end leave it in the notebook. That will help you avoid going down the same
when you come back to the analysis in the future. dead end when you come back to the analysis in the future.
* Generally, you're better off doing data entry outside of R. If you need * Generally, you're better off doing data entry outside of R. If you need
a small snippet of data, clearly lay it out using `tibble::tribble()`. a small snippet of data, clearly lay it out using `tibble::tribble()`.
* If you discover a error in a data file, never modify it directly, but * If you discover an error in a data file, never modify it directly, but
instead write code to correct the value. Explain why you made the fix. instead write code to correct the value. Explain why you made the fix.
* Before you finish for the day, make sure you can knitr the notebook * Before you finish for the day, make sure you can compile the notebook
(aftering clearing caches if you're using them). That will let you fix using knitr (aftering clearing caches if you're using them). That will
any problems while the code is still fresh in your mind. let you fix any problems while the code is still fresh in your mind.
* If you want your code to be reproducible in the long-run (i.e. so you can * If you want your code to be reproducible in the long-run (i.e. so you can
come back to run it in a year or so), you'll need to track the versions come back to run it next month or next year), you'll need to track the
of the packages that your code uses. A rigorous approach is to use versions of the packages that your code uses. A rigorous approach is to use
__packrat__, <http://rstudio.github.io/packrat/>, which stores packages in __packrat__, <http://rstudio.github.io/packrat/>, which stores packages in
your project directory. A quick and dirty hack is to include a chunk that your project directory. A quick and dirty hack is to include a chunk that
runs `sessionInfo()` --- that won't let you easily recreate your packages as runs `sessionInfo()` --- that won't let you easily recreate your packages as
they are today, but at least you know what they were. they are today, but at least you know what they were.
* You are going to create many many many analysis notebooks over the course * You are going to create many, many, many analysis notebooks over the course
of your career. How are you going to organise them so you can find them of your career. How are you going to organise them so you can find them
again in the future? I recommend storing them in individual projects, again in the future? I recommend storing them in individual projects,
and coming up with a good naming scheme. and coming up with a good naming scheme.
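
As a rough illustration of three of the tips above (laying out a small snippet of data with `tibble::tribble()`, correcting a data error in code rather than in the file, and recording package versions with `sessionInfo()`), a notebook chunk might look something like the sketch below. The `readings` table, its column names, and the assumed decimal-point error are all hypothetical and exist only for this example.

```r
library(tibble)

# Hypothetical hand-entered measurements; names and values are illustrative only.
readings <- tribble(
  ~site, ~date,        ~temp_c,
  "A",   "2016-08-01",    21.5,
  "B",   "2016-08-01",   215,
  "C",   "2016-08-02",    19.8
)

# Assumed fix: 215 for site B is taken to be a misplaced decimal point.
# The original values stay untouched in `readings`; the correction lives
# in code, next to a note explaining why it was made.
corrected <- readings
is_typo <- corrected$site == "B" & corrected$date == "2016-08-01"
corrected$temp_c[is_typo] <- 21.5

# Record the R version and loaded package versions for this run.
sessionInfo()
```

Keeping `readings` and `corrected` as separate objects means the notebook shows both the value as entered and the reasoning behind the change.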