The goal of "R for Data Science" is to give you a solid foundation into using R to do data science. The goal is not to be exhaustive, but to instead focus on what we think are the critical skills for data science:
Above, I've listed the components of the data science process in roughly the order you'll encounter them in an analysis (although of course you'll iterate multiple times). This, however, is not the order you'll encounter them in this book. This is because:
We've honed this order based on our experience teaching live classes, and it's been carefully designed to keep you motivated. We try and stick to a similar pattern within each chapter: give some bigger motivating examples so you can see the bigger picture, and then dive into the details.
Each section of the book also comes with exercises to help you practice what you've learned. It's tempting to skip these, but there's no better way to learn than practicing. If you were taking a class with either of us, we'd force you to do them by making them homework. (Sometimes I feel like teaching is the art of tricking people to do what's in their own best interests.)
Throughout the book, we will discuss the principles of data that will help you become a better scientist. That begins here. We will refer to the terms below throughout the book because they are so useful.
* A _variable_ is a quantity, quality, or property that you can measure.
This book focuses exclusively on structured data sets: collections of values that are each associated with a variable and an observation. There are lots of data that doesn't naturally fit in this paradigm: images, sounds, trees, text. But data frames are extremely common in science and in industry and we believe that they're a great place to start your data analysis journey.
To run the code in this book, you will need to install both R and the RStudio IDE, an application that makes R easier to use. Both are free and easy to install.
To install R, visit <http://cran.r-project.org> and click the link that matches your operating system. What you do next will depend on your operating system.
* Mac users should click the `.pkg` file at the top of the page. This file contains the most current release of R. Once the file is downloaded, double click it to open an R installer. Follow the directions in the installer to install R.
* Linux users should select their distribution and then follow the distribution specific instructions to install R. <https://cran.r-project.org/bin/linux/> includes these instructions alongside the files to download.
After you install R, visit <http://www.rstudio.com/download> to download the RStudio IDE. Choose the installer for your system. Then click the link to download the application. Once you have the application, installation is easy. Once RStudio IDE is installed, open it as you would open any other application.
* Cmd + Enter: sends current line from editor to console.
* Tab: suggest possible completions for the text you've typed.
* Cmd + ↑: in the console, searches all commands you've typed that start with
those characters.
* Cmd + Shift + F10: restart.
* Alt + Shift + K: the keyboard shortcut that shows all the keyboard shortcuts.
Note about turning on save/load session off.
### R Packages
An R _package_ is a collection of functions, data sets, and help files that extends the R language. We will a lot of R packages in this book. To install them all, open RStudio and run:
R will download the packages from CRAN and install them in your system library. If you have problems installing, make that you are connected to the internet, and that you haven't blocked <http://cran.r-project.org> in your firewall or proxy.
You will not be able to use the functions, objects, and help files in a package until you load it with `library()`. You will need to reload the package if you start a new R session.