parent de449d678b
commit 3860c106fb
|
@ -46,7 +46,7 @@ The previous description of the tools of data science is organised roughly accor
|
|||
* Programming tools are not necessarily interesting in their own right,
|
||||
but do allow you to tackle considerably more challenging problems. We'll
|
||||
give you a selection of programming tools in the middle of the book, and
|
||||
-then you'll see they can combine with the data science tools to tackle
+then you'll see how they can combine with the data science tools to tackle
|
||||
interesting modelling problems.
|
||||
|
||||
Within each chapter, we try and stick to a similar pattern: start with some motivating examples so you can see the bigger picture, and then dive into the details. Each section of the book is paired with exercises to help you practice what you've learned. While it's tempting to skip the exercises, there's no better way to learn than practicing on real problems.
|
||||
|
@ -73,7 +73,7 @@ We think R is a great place to start your data science journey because it is an
|
|||
|
||||
### Non-rectangular data
|
||||
|
||||
-This book focuses exclusively on rectangular data: collections of values that are each associated with a variable and an observation. There are lots of datasets that do not naturally fit in this paradigm: including images, sounds, trees, and text. But rectangular data frames are extremely common in science and industry, and we believe that they're a great place to start your data science journey.
+This book focuses exclusively on rectangular data: collections of values that are each associated with a variable and an observation. There are lots of datasets that do not naturally fit in this paradigm: including images, sounds, trees, and text. But rectangular data frames are extremely common in science and industry, and we believe that they are a great place to start your data science journey.
|
||||
|
||||
### Hypothesis confirmation
|
||||
|
||||
|
@ -119,7 +119,7 @@ For now, all you need to know is that you type R code in the console pane, and p
|
|||
|
||||
### The tidyverse
|
||||
|
||||
-You'll also need to install some R packages. An R _package_ is a collection of functions, data, and documentation that extends the capabilities of base R. Using packages is key to the successful use of R. The majority of the packages that you will learn in this book are part of the so-called tidyverse. The packages in the tidyverse share a common philosophy of data and R programming, and are designed to work together naturally.
+You'll also need to install some R packages. An R __package__ is a collection of functions, data, and documentation that extends the capabilities of base R. Using packages is key to the successful use of R. The majority of the packages that you will learn in this book are part of the so-called tidyverse. The packages in the tidyverse share a common philosophy of data and R programming, and are designed to work together naturally.
|
||||
|
||||
You can install the complete tidyverse with a single line of code:
|
||||
|
||||
|
@ -187,7 +187,7 @@ This book is not an island; there is no single resource that will allow you to m
|
|||
|
||||
If you get stuck, start with Google. Typically adding "R" to a query is enough to restrict it to relevant results: if the search isn't useful, it often means that there aren't any R-specific results available. Google is particularly useful for error messages. If you get an error message and you have no idea what it means, try googling it! Chances are that someone else has been confused by it in the past, and there will be help somewhere on the web. (If the error message isn't in English, run `Sys.setenv(LANGUAGE = "en")` and re-run the code; you're more likely to find help for English error messages.)
|
||||
|
||||
-If Google doesn't help, try [stackoverflow](http://stackoverflow.com). Start by spending a little time searching for an existing answer, including `[R]` restrict your search to questions and answers that use R. If you don't find anything useful, prepare a minimal reproducible example or __reprex__. A good reprex makes it easier for other people to help you, and often you'll figure out the problem yourself in the course of making it.
+If Google doesn't help, try [stackoverflow](http://stackoverflow.com). Start by spending a little time searching for an existing answer, including `[R]` to restrict your search to questions and answers that use R. If you don't find anything useful, prepare a minimal reproducible example or __reprex__. A good reprex makes it easier for other people to help you, and often you'll figure out the problem yourself in the course of making it.
|
||||
|
||||
There are three things you need to include to make your example reproducible: required packages, data, and code.
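As a rough illustration of those three ingredients, a minimal reprex might look like the sketch below. The data frame `df` and its columns are made up for this example, and it sticks to base R so that it runs without installing anything; a real reprex would load whichever packages the problem code actually needs.

```r
# 1. Packages: load everything the example needs at the top.
#    (Base R only here; a real reprex might start with library(tidyverse).)

# 2. Data: recreate a small dataset inline instead of pointing at a local file.
#    `df` is a hypothetical data frame invented for this sketch.
df <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2.1, 3.9, 6.2, 7.8)
)

# 3. Code: the shortest snippet that still shows the problem or question.
fit <- lm(y ~ x, data = df)
coef(fit)
```

For an object that already exists in your session, `dput(df)` prints R code that recreates it, which can be pasted into the data step.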
|
||||
|
||||
|
|
|
@ -36,7 +36,7 @@ Let's use our first graph to answer a question: Do cars with big engines use mor
|
|||
|
||||
### The `mpg` data frame
|
||||
|
||||
-You can test your answer with the `mpg` __data frame__ found in ggplot2 (aka `ggplot2::mpg`). A data frame is a rectangular collection of variables (in the columns) and observations (in the rows). `mpg` contains observations collected by the US Environment Protection Agency on 38 models of cars.
+You can test your answer with the `mpg` __data frame__ found in ggplot2 (aka `ggplot2::mpg`). A data frame is a rectangular collection of variables (in the columns) and observations (in the rows). `mpg` contains observations collected by the US Environment Protection Agency on 38 models of car.
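To make the rows-and-columns idea concrete, here is a small hand-built data frame. The name `cars_sketch` and its values are invented for illustration and are not taken from `mpg`.

```r
# A tiny, made-up data frame: each column is a variable, each row an observation.
cars_sketch <- data.frame(
  model = c("a", "b", "c"),   # hypothetical model labels
  displ = c(1.8, 2.0, 3.1),   # engine displacement, in litres
  hwy   = c(29, 31, 26)       # highway miles per gallon
)

nrow(cars_sketch)   # number of observations (rows)
ncol(cars_sketch)   # number of variables (columns)
names(cars_sketch)  # the variable names
```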
|
||||
|
||||
```{r}
|
||||
mpg
|
||||
|
@ -82,7 +82,7 @@ The rest of this chapter will show you how to complete and extend this template
|
|||
|
||||
### Exercises
|
||||
|
||||
-1. Run `ggplot(data = mpg)` what do you see?
+1. Run `ggplot(data = mpg)`. What do you see?
|
||||
|
||||
1. How many rows are in `mpg`? How many columns?
|
||||
|
||||
|
@ -232,7 +232,7 @@ ggplot(data = mpg)
|
|||
|
||||
If you're still stuck, try the help. You can get help about any R function by running `?function_name` in the console, or selecting the function name and pressing F1 in RStudio. Don't worry if the help doesn't seem that helpful - instead skip down to the examples and look for code that matches what you're trying to do.
|
||||
|
||||
-If that doesn't help, carefully read the error message. Sometimes the answer will be buried there! But when you're new to R, the answer might be in the error message but you don't yet know how to understand it. Another great tool is Google: trying googling the error message, as it's likely someone else has had the same problem, and has gotten help online.
+If that doesn't help, carefully read the error message. Sometimes the answer will be buried there! But when you're new to R, the answer might be in the error message but you don't yet know how to understand it. Another great tool is Google: try googling the error message, as it's likely someone else has had the same problem, and has gotten help online.
|
||||
|
||||
## Facets
|
||||
|
||||
|
@ -479,7 +479,7 @@ The algorithm used to calculate new values for a graph is called a __stat__, sho
|
|||
knitr::include_graphics("images/visualization-stat-bar.png")
|
||||
```
|
||||
|
||||
-You can learn which stat a geom uses by inspecting the default value for the `stat` argument. For example, `?geom_bar` shows that the default value for `stat` is "count", which means that `geom_bar()` uses `stat_count()`. `stat_count()` is documented on the same page as `geom_bar()`, and if you scroll down you can find a section called "Computed variables". That tells that it computes two new variables: `count` and `prop`.
+You can learn which stat a geom uses by inspecting the default value for the `stat` argument. For example, `?geom_bar` shows that the default value for `stat` is "count", which means that `geom_bar()` uses `stat_count()`. `stat_count()` is documented on the same page as `geom_bar()`, and if you scroll down you can find a section called "Computed variables". That describes how it computes two new variables: `count` and `prop`.
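As a rough picture of what those two computed variables hold, the sketch below tallies a small made-up vector in base R. It does not call ggplot2; the vector `cut_sketch` is hypothetical, and `prop` is computed here as each count's share of the total, which is an assumption that corresponds to treating all bars as a single group.

```r
# Hypothetical categorical values standing in for a column such as `cut`.
cut_sketch <- c("Fair", "Good", "Good", "Ideal", "Ideal", "Ideal")

# count: the number of observations in each category.
count <- table(cut_sketch)

# prop: each category's share of all observations
# (assumes every bar belongs to one group).
prop <- count / sum(count)

count
prop
```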
|
||||
|
||||
You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using `stat_count()` instead of `geom_bar()`:
|
||||
|
||||
|
@ -573,7 +573,7 @@ ggplot2 provides over 20 stats for you to use. Each stat is a function, so you c
|
|||
|
||||
## Position adjustments
|
||||
|
||||
-There's one more piece of magic associated with bar charts. You can colour a bar chart using either the `colour` aesthetic, or more usefully, `fill`:
+There's one more piece of magic associated with bar charts. You can colour a bar chart using either the `colour` aesthetic, or, more usefully, `fill`:
|
||||
|
||||
```{r out.width = "50%", fig.align = "default"}
|
||||
ggplot(data = diamonds) +
|
||||
|
@ -662,7 +662,7 @@ To learn more about a position adjustment, look up the help page associated with
|
|||
|
||||
## Coordinate systems
|
||||
|
||||
-Coordinate systems are probably the most complicated part of ggplot2. The default coordinate system is the Cartesian coordinate system where the x and y position act independently to find the location of each point. There are a number of other coordinate systems that are occasionally helpful.
+Coordinate systems are probably the most complicated part of ggplot2. The default coordinate system is the Cartesian coordinate system where the x and y positions act independently to determine the location of each point. There are a number of other coordinate systems that are occasionally helpful.
|
||||
|
||||
* `coord_flip()` switches the x and y axes. This is useful (for example),
|
||||
if you want horizontal boxplots. It's also useful for long labels: it's
|
||||
|
|
|
@ -84,7 +84,7 @@ R has a large collection of built-in functions that are called like this:
|
|||
function_name(arg1 = val1, arg2 = val2, ...)
|
||||
```
|
||||
|
||||
-Let's try using `seq()` which makes regular **seq**uences of numbers and, while we're at it, learn more helpful features of RStudio. Type `se` and hit TAB. A popup shows you possible completions. Specify `seq()` by typing more (a "q") to disambiguate, or by using ↑/↓ arrows to select. Notice the floating tooltip that pops up, reminding you of the function's arguments and purpose. If you want more help, press F1 to get all the details in help tab in the lower right pane.
+Let's try using `seq()` which makes regular **seq**uences of numbers and, while we're at it, learn more helpful features of RStudio. Type `se` and hit TAB. A popup shows you possible completions. Specify `seq()` by typing more (a "q") to disambiguate, or by using ↑/↓ arrows to select. Notice the floating tooltip that pops up, reminding you of the function's arguments and purpose. If you want more help, press F1 to get all the details in the help tab in the lower right pane.
|
||||
|
||||
Press TAB once more when you've selected the function you want. RStudio will add matching opening (`(`) and closing (`)`) parentheses for you. Type the arguments `1, 10` and hit return.
|
||||
|
||||
|
@ -92,7 +92,7 @@ Press TAB once more when you've selected the function you want. RStudio will add
|
|||
seq(1, 10)
|
||||
```
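The same call can also be written with named arguments, following the `function_name(arg1 = val1, ...)` pattern. The sketch below uses `from`, `to`, `by`, and `length.out`, which are documented arguments of `seq()`; the particular values are arbitrary.

```r
# Positional and named forms of the same call give the same result.
seq(1, 10)
seq(from = 1, to = 10)

# `by` sets the step between consecutive values.
seq(from = 1, to = 10, by = 3)

# `length.out` asks for a fixed number of evenly spaced values.
seq(from = 0, to = 1, length.out = 5)
```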
|
||||
|
||||
-Type this code and notice similar assistance help with the paired quotation marks:
+Type this code and notice you get similar assistance with the paired quotation marks:
|
||||
|
||||
```{r}
|
||||
x <- "hello world"
|
||||
|
|