Copyedits for communicate-plots.Rmd & factors.Rmd (#282)
* Copyedits for communicate-plots.Rmd & factors.Rmd
* Add missing 't'

parent 7784ced193
commit 90079aea74
@@ -2,15 +2,15 @@

 ## Introduction

-In [exploratory data analysis], you learned how to use plots as tools for _exploration_. When making plots for exploration, you know---even before you look at them---which variables the plot would display. You made each plot for a purpose, could quickly look at it, and then move on to the next plot. In the course of most analyses you'll produce tens of hundreds of plots, most of which are immediately thrown in the trash.
+In [exploratory data analysis], you learned how to use plots as tools for _exploration_. When making plots for exploration, you knew---even before looking at them---which variables the plot would display. You made each plot for a purpose, and could quickly look at it and move on to the next plot. In the course of most analyses, you'll produce tens of hundreds of plots, most of which are immediately discarded.

-Now you need to _communicate_ the result of your analysis to others. Your audience will not share your background knowledge and will not be deeply invested in the data. To help these newcomers quickly build up a good mental model of the data you will need to invest considerable effort to make your plots as self-explanatory as possible. In this chapter, you'll learn some of the tools that ggplot2 provides to do so.
+Now you need to _communicate_ the results of your analysis to others. Your audience will likely not share your background knowledge and will not be deeply invested in the data. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible. In this chapter, you'll learn some of the tools that ggplot2 provides to do so.

-The focus of this chapter is on the tools that you need to create good graphics. I assume you know what you want, and you just want to know how to do it. For that reason, I highly recommend pairing this advice with a good general visualisation book. I particularly like [_The Truthful Art_](https://amzn.com/0321934075), by Albert Cairo. I doesn't teach the mechanics of creating visualisations, but instead focusses on what you need to think about in order to create effective graphics.
+This chapter focuses on the tools you need to create good graphics. I assume you have an idea of what you want, and just need to know how to do it. For that reason, I highly recommend pairing this advice with a good general visualisation book. I particularly like [_The Truthful Art_](https://amzn.com/0321934075), by Albert Cairo. It doesn't teach the mechanics of creating visualisations, but instead focuses on what you need to think about in order to create effective graphics.

 ### Prerequisites

-In this chapter, we'll focus once again on ggplot2. We'll also use a little dplyr for data manipulation, and a few ggplot2 extension packages, including __ggrepel__ and __viridis__. Rather than loading those extensions here we'll refer to their functions explicitly with the `::` notation. That will help make it obvious what functions are built into ggplot2, and what functions come from other packages.
+In this chapter, we'll focus once again on ggplot2. We'll also use a little dplyr for data manipulation, and a few ggplot2 extension packages, including __ggrepel__ and __viridis__. Rather than loading those extensions here, we'll refer to their functions explicitly, using the `::` notation. This will help make it clear which functions are built into ggplot2, and which come from other packages.

 ```{r, message = FALSE}
 library(ggplot2)
@@ -76,12 +76,12 @@ ggplot(df, aes(x, y)) +

 ### Exercises

-1. Create one plot of the fuel economy data with customized the `title`,
+1. Create one plot of the fuel economy data with customized `title`,
 `subtitle`, `caption`, `x`, `y`, and `colour` labels.

-1. The `geom_smooth()` is somewhat misleading because it the `hwy` for
-large engines is skewed upwards because of the lightweight sports
-cars with big engines. Use your modelling tools to fit and display
+1. The `geom_smooth()` is somewhat misleading because the `hwy` for
+large engines is skewed upwards due to the inclusion of lightweight
+sports cars with big engines. Use your modelling tools to fit and display
 a better model.

 1. Take an exploratory graphic that you've created in the last month, and add
@@ -89,9 +89,9 @@ ggplot(df, aes(x, y)) +

 ## Annotations

-As well as labelling major components of your plot, it's often useful to label individual observations or groups of observations. The first tool you have at your disposal is `geom_text()`. `geom_text()` is similar to `geom_point()`, but it has an additional aesthetic: `label`. This makes it possible to add textual labels to your plots.
+In addition to labelling major components of your plot, it's often useful to label individual observations or groups of observations. The first tool you have at your disposal is `geom_text()`. `geom_text()` is similar to `geom_point()`, but it has an additional aesthetic: `label`. This makes it possible to add textual labels to your plots.

-There are two possible sources of labels. First, you might have a tibble that provides label. The plot below isn't terribly useful, but it illustrates a useful approach: pull out the most efficient car in each class with dplyr, and then label it on the plot:
+There are two possible sources of labels. First, you might have a tibble that provides labels. The plot below isn't terribly useful, but it illustrates a useful approach: pull out the most efficient car in each class with dplyr, and then label it on the plot:

 ```{r}
 best_in_class <- mpg %>%
@@ -120,9 +120,9 @@ ggplot(mpg, aes(displ, hwy)) +
 ggrepel::geom_label_repel(aes(label = model), data = best_in_class)
 ```

-Note the other handy technique here: I added a second layer of large, hollow points to highligh the points that I've labelled.
+Note another handy technique used here: I added a second layer of large, hollow points to highlight the points that I've labelled.

-You can sometimes use the same idea to replace the legend with labels placed directly on the plot. It's not wonderful for this plot, but it isn't too bad. (`theme(legend.position = "none")` turns the legend off --- we'll talk about it more shortly).
+You can sometimes use the same idea to replace the legend with labels placed directly on the plot. It's not wonderful for this plot, but it isn't too bad. (`theme(legend.position = "none"`) turns the legend off --- we'll talk about it more shortly).

 ```{r}
 class_avg <- mpg %>%
@@ -143,7 +143,7 @@ ggplot(mpg, aes(displ, hwy, colour = class)) +
 theme(legend.position = "none")
 ```

-Alternatively, you might just want to add a single label to the plot, but you'll still need to create a data frame. Often you want to the label in the corner of the plot, so it's convenient to create a new data frame using `summarise()`.
+Alternatively, you might just want to add a single label to the plot, but you'll still need to create a data frame. Often, you want the label in the corner of the plot, so it's convenient to create a new data frame using `summarise()` to compute the maximum values of x and y.

 ```{r}
 label <- mpg %>%
@@ -159,7 +159,7 @@ ggplot(mpg, aes(displ, hwy)) +
 geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
 ```

-If you want to place the text exactly on the borders of the plot, you can use `+Inf` and `-Inf`. Since I'm no longer computing the positions from `mpg`, I use `tibble()` to create the data frame:
+If you want to place the text exactly on the borders of the plot, you can use `+Inf` and `-Inf`. Since we're no longer computing the positions from `mpg`, we can use `tibble()` to create the data frame:

 ```{r}
 label <- tibble(
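The border-placement idea in this hunk can be sketched as follows. This is a minimal illustration that assumes ggplot2 and tibble are installed; the name `corner_label` and the label text are placeholders chosen here, not necessarily what the chapter itself uses.

```r
library(ggplot2)
library(tibble)

# Hypothetical one-row data frame: Inf in both columns pins the label
# to the top-right border of the panel, whatever the data range is.
corner_label <- tibble(
  displ = Inf,
  hwy = Inf,
  label = "Increasing engine size is \nrelated to decreasing fuel economy."
)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # vjust/hjust anchor the top-right corner of the text at (Inf, Inf),
  # so the text sits inside the panel rather than being clipped.
  geom_text(
    aes(label = label),
    data = corner_label,
    vjust = "top",
    hjust = "right"
  )

p
```

Using `-Inf` for either column would move the label to the opposite border, with `vjust = "bottom"` or `hjust = "left"` as the matching justification.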
@@ -173,7 +173,7 @@ ggplot(mpg, aes(displ, hwy)) +
 geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
 ```

-I manually broke the label up into lines using `"\n"`. Another approach is to use `stringr::str_wrap()` to automatically add linebreaks, given the number of characters you want per line:
+In these examples, I manually broke the label up into lines using `"\n"`. Another approach is to use `stringr::str_wrap()` to automatically add linebreaks, given the number of characters you want per line:

 ```{r}
 "Increasing engine size is related to decreasing fuel economy." %>%
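As a rough sketch of the automatic wrapping described in this hunk (assuming the stringr package is available; the width of 40 and the variable names are choices made for this illustration):

```r
library(stringr)

label_text <- "Increasing engine size is related to decreasing fuel economy."

# str_wrap() inserts "\n" so that each line stays within the target width
wrapped <- str_wrap(label_text, width = 40)

# writeLines() prints the string with the line breaks rendered
writeLines(wrapped)
```

The wrapped string can be used directly as a `label` value, which avoids hand-placing `"\n"` whenever the text changes.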
@@ -181,7 +181,7 @@ I manually broke the label up into lines using `"\n"`. Another approach is to us
 writeLines()
 ```

-Also note the use of `hjust` and `vjust` to control the the alignment of the label. Figure \@ref(fig:just) shows all nine possible combinations.
+Also, note the use of `hjust` and `vjust` to control the alignment of the label. Figure \@ref(fig:just) shows all nine possible combinations.

 ```{r just, echo = FALSE, fig.cap = "All nine combinations of `hjust` and `vjust`."}
 vjust <- c(bottom = 0, center = 0.5, top = 1)
@@ -200,12 +200,12 @@ ggplot(df, aes(x, y)) +
 geom_text(aes(label = label, hjust = hj, vjust = vj), size = 4)
 ```

-Remember, as well as `geom_text()` you have all the other geoms in ggplot2 available to help annotate your plot. A few ideas:
+Remember, in addition to `geom_text()`, you have many other geoms in ggplot2 available to help annotate your plot. A few ideas:

 * Use `geom_hline()` and `geom_vline()` to add reference lines. I often make
-them thick (`size = 2`) and white (`colour = white`) and draw them
-underneath the primary data layer. That makes them easy to see, but they
-don't draw attention away from the data.
+them thick (`size = 2`) and white (`colour = white`), and draw them
+underneath the primary data layer. That makes them easy to see, without
+drawing attention away from the data.

 * Use `geom_rect()` to draw a rectangle around points of interest. The
 boundaries of the rectangle are defined by aesthetics `xmin`, `xmax`,
@@ -215,7 +215,7 @@ Remember, as well as `geom_text()` you have all the other geoms in ggplot2 avail
 to a point with an arrow. Use aesthetics `x` and `y` to define the
 starting location, and `xend` and `yend` to define the end location.

-The only limitation is your imagination! (And your patience at position annotations in a way that looks good.)
+The only limit is your imagination (and your patience at positioning annotations to be aesthetically pleasing)!

 ### Exercises

@@ -225,7 +225,7 @@ The only limitation is your imagination! (And your patience at position annotati
 1. Read the documentation for `annotate()`. How can you use it to add a text
 label to a plot without having to create a tibble?

-1. How do labels with `geom_text()` interract with faceting? How can you
+1. How do labels with `geom_text()` interact with faceting? How can you
 add a label to a single facet? How can you put a different label in
 each facet? (Hint: think about the underlying data.)

@@ -233,7 +233,7 @@ The only limitation is your imagination! (And your patience at position annotati
 box?

 1. What are the four argument to `arrow()`? How do they work? Create a series
-of plot that demonstrate the most important options.
+of plots that demonstrate the most important options.

 ## Scales

@@ -254,17 +254,17 @@ ggplot(mpg, aes(displ, hwy)) +
 scale_colour_discrete()
 ```

-Note the naming scheme for scales: `scale_` followed by the name of the aesthetic, then `_`, then the name of the scale. The default scales are named according to the type of variable they with: continuous, discrete, datetime, or date. There are lots of non-default scales which you'll learn about below.
+Note the naming scheme for scales: `scale_` followed by the name of the aesthetic, then `_`, then the name of the scale. The default scales are named according to the type of variable they align with: continuous, discrete, datetime, or date. There are lots of non-default scales which you'll learn about below.

-The default scales have been carefully chosen to do a good job for a wide range of inputs. But you might want to override the defaults for two reasons:
+The default scales have been carefully chosen to do a good job for a wide range of inputs. Nevertheless, you might want to override the defaults for two reasons:

 * You might want to tweak some of the parameters of the default scale.
 This allows you to do things like change the breaks on the axes, or the
 key labels on the legend.

 * You might want to replace the scale altogether, and use a completely
-different algorithm. Often you can beat the default because you know
-more about the data.
+different algorithm. Often you can do better than the default because
+you know more about the data.

 ### Axis ticks and legend keys

@@ -285,7 +285,7 @@ ggplot(mpg, aes(displ, hwy)) +
 scale_y_continuous(labels = NULL)
 ```

-You can also use `breaks` and `labels` control the apperance of legends. Collecting axes and legends are called guides. Axes are used for x and y aesthetics; legends are used used for everything else.
+You can also use `breaks` and `labels` to control the appearance of legends. Collecting axes and legends are called guides. Axes are used for x and y aesthetics; legends are used used for everything else.

 Another use of `breaks` is when you have relatively few data points and want to highlight exactly where the observations occur. For example, take this plot that shows when each US president started and ended their term.

@@ -307,7 +307,7 @@ Note that the specification of breaks and labels for date and datetime scales is

 ### Legend layout

-You most often use `breaks` and `labels` to tweak the axes. While they both also work for legends, there are a few other techniques your more likely to use.
+You will most often use `breaks` and `labels` to tweak the axes. While they both also work for legends, there are a few other techniques you are more likely to use.

 To control the overall position of the legend, you need to use a `theme()` setting. We'll come back to themes at the end of the chapter, but in brief, they control the non-data parts of the plot. The themes setting `legend.position` controls where the legend is drawn:

@@ -323,7 +323,7 @@ base + theme(legend.position = "right") # the default

 You can also use `legend.postion = "none"` to suppress the display of the legend altogether.

-To control the display of individual legneds, use `guides()` along with `guide_legend()` or `guide_colourbar()`. The following example shows two important settings: controlling the number of rows with `nrow`, and overriding one of the aesthetics to make the points bigger. This is particularly useful if you have used a low `alpha` to display many points on a plot.
+To control the display of individual legends, use `guides()` along with `guide_legend()` or `guide_colourbar()`. The following example shows two important settings: controlling the number of rows the legend uses with `nrow`, and overriding one of the aesthetics to make the points bigger. This is particularly useful if you have used a low `alpha` to display many points on a plot.

 ```{r}
 ggplot(mpg, aes(displ, hwy)) +
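The two legend settings named in this hunk can be sketched like this. It is an illustration only, assuming ggplot2 is installed; the specific values (`alpha = 0.3`, `size = 4`, a bottom legend) and the extra `alpha = 1` override are choices made here rather than something the diff confirms.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  # Low alpha helps with overplotting but makes the legend keys faint
  geom_point(aes(colour = class), alpha = 0.3) +
  theme(legend.position = "bottom") +
  guides(
    colour = guide_legend(
      nrow = 1,                                  # lay the keys out in one row
      override.aes = list(size = 4, alpha = 1)   # bigger, opaque keys in the legend only
    )
  )

p
```

`override.aes` changes only how the legend keys are drawn; the points in the panel keep the size and transparency set in `geom_point()`.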
@@ -335,9 +335,9 @@ ggplot(mpg, aes(displ, hwy)) +

 ### Replacing a scale

-Instead of just tweaking the detail a little, you can also replace the scale altogether. We'll focus on colour scales because there are many options, and they're the scales you're mostly likely to want to change. The same principles apply to the other aesthetics. All colour scales have two variants: `scale_colour_x()` and `scale_fill_x()` for the `colour` and `fill` aesthetics respectively (And the colour scales are available in both UK and US spellings.)
+Instead of just tweaking the detail a little, you can also replace the scale altogether. We'll focus on colour scales because there are many options, and they're the scales you're mostly likely to want to change. The same principles apply to the other aesthetics. All colour scales have two variants: `scale_colour_x()` and `scale_fill_x()` for the `colour` and `fill` aesthetics respectively (the colour scales are available in both UK and US spellings).

-The default categorical scale picks colours that are evenly spaced around the colour wheel. Useful alternatives are the ColourBrewer scales which have been hand tuned to work better for people with common types of colour blindness. The two plots below don't look that different, but there's enough difference in the shades of red and green that they can be distinguished even by people with red-green colour blindness.
+The default categorical scale picks colours that are evenly spaced around the colour wheel. Useful alternatives are the ColourBrewer scales which have been hand tuned to work better for people with common types of colour blindness. The two plots below look similar, but there is enough difference in the shades of red and green that the dots on the right can be distinguished even by people with red-green colour blindness.

 ```{r, fig.align = "default", out.width = "50%"}
 ggplot(mpg, aes(displ, hwy)) +
@@ -348,7 +348,7 @@ ggplot(mpg, aes(displ, hwy)) +
 scale_colour_brewer(palette = "Set1")
 ```

-Don't forget simpler techniques. If there are just a few colours, you can add a redundant shape mapping. This will also ensure your plot works well in black and white.
+Don't forget simpler techniques. If there are just a few colours, you can add a redundant shape mapping. This will also help ensure your plot is interpretable in black and white.

 ```{r}
 ggplot(mpg, aes(displ, hwy)) +
@@ -356,14 +356,14 @@ ggplot(mpg, aes(displ, hwy)) +
 scale_colour_brewer(palette = "Set1")
 ```

-Figure \@ref(fig:brewer) shows the complete list of all palettes. The sequential (top) and diverging (bottom) palettes are particularly useful if your categorical values are ordered, or have a "middle". This often arises if you've used `cut()` to make a continuous varible into a categorical variable.
+Figure \@ref(fig:brewer) shows the complete list of all palettes. The sequential (top) and diverging (bottom) palettes are particularly useful if your categorical values are ordered, or have a "middle". This often arises if you've used `cut()` to make a continuous variable into a categorical variable.

 ```{r brewer, fig.asp = 2.5, echo = FALSE, fig.cap = "All ColourBrewer scales."}
 par(mar = c(0, 3, 0, 0))
 RColorBrewer::display.brewer.all()
 ```

-When you have a predefined mapping between values and colours use `scale_colour_manual()`. For example, if we map presidential party to colour, we want to use the standard mapping of red for Republicans and blue for Democrats:
+When you have a predefined mapping between values and colours, use `scale_colour_manual()`. For example, if we map presidential party to colour, we want to use the standard mapping of red for Republicans and blue for Democrats:

 ```{r}
 presidential %>%
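A predefined value-to-colour mapping of the kind this hunk describes might look like the sketch below. It assumes ggplot2 and its bundled `presidential` data set; `party_colours`, `terms`, and `term_id` are names invented for this illustration, and the layout is a simplification rather than the chapter's own plot.

```r
library(ggplot2)

# Named vector: names are the data values, elements are the colours
party_colours <- c(Republican = "red", Democratic = "blue")

# Hypothetical helper column so that each term gets its own row on the y axis
terms <- presidential
terms$term_id <- seq_len(nrow(terms))

p <- ggplot(terms, aes(start, term_id, colour = party)) +
  geom_point() +
  geom_segment(aes(xend = end, yend = term_id)) +
  scale_colour_manual(values = party_colours)

p
```

Because the vector is named, the mapping does not depend on the order in which the parties appear in the data.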
@@ -376,7 +376,7 @@ presidential %>%

 For continuous colour, you can use the built-in `scale_colour_gradient()` or `scale_fill_gradient()`. If you have a diverging scale, you can use `scale_colour_gradient2()`. That allows you to give, for example, positive and negative values different colours. That's sometimes also useful if you want to distinguish points above or below the mean.

-Another option is `scale_colour_viridis()` provided by the __viridis__ package. It's a continuous analog of the categorical Brewer scales. The designers, Nathaniel Smith and Stéfan van der Walt, carefully tailored a continuous colour scheme that has good perceptual properities. Here's an example from the viridis vignette.
+Another option is `scale_colour_viridis()` provided by the __viridis__ package. It's a continuous analog of the categorical Brewer scales. The designers, Nathaniel Smith and Stéfan van der Walt, carefully tailored a continuous colour scheme that has good perceptual properties. Here's an example from the viridis vignette.

 ```{r, fig.align = "default", fig.asp = 1, out.width = "50%", fig.width = 4}
 df <- tibble(
@@ -404,12 +404,12 @@ ggplot(df, aes(x, y)) +
 coord_fixed()
 ```

-1. What is first argument to every scale? How does it compare to to `labs()`?
+1. What is the first argument to every scale? How does it compare to `labs()`?

 1. Change the display of the presidential terms by:

 1. Combining the two variants shown above.
-1. Improve the display of the y axis.
+1. Improving the display of the y axis.
 1. Labelling each term with the name of the president.
 1. Adding informative plot labels.
 1. Placing breaks every 4 years (this is trickier than it seems!).
@@ -425,9 +425,9 @@ ggplot(df, aes(x, y)) +

 There are three ways to control the plot limits:

-1. By controlling the data.
-1. Setting the limits in each scale.
-1. Setting `xlim` and `ylim` in `coord_cartesian()`.
+1. Adjusting what data are plotted
+1. Setting the limits in each scale
+1. Setting `xlim` and `ylim` in `coord_cartesian()`

 To zoom in on a region of the plot, it's generally best to use `coord_cartesian()`. Compare the following two plots:

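The difference between zooming and subsetting that this hunk points to can be sketched as follows. This assumes ggplot2 is installed; the window (`displ` 5 to 7, `hwy` 10 to 30) and the object names are illustrative, and base `subset()` is used here only to keep the example free of other packages.

```r
library(ggplot2)

# Zooming: every row still feeds geom_smooth(); only the visible window changes
zoomed <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth() +
  coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))

# Subsetting: rows outside the window are dropped before the smooth is fitted
in_window <- subset(mpg, displ >= 5 & displ <= 7 & hwy >= 10 & hwy <= 30)

subsetted <- ggplot(in_window, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth()
```

Printing `zoomed` and `subsetted` side by side shows different smooth curves, because the second is fitted to fewer observations.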
@@ -445,7 +445,7 @@ mpg %>%
 coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))
 ```

-You can also set the `limits` on individual scales. If you are reducing the limits, this is basically equivalent to subsetting the data. It's more useful if you want _expand_ the limits, for example for matching scales across different plots. Take the following toy example: if we extract out two classes of cars and plot them separately, it's hard to compare the plots because all three scales have different ranges.
+You can also set the `limits` on individual scales. Reducing the limits is basically equivalent to subsetting the data. It is generally more useful if you want _expand_ the limits, for example, to match scales across different plots. For example, if we extract two classes of cars and plot them separately, it's difficult to compare the plots because all three scales (the x-axis, the y-axis, and the colour aesthetic) have different ranges.

 ```{r out.width = "50%", fig.align = "default", fig.width = 4}
 suv <- mpg %>% filter(class == "suv")
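One way to match scales across separate plots, as this hunk suggests, is to build the scales once from the full data and reuse them. The sketch below assumes ggplot2 is installed; `x_scale` and `y_scale` are names chosen here (only `suv`, `compact`, and `col_scale` appear in the diff context), and base `subset()` stands in for the dplyr pipeline.

```r
library(ggplot2)

suv <- subset(mpg, class == "suv")
compact <- subset(mpg, class == "compact")

# Limits come from the full data set, so both plots share the same ranges
x_scale <- scale_x_continuous(limits = range(mpg$displ))
y_scale <- scale_y_continuous(limits = range(mpg$hwy))
col_scale <- scale_colour_discrete(limits = unique(mpg$drv))

p_suv <- ggplot(suv, aes(displ, hwy, colour = drv)) +
  geom_point() +
  x_scale +
  y_scale +
  col_scale

p_compact <- ggplot(compact, aes(displ, hwy, colour = drv)) +
  geom_point() +
  x_scale +
  y_scale +
  col_scale
```

Setting the colour limits matters as much as the axis limits: without it, a drive type missing from one subset would shift the colours assigned to the others.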
@@ -478,7 +478,7 @@ ggplot(compact, aes(displ, hwy, colour = drv)) +
 col_scale
 ```

-In this case you could have used faceting, but this technique is broadly useful if you want to make your plots are comparable even when spread across multiple pages of your final report.
+In this particular case, you could have simply used faceting, but this technique is useful more generally, if for instance, you want to make your plots comparable even when spread across multiple pages of a report.

 ## Themes

@@ -499,7 +499,7 @@ knitr::include_graphics("images/visualization-themes.png")

 Many people wonder why the default theme has a grey background. This was a deliberate choice because it puts the data forward while still making the grid lines visible. The white grid lines are visible (which is important because they significantly aid position judgements), but they have little visual impact and we can easily tune them out. The grey background gives the plot a similar typographic colour to the text, ensuring that the graphics fit in with the flow of a document without jumping out with a bright white background. Finally, the grey background creates a continuous field of colour which ensures that the plot is perceived as a single visual entity.

-It's also possible to control individual components of each theme, like the size and colour of the font used for the y axis. This unfortunately is outside the scope of this book, so you'll need to read the ggplot2 book for the full details. You can also create your own themes if you have a corporate style or you're trying to match a journal style.
+It's also possible to control individual components of each theme, like the size and colour of the font used for the y axis. Unfortunately, this level of detail is outside the scope of this book, so you'll need to read the [ggplot2 book](https://amzn.com/331924275X) for the full details. You can also create your own themes, if you are trying to match a particular corporate or journal style.

 ## Saving your plots

@@ -513,18 +513,18 @@ ggsave("my-plot.pdf")
 file.remove("my-plot.pdf")
 ```

-If you don't specify the `width` and `height` they will be taken from dimensions of the current plotting device. For reproducible code, you'll want to specify them.
+If you don't specify the `width` and `height` they will be taken from the dimensions of the current plotting device. For reproducible code, you'll want to specify them.

-Generally, however, I think you should be assembling your final reports using knitr and rmarkdown, so I want to focus on the important chunk options that you should know about for graphics. You can learn more about `ggsave()` in the documentation.
+Generally, however, I think you should be assembling your final reports using knitr and rmarkdown, so I want to focus on the important code chunk options that you should know about for graphics. You can learn more about `ggsave()` in the documentation.

 ### Figure sizing

-The biggest challenge of graphics in RMarkdown is getting your figures the right size and shape. There are five main options that control figure sizing: `fig.width`, `fig.height`, `fig.asp`, `out.width` and `out.height`. Image sizing is challenging because there are two sizes (the size of the figure created by R and the size in which it is inserted in the output document), and multiple ways of specifying the size (height, width, aspect ratio: pick two out of three).
+The biggest challenge of graphics in RMarkdown is getting your figures the right size and shape. There are five main options that control figure sizing: `fig.width`, `fig.height`, `fig.asp`, `out.width` and `out.height`. Image sizing is challenging because there are two sizes (the size of the figure created by R and the size at which it is inserted in the output document), and multiple ways of specifying the size (i.e., height, width, and aspect ratio: pick two of three).

 I only ever use three of the five options:

 * I find it most aesthetically pleasing for plots to have a consistent
-width. To enforce this I set `fig.width = 6` (6") and `fig.asp = 0.618`
+width. To enforce this, I set `fig.width = 6` (6") and `fig.asp = 0.618`
 (the golden ratio) in the defaults. Then in individual chunks, I only
 adjust `fig.asp`.

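Setting the figure-size defaults mentioned in this hunk could look like the sketch below. It assumes the knitr package and that the call lives in a setup chunk near the top of the document; only the two options named in the text are set, and where exactly the author sets them is not shown in the diff.

```r
library(knitr)

# Document-wide defaults: every figure is 6 inches wide, and its height is
# derived from the width via the aspect ratio (height = width * fig.asp).
opts_chunk$set(fig.width = 6, fig.asp = 0.618)

# Confirm what later chunks will inherit
opts_chunk$get("fig.width")
opts_chunk$get("fig.asp")
```

An individual chunk can then override just the aspect ratio in its header, for example `fig.asp = 1` for a square plot, while keeping the shared width.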
@@ -558,17 +558,16 @@ If you want to make sure the font size is the same in all your figures, whenever

 ### Other important options

-When mingling code and text, like I do in this book, I recommend setting `fig.show = "hold"` so that that plots are shown after the code. This has the pleasant side effect of forcing you to break up large blocks of code with their explanations.
+When mingling code and text, like I do in this book, I recommend setting `fig.show = "hold"` so that plots are shown after the code. This has the pleasant side effect of forcing you to break up large blocks of code with their explanations.

 To add a caption to the plot, use `fig.cap`. In RMarkdown this will change the figure from inline to "floating".

-If you're producing pdf output, the default graphics type is PDF. This a good default because PDFs are high quality vector graphics. However, they can produce very large and slow plots if you are displaying thousands of points. In that case, set `dev = "png"` to force the use of PNGs. They are slightly lower quality, but will be much more compact.
+If you're producing PDF output, the default graphics type is PDF. This is a good default because PDFs are high quality vector graphics. However, they can produce very large and slow plots if you are displaying thousands of points. In that case, set `dev = "png"` to force the use of PNGs. They are slightly lower quality, but will be much more compact.

-It's a good idea to give figure producing chunks names, even if you don't routinely label other chunks. The chunk label is used to generate the file name of the graphic on disk, so naming your chunks makes much easier to pick out plots and reuse in other circumstances (i.e. if you want to quickly drop a single plot into an email or a tweet).
+It's a good idea to name code chunks that produce figures, even if you don't routinely label other chunks. The chunk label is used to generate the file name of the graphic on disk, so naming your chunks makes it much easier to pick out plots and reuse in other circumstances (i.e. if you want to quickly drop a single plot into an email or a tweet).

 ## Learning more

-The absolute best place to learn more is the ggplot2 book: [_ggplot2: Elegant graphics for data analysis_](https://amzn.com/331924275X). It goes into much more depth about the underlying theory, and has many more examples of how to combine the individual pieces to solve practical problems. Unfortunately the book is not available online for free, although can find the source code at <https://github.com/hadley/ggplot2-book>.
-
-Another great resource is the ggplot2 extensions guide at <http://www.ggplot2-exts.org/>. This lists many of the packages that extend ggplot2 with new geoms and scales. It's a great place to start if you're trying to do something that seems really hard with ggplot2.
+The absolute best place to learn more is the ggplot2 book: [_ggplot2: Elegant graphics for data analysis_](https://amzn.com/331924275X). It goes into much more depth about the underlying theory, and has many more examples of how to combine the individual pieces to solve practical problems. Unfortunately the book is not available online for free, although you can find the source code at <https://github.com/hadley/ggplot2-book>.
+
+Another great resource is the ggplot2 extensions guide <http://www.ggplot2-exts.org/>. This site lists many of the packages that extend ggplot2 with new geoms and scales. It's a great place to start if you're trying to do something that seems really hard with ggplot2.
@@ -4,9 +4,9 @@

 In R, factors are used to work with categorical variables, variables that have a fixed and known set of possible values. They are also useful when you want to display character vectors in a non-alphabetical order.

-Historically, factors were much easier to work with than characters, so many functions in base R automatically convert characters to factors. That means that factors often crop up in places where they're not actually helpful. Fortunately, you don't need to worry about that in the tidyverse, and can focus on situations where factors are genuinely useful.
+Historically, factors were much easier to work with than characters. As a result, many of the functions in base R automatically convert characters to factors. This means that factors often crop up in places where they're not actually helpful. Fortunately, you don't need to worry about that in the tidyverse, and can focus on situations where factors are genuinely useful.

-For more historical context on factors, I reccommed [_stringsAsFactors: An unauthorized biography_](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng, and [_stringsAsFactors = \<sigh\>_](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley.
+For more historical context on factors, I recommended [_stringsAsFactors: An unauthorized biography_](http://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/) by Roger Peng, and [_stringsAsFactors = \<sigh\>_](http://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh) by Thomas Lumley.


 ### Prerequisites
@@ -100,7 +100,7 @@ When working with factors, the two most common operations are changing the order
 default bar chart hard to understand? How could you improve the plot?

 1. What is the most common `religion` in this survey? What's the most
-comomn `partyid`?
+common `partyid`?

 1. Which `religion` does `denom` (denomination) apply to? How can you find
 out with a table? How can you find out with a visualisation?
@@ -217,7 +217,7 @@ More powerful than changing the orders of the levels is changing their values. T
 gss_cat %>% count(partyid)
 ```

-The levels are terse and inconstent. Let's tweak them to be longer and use a parallel construction.
+The levels are terse and inconsistent. Let's tweak them to be longer and use a parallel construction.

 ```{r}
 gss_cat %>%
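One way to make terse levels longer and parallel, as this hunk describes, is `forcats::fct_recode()`. The diff does not show which function the chapter uses at this point, so treat the sketch below as an assumption; the toy factor `party` and the new level names are invented for the illustration.

```r
library(forcats)

# Hypothetical toy factor standing in for a survey column with terse levels
party <- factor(c("Ind", "Strong republican", "Not str democrat", "Ind"))

# fct_recode() takes new = old pairs; levels not mentioned are left as they are
party_clear <- fct_recode(party,
  "Independent"        = "Ind",
  "Republican, strong" = "Strong republican",
  "Democrat, weak"     = "Not str democrat"
)

levels(party_clear)
```

The values keep their positions; only the level labels change, so counts per level before and after the recode are the same.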