Fix code language

This commit is contained in:
Hadley Wickham
2022-11-18 11:26:25 -06:00
parent 69b4597f3b
commit 868a35ca71
29 changed files with 912 additions and 907 deletions

@@ -12,7 +12,7 @@ Introduction</h1>
Prerequisites</h2>
<p>In this chapter, we’ll focus once again on ggplot2. We’ll also use a little dplyr for data manipulation, and a few ggplot2 extension packages, including <strong>ggrepel</strong> and <strong>patchwork</strong>. Rather than loading those extensions here, we’ll refer to their functions explicitly, using the <code>::</code> notation. This will help make it clear which functions are built into ggplot2, and which come from other packages. Don’t forget you’ll need to install those packages with <code><a href="https://rdrr.io/r/utils/install.packages.html">install.packages()</a></code> if you don’t already have them.</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">library(tidyverse)</pre>
+<pre data-type="programlisting" data-code-language="r">library(tidyverse)</pre>
</div>
</section>
</section>
@@ -22,7 +22,7 @@ Prerequisites</h2>
Label</h1>
<p>The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels. You add labels with the <code><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs()</a></code> function. This example adds a plot title:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(title = "Fuel efficiency generally decreases with engine size")</pre>
@@ -35,7 +35,7 @@ Label</h1>
<ul><li><p><code>subtitle</code> adds additional detail in a smaller font beneath the title.</p></li>
<li><p><code>caption</code> adds text at the bottom right of the plot, often used to describe the source of the data.</p></li>
</ul><div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
@@ -49,7 +49,7 @@ Label</h1>
</div>
<p>You can also use <code><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs()</a></code> to replace the axis and legend titles. It’s usually a good idea to replace short variable names with more detailed descriptions, and to include the units.</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
labs(
@@ -63,7 +63,7 @@ Label</h1>
</div>
<p>It’s possible to use mathematical equations instead of text strings. Just switch <code>""</code> out for <code><a href="https://rdrr.io/r/base/substitute.html">quote()</a></code> and read about the available options in <code><a href="https://rdrr.io/r/grDevices/plotmath.html">?plotmath</a></code>:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">df &lt;- tibble(
+<pre data-type="programlisting" data-code-language="r">df &lt;- tibble(
x = runif(10),
y = runif(10)
)
@@ -100,7 +100,7 @@ Annotations</h1>
<p>In addition to labelling major components of your plot, it’s often useful to label individual observations or groups of observations. The first tool you have at your disposal is <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_text()</a></code>. <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_text()</a></code> is similar to <code><a href="https://ggplot2.tidyverse.org/reference/geom_point.html">geom_point()</a></code>, but it has an additional aesthetic: <code>label</code>. This makes it possible to add textual labels to your plots.</p>
<p>There are two possible sources of labels. First, you might have a tibble that provides labels. The plot below isn’t terribly useful, but it illustrates a useful approach: pull out the most efficient car in each class with dplyr, and then label it on the plot:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">best_in_class &lt;- mpg |&gt;
+<pre data-type="programlisting" data-code-language="r">best_in_class &lt;- mpg |&gt;
group_by(class) |&gt;
filter(row_number(desc(hwy)) == 1)
@@ -113,7 +113,7 @@ ggplot(mpg, aes(displ, hwy)) +
</div>
<p>This is hard to read because the labels overlap with each other, and with the points. We can make things a little better by switching to <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_label()</a></code> which draws a rectangle behind the text. We also use the <code>nudge_y</code> parameter to move the labels slightly above the corresponding points:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)</pre>
<div class="cell-output-display">
@@ -122,7 +122,7 @@ ggplot(mpg, aes(displ, hwy)) +
</div>
<p>That helps a bit, but if you look closely in the top-left corner, you’ll notice that there are two labels practically on top of each other. This happens because the highway mileage and displacement for the best cars in the compact and subcompact categories are exactly the same. There’s no way that we can fix these by applying the same transformation for every label. Instead, we can use the <strong>ggrepel</strong> package by Kamil Slowikowski. This useful package will automatically adjust labels so that they don’t overlap:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_point(size = 3, shape = 1, data = best_in_class) +
ggrepel::geom_label_repel(aes(label = model), data = best_in_class)</pre>
@@ -133,7 +133,7 @@ ggplot(mpg, aes(displ, hwy)) +
<p>Note another handy technique used here: we added a second layer of large, hollow points to highlight the labelled points.</p>
<p>You can sometimes use the same idea to replace the legend with labels placed directly on the plot. It’s not wonderful for this plot, but it isn’t too bad. (<code>theme(legend.position = "none")</code> turns the legend off; we’ll talk about it more shortly.)</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">class_avg &lt;- mpg |&gt;
+<pre data-type="programlisting" data-code-language="r">class_avg &lt;- mpg |&gt;
group_by(class) |&gt;
summarise(
displ = median(displ),
@@ -155,7 +155,7 @@ ggplot(mpg, aes(displ, hwy, colour = class)) +
</div>
<p>Alternatively, you might just want to add a single label to the plot, but you’ll still need to create a data frame. Often, you want the label in the corner of the plot, so it’s convenient to create a new data frame using <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> to compute the maximum values of x and y.</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">label_info &lt;- mpg |&gt;
+<pre data-type="programlisting" data-code-language="r">label_info &lt;- mpg |&gt;
summarise(
displ = max(displ),
hwy = max(hwy),
@@ -171,7 +171,7 @@ ggplot(mpg, aes(displ, hwy)) +
</div>
<p>If you want to place the text exactly on the borders of the plot, you can use <code>+Inf</code> and <code>-Inf</code>. Since we’re no longer computing the positions from <code>mpg</code>, we can use <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> to create the data frame:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">label_info &lt;- tibble(
+<pre data-type="programlisting" data-code-language="r">label_info &lt;- tibble(
displ = Inf,
hwy = Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy."
@@ -186,7 +186,7 @@ ggplot(mpg, aes(displ, hwy)) +
</div>
<p>In these examples, we manually broke the label up into lines using <code>"\n"</code>. Another approach is to use <code><a href="https://stringr.tidyverse.org/reference/str_wrap.html">stringr::str_wrap()</a></code> to automatically add line breaks, given the number of characters you want per line:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">"Increasing engine size is related to decreasing fuel economy." |&gt;
+<pre data-type="programlisting" data-code-language="r">"Increasing engine size is related to decreasing fuel economy." |&gt;
str_wrap(width = 40) |&gt;
writeLines()
#&gt; Increasing engine size is related to
@@ -223,12 +223,12 @@ Exercises</h2>
Scales</h1>
<p>The third way you can make your plot better for communication is to adjust the scales. Scales control the mapping from data values to things that you can perceive. Normally, ggplot2 automatically adds scales for you. For example, when you type:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))</pre>
</div>
<p>ggplot2 automatically adds default scales behind the scenes:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous() +
scale_y_continuous() +
@@ -244,7 +244,7 @@ Scales</h1>
Axis ticks and legend keys</h2>
<p>There are two primary arguments that affect the appearance of the ticks on the axes and the keys on the legend: <code>breaks</code> and <code>labels</code>. Breaks controls the position of the ticks, or the values associated with the keys. Labels controls the text label associated with each tick/key. The most common use of <code>breaks</code> is to override the default choice:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_y_continuous(breaks = seq(15, 40, by = 5))</pre>
<div class="cell-output-display">
@@ -253,7 +253,7 @@ Axis ticks and legend keys</h2>
</div>
<p>You can use <code>labels</code> in the same way (a character vector the same length as <code>breaks</code>), but you can also set it to <code>NULL</code> to suppress the labels altogether. This is useful for maps, or for publishing plots where you can’t share the absolute numbers.</p>
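<p>A minimal sketch of the character-vector form, assuming ggplot2 is installed: each entry of <code>labels</code> is paired by position with the corresponding entry of <code>breaks</code>. The break values and the "mpg" label text are illustrative choices, not taken from the chapter.</p>

```r
library(ggplot2)

# One label per break: both vectors have length 3 and are matched by position.
# The break values and label text are illustrative, not from the chapter.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(
    breaks = c(20, 30, 40),
    labels = c("20 mpg", "30 mpg", "40 mpg")
  )
p
```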
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_x_continuous(labels = NULL) +
scale_y_continuous(labels = NULL)</pre>
@@ -264,7 +264,7 @@ Axis ticks and legend keys</h2>
<p>You can also use <code>breaks</code> and <code>labels</code> to control the appearance of legends. Collectively axes and legends are called <strong>guides</strong>. Axes are used for x and y aesthetics; legends are used for everything else.</p>
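<p>As an illustration of the same arguments on a legend, this sketch sets <code>breaks</code> and <code>labels</code> on the colour scale for <code>drv</code> (whose values in <code>mpg</code> are <code>"4"</code>, <code>"f"</code>, and <code>"r"</code>). The legend keys follow the order given in <code>breaks</code>; the descriptive label wording is a hypothetical choice.</p>

```r
library(ggplot2)

# `breaks` picks (and orders) the legend keys; `labels` renames them by position.
# The drv codes are "4", "f", and "r"; the label wording is illustrative.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_discrete(
    breaks = c("f", "r", "4"),
    labels = c("front-wheel", "rear-wheel", "4-wheel")
  )
p
```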
<p>Another use of <code>breaks</code> is when you have relatively few data points and want to highlight exactly where the observations occur. For example, take this plot that shows when each US president started and ended their term.</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">presidential |&gt;
+<pre data-type="programlisting" data-code-language="r">presidential |&gt;
mutate(id = 33 + row_number()) |&gt;
ggplot(aes(start, id)) +
geom_point() +
@@ -285,7 +285,7 @@ Legend layout</h2>
<p>You will most often use <code>breaks</code> and <code>labels</code> to tweak the axes. While they both also work for legends, there are a few other techniques you are more likely to use.</p>
<p>To control the overall position of the legend, you need to use a <code><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme()</a></code> setting. We’ll come back to themes at the end of the chapter, but in brief, they control the non-data parts of the plot. The theme setting <code>legend.position</code> controls where the legend is drawn:</p>
<div>
-<pre data-type="programlisting" data-code-language="downlit">base &lt;- ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">base &lt;- ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))
base + theme(legend.position = "left")
@@ -314,7 +314,7 @@ base + theme(legend.position = "right") # the default</pre>
<p>You can also use <code>legend.position = "none"</code> to suppress the display of the legend altogether.</p>
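<p>A minimal sketch, assuming the same <code>mpg</code> scatterplot used throughout this chapter: the points are still coloured by <code>class</code>, but no legend is drawn.</p>

```r
library(ggplot2)

# Colour is still mapped to class; only the legend is suppressed.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  theme(legend.position = "none")
p
```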
<p>To control the display of individual legends, use <code><a href="https://ggplot2.tidyverse.org/reference/guides.html">guides()</a></code> along with <code><a href="https://ggplot2.tidyverse.org/reference/guide_legend.html">guide_legend()</a></code> or <code><a href="https://ggplot2.tidyverse.org/reference/guide_colourbar.html">guide_colorbar()</a></code>. The following example shows two important settings: controlling the number of rows the legend uses with <code>nrow</code>, and overriding one of the aesthetics to make the points bigger. This is particularly useful if you have used a low <code>alpha</code> to display many points on a plot.</p>
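<p>A sketch of the low-<code>alpha</code> case, where faint points would otherwise produce equally faint legend keys. Here <code>override.aes</code> redraws the keys larger and fully opaque; the specific values (<code>alpha = 0.2</code>, <code>size = 4</code>, <code>nrow = 2</code>) are illustrative assumptions rather than recommendations.</p>

```r
library(ggplot2)

# Faint points (alpha = 0.2) would give faint legend keys, so override.aes
# redraws only the keys larger and fully opaque. Values are illustrative.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class), alpha = 0.2) +
  theme(legend.position = "bottom") +
  guides(
    colour = guide_legend(nrow = 2, override.aes = list(size = 4, alpha = 1))
  )
p
```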
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
theme(legend.position = "bottom") +
@@ -332,7 +332,7 @@ Replacing a scale</h2>
<p>Instead of just tweaking the details a little, you can replace the scale altogether. There are two types of scales you’re most likely to want to switch out: continuous position scales and colour scales. Fortunately, the same principles apply to all the other aesthetics, so once you’ve mastered position and colour, you’ll be able to quickly pick up other scale replacements.</p>
<p>It’s very useful to plot transformations of your variable. For example, as we’ve seen in <a href="#chp-diamond-prices" data-type="xref">#chp-diamond-prices</a>, it’s easier to see the precise relationship between <code>carat</code> and <code>price</code> if we log-transform them:</p>
<div>
-<pre data-type="programlisting" data-code-language="downlit">ggplot(diamonds, aes(carat, price)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(diamonds, aes(carat, price)) +
geom_bin2d()
ggplot(diamonds, aes(log10(carat), log10(price))) +
@@ -350,7 +350,7 @@ ggplot(diamonds, aes(log10(carat), log10(price))) +
</div>
<p>However, the disadvantage of this transformation is that the axes are now labelled with the transformed values, making it hard to interpret the plot. Instead of doing the transformation in the aesthetic mapping, we can do it with the scale. This is visually identical, except the axes are labelled on the original data scale.</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(diamonds, aes(carat, price)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(diamonds, aes(carat, price)) +
geom_bin2d() +
scale_x_log10() +
scale_y_log10()</pre>
@@ -360,7 +360,7 @@ ggplot(diamonds, aes(log10(carat), log10(price))) +
</div>
<p>Another scale that is frequently customized is colour. The default categorical scale picks colors that are evenly spaced around the colour wheel. Useful alternatives are the ColorBrewer scales, which have been hand-tuned to work better for people with common types of colour blindness. The two plots below look similar, but there is enough difference in the shades of red and green that the dots on the right can be distinguished even by people with red-green colour blindness.</p>
<div>
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = drv))
ggplot(mpg, aes(displ, hwy)) +
@@ -379,7 +379,7 @@ ggplot(mpg, aes(displ, hwy)) +
</div>
<p>Don’t forget simpler techniques. If there are just a few colors, you can add a redundant shape mapping. This will also help ensure your plot is interpretable in black and white.</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = drv, shape = drv)) +
scale_colour_brewer(palette = "Set1")</pre>
<div class="cell-output-display">
@@ -397,7 +397,7 @@ ggplot(mpg, aes(displ, hwy)) +
</div>
<p>When you have a predefined mapping between values and colors, use <code><a href="https://ggplot2.tidyverse.org/reference/scale_manual.html">scale_colour_manual()</a></code>. For example, if we map presidential party to colour, we want to use the standard mapping of red for Republicans and blue for Democrats:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">presidential |&gt;
+<pre data-type="programlisting" data-code-language="r">presidential |&gt;
mutate(id = 33 + row_number()) |&gt;
ggplot(aes(start, id, colour = party)) +
geom_point() +
@@ -410,7 +410,7 @@ ggplot(mpg, aes(displ, hwy)) +
<p>For continuous colour, you can use the built-in <code><a href="https://ggplot2.tidyverse.org/reference/scale_gradient.html">scale_colour_gradient()</a></code> or <code><a href="https://ggplot2.tidyverse.org/reference/scale_gradient.html">scale_fill_gradient()</a></code>. If you have a diverging scale, you can use <code><a href="https://ggplot2.tidyverse.org/reference/scale_gradient.html">scale_colour_gradient2()</a></code>. That allows you to give, for example, positive and negative values different colors. That’s sometimes also useful if you want to distinguish points above or below the mean.</p>
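<p>A minimal sketch of a diverging scale, assuming we colour each car by how far its highway mileage sits from the overall mean: with <code>midpoint = 0</code>, the neutral <code>mid</code> colour marks average cars, while <code>low</code> and <code>high</code> mark below- and above-average ones. The variables and the three colours are illustrative choices.</p>

```r
library(ggplot2)

# Colour encodes hwy relative to its overall mean; midpoint = 0 places the
# neutral `mid` colour at the average. Colours and variables are illustrative.
p <- ggplot(mpg, aes(displ, cty)) +
  geom_point(aes(colour = hwy - mean(hwy))) +
  scale_colour_gradient2(
    low = "blue", mid = "grey90", high = "red", midpoint = 0
  ) +
  labs(colour = "hwy minus\nmean hwy")
p
```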
<p>Another option is to use the viridis color scales. The designers, Nathaniel Smith and Stéfan van der Walt, carefully tailored continuous colour schemes that are perceptible to people with various forms of colour blindness as well as perceptually uniform in both color and black and white. These scales are available as continuous (<code>c</code>), discrete (<code>d</code>), and binned (<code>b</code>) palettes in ggplot2.</p>
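<p>For example, the discrete (<code>d</code>) variant can colour a categorical variable such as <code>drv</code>. A minimal sketch, assuming a ggplot2 version recent enough to include the viridis scales:</p>

```r
library(ggplot2)

# Discrete viridis palette for a categorical variable (drv has three levels).
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_viridis_d()
p
```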
<div>
-<pre data-type="programlisting" data-code-language="downlit">df &lt;- tibble(
+<pre data-type="programlisting" data-code-language="r">df &lt;- tibble(
x = rnorm(10000),
y = rnorm(10000)
)
@@ -455,7 +455,7 @@ Exercises</h2>
<ol type="1"><li>
<p>Why doesn’t the following code override the default scale?</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(df, aes(x, y)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(df, aes(x, y)) +
geom_hex() +
scale_colour_gradient(low = "white", high = "red") +
coord_fixed()</pre>
@@ -473,7 +473,7 @@ Exercises</h2>
<li>
<p>Use <code>override.aes</code> to make the legend on the following plot easier to see.</p>
<div class="cell" data-fig.format="png">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(diamonds, aes(carat, price)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(diamonds, aes(carat, price)) +
geom_point(aes(colour = cut), alpha = 1/20)</pre>
<div class="cell-output-display">
<p><img src="communicate-plots_files/figure-html/unnamed-chunk-31-1.png" style="width:50.0%"/></p>
@@ -493,7 +493,7 @@ Zooming</h1>
</li>
</ol><p>To zoom in on a region of the plot, it’s generally best to use <code><a href="https://ggplot2.tidyverse.org/reference/coord_cartesian.html">coord_cartesian()</a></code>. Compare the following two plots:</p>
<div>
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, mapping = aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))
@@ -516,7 +516,7 @@ mpg |&gt;
</div>
<p>You can also set the <code>limits</code> on individual scales. Reducing the limits is basically equivalent to subsetting the data. It is generally more useful if you want to <em>expand</em> the limits, such as to match scales across different plots. For example, if we extract two classes of cars and plot them separately, it’s difficult to compare the plots because all three scales (the x-axis, the y-axis, and the colour aesthetic) have different ranges.</p>
<div>
-<pre data-type="programlisting" data-code-language="downlit">suv &lt;- mpg |&gt; filter(class == "suv")
+<pre data-type="programlisting" data-code-language="r">suv &lt;- mpg |&gt; filter(class == "suv")
compact &lt;- mpg |&gt; filter(class == "compact")
ggplot(suv, aes(displ, hwy, colour = drv)) +
@@ -537,7 +537,7 @@ ggplot(compact, aes(displ, hwy, colour = drv)) +
</div>
<p>One way to overcome this problem is to share scales across multiple plots, training the scales with the <code>limits</code> of the full data.</p>
<div>
-<pre data-type="programlisting" data-code-language="downlit">x_scale &lt;- scale_x_continuous(limits = range(mpg$displ))
+<pre data-type="programlisting" data-code-language="r">x_scale &lt;- scale_x_continuous(limits = range(mpg$displ))
y_scale &lt;- scale_y_continuous(limits = range(mpg$hwy))
col_scale &lt;- scale_colour_discrete(limits = unique(mpg$drv))
@@ -571,7 +571,7 @@ ggplot(compact, aes(displ, hwy, colour = drv)) +
Themes</h1>
<p>Finally, you can customize the non-data elements of your plot with a theme:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme_bw()</pre>
@@ -597,7 +597,7 @@ Themes</h1>
Saving your plots</h1>
<p>There are two main ways to get your plots out of R and into your final write-up: <code><a href="https://ggplot2.tidyverse.org/reference/ggsave.html">ggsave()</a></code> and knitr. <code><a href="https://ggplot2.tidyverse.org/reference/ggsave.html">ggsave()</a></code> will save the most recent plot to disk:</p>
<div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(mpg, aes(displ, hwy)) + geom_point()
+<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(displ, hwy)) + geom_point()
ggsave("my-plot.pdf")
#&gt; Saving 6 x 4 in image</pre>
</div>
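<p><code>ggsave()</code> can also be given the plot object and output size explicitly, so the result does not depend on whichever plot was displayed last. In this sketch the temporary file path, the 6 x 4 inch size, and the resolution are illustrative assumptions; the file type is inferred from the <code>.png</code> extension.</p>

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Save a specific plot object at a fixed size, instead of the last plot shown.
# The path, dimensions, and resolution are illustrative choices.
path <- file.path(tempdir(), "my-plot.png")
ggsave(path, plot = p, width = 6, height = 4, units = "in", dpi = 300)
file.exists(path)
```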