Fix code language
@@ -13,7 +13,7 @@ Introduction</h1>
|
||||
Prerequisites</h2>
|
||||
<p>This chapter focuses on ggplot2, one of the core packages in the tidyverse. To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running this code:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">library(tidyverse)
|
||||
<pre data-type="programlisting" data-code-language="r">library(tidyverse)
|
||||
#> ── Attaching packages ──────────────────────────────────── tidyverse 1.3.2 ──
|
||||
#> ✔ ggplot2 3.4.0.9000 ✔ purrr 0.9000.0.9000
|
||||
#> ✔ tibble 3.1.8 ✔ dplyr 1.0.99.9000
|
||||
@@ -26,7 +26,7 @@ Prerequisites</h2>
|
||||
<p>That one line of code loads the core tidyverse: the packages that you will use in almost every data analysis. It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).</p>
|
||||
<p>If you run this code and get the error message “there is no package called ‘tidyverse’”, you’ll need to first install it, then run <code><a href="https://rdrr.io/r/base/library.html">library()</a></code> once again.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">install.packages("tidyverse")
|
||||
<pre data-type="programlisting" data-code-language="r">install.packages("tidyverse")
|
||||
library(tidyverse)</pre>
|
||||
</div>
|
||||
<p>You only need to install a package once, but you need to reload it every time you start a new session.</p>
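<p>The install-once, load-every-session pattern can be sketched as follows. This is a minimal sketch, not part of the original chapter: it assumes a script that may run on a machine where the tidyverse is missing, and that a CRAN mirror is configured so that <code>install.packages()</code> can run. <code>requireNamespace()</code> is a base R function that returns <code>FALSE</code> instead of raising an error when a package is not installed.</p>

```r
# Install the tidyverse only when it is missing (a one-time step),
# then attach it (needed in every new R session).
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
library(tidyverse)
```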
|
||||
@@ -43,7 +43,7 @@ First steps</h1>
|
||||
The <code>mpg</code> data frame</h2>
|
||||
<p>You can test your answer with the <code>mpg</code> <strong>data frame</strong> found in ggplot2 (a.k.a. <code><a href="https://ggplot2.tidyverse.org/reference/mpg.html">ggplot2::mpg</a></code>). A data frame is a rectangular collection of variables (in the columns) and observations (in the rows). <code>mpg</code> contains observations collected by the US Environmental Protection Agency on 38 car models.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">mpg
|
||||
<pre data-type="programlisting" data-code-language="r">mpg
|
||||
#> # A tibble: 234 × 11
|
||||
#> manufacturer model displ year cyl trans drv cty hwy fl class
|
||||
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
|
||||
@@ -66,7 +66,7 @@ The<code>mpg</code> data frame</h2>
|
||||
Creating a ggplot</h2>
|
||||
<p>To plot <code>mpg</code>, run this code to put <code>displ</code> on the x-axis and <code>hwy</code> on the y-axis:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-5-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars that shows a negative association." width="576"/></p>
|
||||
@@ -121,7 +121,7 @@ Aesthetic mappings</h1>
|
||||
</div>
|
||||
<p>You can convey information about your data by mapping the aesthetics in your plot to the variables in your dataset. For example, you can map the colors of your points to the <code>class</code> variable to reveal the class of each car.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy, color = class))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-9-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars that shows a negative association. The points representing each car are colored according to the class of the car. The legend on the right of the plot shows the mapping between colors and levels of the class variable: 2seater, compact, midsize, minivan, pickup, or suv." width="576"/></p>
|
||||
@@ -132,7 +132,7 @@ Aesthetic mappings</h1>
|
||||
<p>The colors reveal that many of the unusual points (with engine size greater than 5 liters and highway fuel efficiency greater than 20 miles per gallon) are two-seater cars. These cars don’t seem like hybrids, and are, in fact, sports cars! Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage. In hindsight, these cars were unlikely to be hybrids since they have large engines.</p>
|
||||
<p>In the above example, we mapped <code>class</code> to the color aesthetic, but we could have mapped <code>class</code> to the size aesthetic in the same way. In this case, the exact size of each point would reveal its class affiliation. We get a <em>warning</em> here: mapping an unordered variable (<code>class</code>) to an ordered aesthetic (<code>size</code>) is generally not a good idea because it implies a ranking that does not in fact exist.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy, size = class))
|
||||
#> Warning: Using size for a discrete variable is not advised.</pre>
|
||||
<div class="cell-output-display">
|
||||
@@ -141,7 +141,7 @@ Aesthetic mappings</h1>
|
||||
</div>
|
||||
<p>Similarly, we could have mapped <code>class</code> to the <em>alpha</em> aesthetic, which controls the transparency of the points, or to the <em>shape</em> aesthetic, which controls the shape of the points.</p>
|
||||
<div>
|
||||
<pre data-type="programlisting" data-code-language="downlit"># Left
|
||||
<pre data-type="programlisting" data-code-language="r"># Left
|
||||
ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy, alpha = class))
|
||||
|
||||
@@ -164,7 +164,7 @@ ggplot(data = mpg) +
|
||||
<p>Once you map an aesthetic, ggplot2 takes care of the rest. It selects a reasonable scale to use with the aesthetic, and it constructs a legend that explains the mapping between levels and values. For x and y aesthetics, ggplot2 does not create a legend, but it creates an axis line with tick marks and a label. The axis line acts as a legend; it explains the mapping between locations and values.</p>
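<p>One way to see the scale at work is to look at the data ggplot2 actually draws. The sketch below is an illustration rather than part of the original text; it uses <code>layer_data()</code> from ggplot2, and the names <code>built</code>, <code>n_colors</code>, and <code>n_classes</code> are made up for this example. It checks that the color scale assigned one distinct color to each level of <code>class</code>.</p>

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# layer_data() returns the data drawn for a layer after scales are applied;
# the `colour` column holds the color chosen for each row.
built <- layer_data(p)

# Compare the number of distinct colors with the number of levels of class.
n_colors <- length(unique(built$colour))
n_classes <- length(unique(mpg$class))
n_colors == n_classes
```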
|
||||
<p>You can also <em>set</em> the aesthetic properties of your geom manually. For example, we can make all of the points in our plot blue:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-12-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars that shows a negative association. All points are blue." width="576"/></p>
|
||||
@@ -189,7 +189,7 @@ Exercises</h2>
|
||||
<ol type="1"><li>
|
||||
<p>What’s gone wrong with this code? Why are the points not blue?</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-14-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars that shows a negative association. All points are red and the legend shows a red point that is mapped to the word blue." width="576"/></p>
|
||||
@@ -210,7 +210,7 @@ Common problems</h1>
|
||||
<p>As you start to run R code, you’re likely to run into problems. Don’t worry — it happens to everyone. We have all been writing R code for years, but every day we still write code that doesn’t work!</p>
|
||||
<p>Start by carefully comparing the code that you’re running to the code in the book. R is extremely picky, and a misplaced character can make all the difference. Make sure that every <code>(</code> is matched with a <code>)</code> and every <code>"</code> is paired with another <code>"</code>. Sometimes you’ll run the code and nothing happens. Check the left-hand side of your console: if it’s a <code>+</code>, it means that R doesn’t think you’ve typed a complete expression and it’s waiting for you to finish it. In this case, it’s usually easy to start from scratch again by pressing ESCAPE to abort processing the current command.</p>
|
||||
<p>One common problem when creating ggplot2 graphics is to put the <code>+</code> in the wrong place: it has to come at the end of the line, not the start. In other words, make sure you haven’t accidentally written code like this:</p>
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg)
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg)
|
||||
+ geom_point(mapping = aes(x = displ, y = hwy))</pre>
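<p>For contrast, here is a sketch of the working form, with the <code>+</code> at the end of the first line. Storing the plot in <code>p</code> is only for illustration; printing <code>p</code> draws the plot.</p>

```r
library(ggplot2)

# The `+` ends the first line, so R knows the expression continues.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
p
```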
|
||||
<p>If you’re still stuck, try the help. You can get help about any R function by running <code>?function_name</code> in the console, or selecting the function name and pressing F1 in RStudio. Don’t worry if the help doesn’t seem that helpful; instead, skip down to the examples and look for code that matches what you’re trying to do.</p>
|
||||
<p>If that doesn’t help, carefully read the error message. Sometimes the answer will be buried there! But when you’re new to R, you might not yet know how to understand the error message, even if it contains the answer. Another great tool is Google: try googling the error message, as it’s likely someone else has had the same problem, and has gotten help online.</p>
|
||||
@@ -222,7 +222,7 @@ Facets</h1>
|
||||
<p>One way to add additional variables to a plot is by mapping them to an aesthetic. Another way, which is particularly useful for categorical variables, is to split your plot into <strong>facets</strong>, subplots that each display one subset of the data.</p>
|
||||
<p>To facet your plot by a single variable, use <code><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap()</a></code>. The first argument of <code><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap()</a></code> is a formula<span data-type="footnote">Here “formula” is the name of the type of thing created by <code>~</code>, not a synonym for “equation”.</span>, which you create with <code>~</code> followed by a variable name. The variable that you pass to <code><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap()</a></code> should be discrete.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy)) +
|
||||
facet_wrap(~cyl)</pre>
|
||||
<div class="cell-output-display">
|
||||
@@ -231,7 +231,7 @@ Facets</h1>
|
||||
</div>
|
||||
<p>To facet your plot with the combination of two variables, switch from <code><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap()</a></code> to <code><a href="https://ggplot2.tidyverse.org/reference/facet_grid.html">facet_grid()</a></code>. The first argument of <code><a href="https://ggplot2.tidyverse.org/reference/facet_grid.html">facet_grid()</a></code> is also a formula, but now it’s a double-sided formula: <code>rows ~ cols</code>.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy)) +
|
||||
facet_grid(drv ~ cyl)</pre>
|
||||
<div class="cell-output-display">
|
||||
@@ -246,7 +246,7 @@ Exercises</h2>
|
||||
<li>
|
||||
<p>What do the empty cells in the plot with <code>facet_grid(drv ~ cyl)</code> mean? How do they relate to this plot?</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = drv, y = cyl))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-17-1.png" alt="Scatterplot of number of cylinders versus type of drive train of cars. The plot shows that there are no cars with 5 cylinders that are 4 wheel drive or with 4 or 5 cylinders that are rear wheel drive." width="576"/></p>
|
||||
@@ -256,7 +256,7 @@ Exercises</h2>
|
||||
<li>
|
||||
<p>What plots does the following code make? What does <code>.</code> do?</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy)) +
|
||||
facet_grid(drv ~ .)
|
||||
|
||||
@@ -268,7 +268,7 @@ ggplot(data = mpg) +
|
||||
<li>
|
||||
<p>Take the first faceted plot in this section:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy)) +
|
||||
facet_wrap(~ class, nrow = 2)</pre>
|
||||
</div>
|
||||
@@ -278,7 +278,7 @@ ggplot(data = mpg) +
|
||||
<li>
|
||||
<p>Which of the following two plots makes it easier to compare engine size (<code>displ</code>) across cars with different drive trains? What does this say about when to place a faceting variable across rows or columns?</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy)) +
|
||||
facet_grid(drv ~ .)
|
||||
|
||||
@@ -296,7 +296,7 @@ ggplot(data = mpg) +
|
||||
<li>
|
||||
<p>Recreate this plot using <code><a href="https://ggplot2.tidyverse.org/reference/facet_wrap.html">facet_wrap()</a></code> instead of <code><a href="https://ggplot2.tidyverse.org/reference/facet_grid.html">facet_grid()</a></code>. How do the positions of the facet labels change?</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy)) +
|
||||
facet_grid(drv ~ .)</pre>
|
||||
<div class="cell-output-display">
|
||||
@@ -325,7 +325,7 @@ Geometric objects</h1>
|
||||
<p>A <strong>geom</strong> is the geometrical object that a plot uses to represent data. People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on. Scatterplots break the trend; they use the point geom. As we see above, you can use different geoms to plot the same data. The plot on the left uses the point geom, and the plot on the right uses the smooth geom, a smooth line fitted to the data.</p>
|
||||
<p>To change the geom in your plot, change the geom function that you add to <code><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot()</a></code>. For instance, to make the plots above, you can use this code:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit"># Left
|
||||
<pre data-type="programlisting" data-code-language="r"># Left
|
||||
ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy))
|
||||
|
||||
@@ -335,7 +335,7 @@ ggplot(data = mpg) +
|
||||
</div>
|
||||
<p>Every geom function in ggplot2 takes a <code>mapping</code> argument. However, not every aesthetic works with every geom. You could set the shape of a point, but you couldn’t set the “shape” of a line. On the other hand, you <em>could</em> set the linetype of a line. <code><a href="https://ggplot2.tidyverse.org/reference/geom_smooth.html">geom_smooth()</a></code> will draw a different line, with a different linetype, for each unique value of the variable that you map to linetype.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-24-1.png" alt="A plot of highway fuel efficiency versus engine size of cars. The data are represented with smooth curves, which use a different line type (solid, dashed, or long dashed) for each type of drive train. Confidence intervals around the smooth curves are also displayed." width="576"/></p>
|
||||
@@ -352,7 +352,7 @@ ggplot(data = mpg) +
|
||||
<p>ggplot2 provides more than 40 geoms, and extension packages provide even more (see <a href="https://exts.ggplot2.tidyverse.org/gallery/" class="uri">https://exts.ggplot2.tidyverse.org/gallery/</a> for a sampling). The best way to get a comprehensive overview is the ggplot2 cheatsheet, which you can find at <a href="https://rstudio.com/resources/cheatsheets" class="uri">https://rstudio.com/resources/cheatsheets</a>. To learn more about any single geom, use the help (e.g. <code><a href="https://ggplot2.tidyverse.org/reference/geom_smooth.html">?geom_smooth</a></code>).</p>
|
||||
<p>Many geoms, like <code><a href="https://ggplot2.tidyverse.org/reference/geom_smooth.html">geom_smooth()</a></code>, use a single geometric object to display multiple rows of data. For these geoms, you can set the <code>group</code> aesthetic to a categorical variable to draw multiple objects. ggplot2 will draw a separate object for each unique value of the grouping variable. In practice, ggplot2 will automatically group the data for these geoms whenever you map an aesthetic to a discrete variable (as in the <code>linetype</code> example). It is convenient to rely on this feature because the <code>group</code> aesthetic by itself does not add a legend or distinguishing features to the geoms.</p>
|
||||
<div>
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_smooth(mapping = aes(x = displ, y = hwy))
|
||||
|
||||
ggplot(data = mpg) +
|
||||
@@ -379,7 +379,7 @@ ggplot(data = mpg) +
|
||||
</div>
|
||||
<p>To display multiple geoms in the same plot, add multiple geom functions to <code><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_point(mapping = aes(x = displ, y = hwy)) +
|
||||
geom_smooth(mapping = aes(x = displ, y = hwy))</pre>
|
||||
<div class="cell-output-display">
|
||||
@@ -388,13 +388,13 @@ ggplot(data = mpg) +
|
||||
</div>
|
||||
<p>This, however, introduces some duplication in our code. Imagine if you wanted to change the y-axis to display <code>cty</code> instead of <code>hwy</code>. You’d need to change the variable in two places, and you might forget to update one. You can avoid this type of repetition by passing a set of mappings to <code><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot()</a></code>. ggplot2 will treat these mappings as global mappings that apply to each geom in the graph. In other words, this code will produce the same plot as the previous code:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||
geom_point() +
|
||||
geom_smooth()</pre>
|
||||
</div>
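<p>The claim that both versions produce the same plot can be checked with a short sketch. This is an illustration, not part of the original text: it assumes that comparing the data computed for the point layer is a fair stand-in for comparing the drawn plots, and the names <code>p_local</code>, <code>p_global</code>, and <code>same_points</code> are made up for this example.</p>

```r
library(ggplot2)

# Mappings repeated in each geom (local mappings).
p_local <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

# Mappings stated once in ggplot() (global mappings).
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Compare the data computed for the point layer (layer 1) of each plot.
same_points <- isTRUE(all.equal(layer_data(p_local, 1), layer_data(p_global, 1)))
same_points
```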
|
||||
<p>If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings <em>for that layer only</em>. This makes it possible to display different aesthetics in different layers.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||
geom_point(mapping = aes(color = class)) +
|
||||
geom_smooth()</pre>
|
||||
<div class="cell-output-display">
|
||||
@@ -403,7 +403,7 @@ ggplot(data = mpg) +
|
||||
</div>
|
||||
<p>You can use the same idea to specify different <code>data</code> for each layer. Here, our smooth line displays just a subset of the <code>mpg</code> dataset, the subcompact cars. The local data argument in <code><a href="https://ggplot2.tidyverse.org/reference/geom_smooth.html">geom_smooth()</a></code> overrides the global data argument in <code><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot()</a></code> for that layer only.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||
geom_point(mapping = aes(color = class)) +
|
||||
geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)</pre>
|
||||
<div class="cell-output-display">
|
||||
@@ -419,7 +419,7 @@ Exercises</h2>
|
||||
<li>
|
||||
<p>Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
|
||||
geom_point() +
|
||||
geom_smooth(se = FALSE)</pre>
|
||||
</div>
|
||||
@@ -427,7 +427,7 @@ Exercises</h2>
|
||||
<li>
|
||||
<p>Earlier in this chapter we used <code>show.legend</code> without explaining it:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
|
||||
geom_smooth(
|
||||
mapping = aes(x = displ, y = hwy, color = drv),
|
||||
show.legend = FALSE
|
||||
@@ -439,7 +439,7 @@ Exercises</h2>
|
||||
<li>
|
||||
<p>Will these two graphs look different? Why/why not?</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
|
||||
geom_point() +
|
||||
geom_smooth()
|
||||
|
||||
@@ -485,7 +485,7 @@ ggplot() +
|
||||
Statistical transformations</h1>
|
||||
<p>Next, let’s take a look at a bar chart. Bar charts seem simple, but they are interesting because they reveal something subtle about plots. Consider a basic bar chart, as drawn with <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_bar()</a></code> or <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_col()</a></code>. The following chart displays the total number of diamonds in the <code>diamonds</code> dataset, grouped by <code>cut</code>. The <code>diamonds</code> dataset is in the ggplot2 package and contains information on ~54,000 diamonds, including the <code>price</code>, <code>carat</code>, <code>color</code>, <code>clarity</code>, and <code>cut</code> of each diamond. The chart shows that more diamonds are available with high quality cuts than with low quality cuts.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
|
||||
geom_bar(mapping = aes(x = cut))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-35-1.png" alt="Bar chart of number of each cut of diamond. There are roughly 1500 Fair, 5000 Good, 12000 Very Good, 14000 Premium, and 22000 Ideal cut diamonds." width="576"/></p>
|
||||
@@ -507,7 +507,7 @@ Statistical transformations</h1>
|
||||
<p>You can learn which stat a geom uses by inspecting the default value for the <code>stat</code> argument. For example, <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">?geom_bar</a></code> shows that the default value for <code>stat</code> is “count”, which means that <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_bar()</a></code> uses <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">stat_count()</a></code>. <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">stat_count()</a></code> is documented on the same page as <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_bar()</a></code>. If you scroll down, the section called “Computed variables” explains that it computes two new variables: <code>count</code> and <code>prop</code>.</p>
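<p>The same information can be inspected from the console. The sketch below is an illustration under stated assumptions: it assumes <code>geom_bar()</code> exposes its default stat as the default value of its <code>stat</code> argument (as the help page describes), and it uses <code>layer_data()</code> from ggplot2 to show the variables that the stat computes. The name <code>computed</code> is made up for this example.</p>

```r
library(ggplot2)

# The default stat is the default value of the `stat` argument.
formals(geom_bar)$stat

# layer_data() exposes the variables computed by the stat for a layer.
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
computed <- layer_data(p)

# `count` and `prop` are the computed variables named in the help page.
computed[, c("x", "count", "prop")]
```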
|
||||
<p>You can generally use geoms and stats interchangeably. For example, you can recreate the previous plot using <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">stat_count()</a></code> instead of <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_bar()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
|
||||
stat_count(mapping = aes(x = cut))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-37-1.png" alt="Bar chart of number of each cut of diamond. There are roughly 1500 Fair, 5000 Good, 12000 Very Good, 14000 Premium, and 22000 Ideal cut diamonds." width="576"/></p>
|
||||
@@ -517,7 +517,7 @@ Statistical transformations</h1>
|
||||
<ol type="1"><li>
|
||||
<p>You might want to override the default stat. In the code below, we change the stat of <code><a href="https://ggplot2.tidyverse.org/reference/geom_bar.html">geom_bar()</a></code> from count (the default) to identity. This lets you map the height of the bars to the raw values of a <span class="math inline">\(y\)</span> variable. Unfortunately, when people talk about bar charts casually, they might be referring to this type of bar chart, where the height of the bar is already present in the data, or to the previous bar chart, where the height of the bar is generated by counting rows.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">demo <- tribble(
|
||||
<pre data-type="programlisting" data-code-language="r">demo <- tribble(
|
||||
~cut, ~freq,
|
||||
"Fair", 1610,
|
||||
"Good", 4906,
|
||||
@@ -537,7 +537,7 @@ ggplot(data = demo) +
|
||||
<li>
|
||||
<p>You might want to override the default mapping from transformed variables to aesthetics. For example, you might want to display a bar chart of proportions, rather than counts:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
|
||||
geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-39-1.png" alt="Bar chart of proportion of each cut of diamond. Roughly, Fair diamonds make up 0.03, Good 0.09, Very Good 0.22, Premium 0.26, and Ideal 0.40." width="576"/></p>
|
||||
@@ -548,7 +548,7 @@ ggplot(data = demo) +
|
||||
<li>
|
||||
<p>You might want to draw greater attention to the statistical transformation in your code. For example, you might use <code><a href="https://ggplot2.tidyverse.org/reference/stat_summary.html">stat_summary()</a></code>, which summarizes the y values for each unique x value, to draw attention to the summary that you’re computing:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
|
||||
stat_summary(
|
||||
mapping = aes(x = cut, y = depth),
|
||||
fun.min = min,
|
||||
@@ -572,7 +572,7 @@ Exercises</h2>
|
||||
<li>
|
||||
<p>In our proportion bar chart, we need to set <code>group = 1</code>. Why? In other words, what is the problem with these two graphs?</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
|
||||
geom_bar(mapping = aes(x = cut, y = after_stat(prop)))
|
||||
ggplot(data = diamonds) +
|
||||
geom_bar(mapping = aes(x = cut, fill = color, y = after_stat(prop)))</pre>
|
||||
@@ -586,7 +586,7 @@ ggplot(data = diamonds) +
|
||||
Position adjustments</h1>
|
||||
<p>There’s one more piece of magic associated with bar charts. You can color a bar chart using either the <code>color</code> aesthetic, or, more usefully, <code>fill</code>:</p>
|
||||
<div>
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
|
||||
geom_bar(mapping = aes(x = cut, color = cut))
|
||||
ggplot(data = diamonds) +
|
||||
geom_bar(mapping = aes(x = cut, fill = cut))</pre>
|
||||
@@ -603,7 +603,7 @@ ggplot(data = diamonds) +
|
||||
</div>
|
||||
<p>Note what happens if you map the fill aesthetic to another variable, like <code>clarity</code>: the bars are automatically stacked. Each colored rectangle represents a combination of <code>cut</code> and <code>clarity</code>.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
|
||||
geom_bar(mapping = aes(x = cut, fill = clarity))</pre>
|
||||
<div class="cell-output-display">
|
||||
<p><img src="data-visualize_files/figure-html/unnamed-chunk-43-1.png" alt="Segmented bar chart of cut of diamonds, where each bar is filled with colors for the levels of clarity. Heights of the bars correspond to the number of diamonds in each cut category, and heights of the colored segments are proportional to the number of diamonds with a given clarity level within a given cut level." width="576"/></p>
|
||||
@@ -613,7 +613,7 @@ ggplot(data = diamonds) +
|
||||
<ul><li>
|
||||
<p><code>position = "identity"</code> will place each object exactly where it falls in the context of the graph. This is not very useful for bars, because it overlaps them. To see that overlapping, we need to make the bars either slightly transparent, by setting <code>alpha</code> to a small value, or completely transparent, by setting <code>fill = NA</code>.</p>
|
||||
<div>
|
||||
<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
|
||||
<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
|
||||
geom_bar(alpha = 1/5, position = "identity")
|
||||
ggplot(data = diamonds, mapping = aes(x = cut, color = clarity)) +
|
||||
geom_bar(fill = NA, position = "identity")</pre>
|
||||
@@ -633,7 +633,7 @@ ggplot(data = diamonds, mapping = aes(x = cut, color = clarity)) +
 <li>
 <p><code>position = "fill"</code> works like stacking, but makes each set of stacked bars the same height. This makes it easier to compare proportions across groups.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
+<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
   geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")</pre>
 <div class="cell-output-display">
 <p><img src="data-visualize_files/figure-html/unnamed-chunk-45-1.png" alt="Segmented bar chart of cut of diamonds, where each bar is filled with colors for the levels of clarity. Height of each bar is 1 and heights of the colored segments are proportional to the proportion of diamonds with a given clarity level within a given cut level." width="576"/></p>
@@ -643,7 +643,7 @@ ggplot(data = diamonds, mapping = aes(x = cut, color = clarity)) +
 <li>
 <p><code>position = "dodge"</code> places overlapping objects directly <em>beside</em> one another. This makes it easier to compare individual values.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(data = diamonds) +
+<pre data-type="programlisting" data-code-language="r">ggplot(data = diamonds) +
   geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")</pre>
 <div class="cell-output-display">
 <p><img src="data-visualize_files/figure-html/unnamed-chunk-46-1.png" alt="Dodged bar chart of cut of diamonds. Dodged bars are grouped by levels of cut (fair, good, very good, premium, and ideal). In each group there are eight bars, one for each level of clarity, and filled with a different color for each level. Heights of these bars represent the number of diamonds with a given level of cut and clarity." width="576"/></p>
@@ -659,7 +659,7 @@ ggplot(data = diamonds, mapping = aes(x = cut, color = clarity)) +
 <p>The underlying values of <code>hwy</code> and <code>displ</code> are rounded so the points appear on a grid and many points overlap each other. This problem is known as <strong>overplotting</strong>. This arrangement makes it difficult to see the distribution of the data. Are the data points spread equally throughout the graph, or is there one special combination of <code>hwy</code> and <code>displ</code> that contains 109 values?</p>
 <p>You can avoid this gridding by setting the position adjustment to “jitter”. <code>position = "jitter"</code> adds a small amount of random noise to each point. This spreads the points out because no two points are likely to receive the same amount of random noise.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg) +
+<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg) +
   geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")</pre>
 <div class="cell-output-display">
 <p><img src="data-visualize_files/figure-html/unnamed-chunk-48-1.png" alt="Jittered scatterplot of highway fuel efficiency versus engine size of cars. The plot shows a negative association." width="576"/></p>
@@ -674,7 +674,7 @@ Exercises</h2>
 <ol type="1"><li>
 <p>What is the problem with this plot? How could you improve it?</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
   geom_point()</pre>
 <div class="cell-output-display">
 <p><img src="data-visualize_files/figure-html/unnamed-chunk-49-1.png" alt="Scatterplot of highway fuel efficiency versus city fuel efficiency of cars that shows a positive association. The number of points visible in this plot is less than the number of points in the dataset." width="576"/></p>
@@ -694,7 +694,7 @@ Coordinate systems</h1>
 <ul><li>
 <p><code><a href="https://ggplot2.tidyverse.org/reference/coord_flip.html">coord_flip()</a></code> switches the x and y axes. This is useful (for example), if you want horizontal boxplots. It’s also useful for long labels: it’s hard to get them to fit without overlapping on the x-axis.</p>
 <div>
-<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
   geom_boxplot()
 ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
   geom_boxplot() +
@@ -712,7 +712,7 @@ ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
 </div>
 <p>However, note that you can achieve the same result by flipping the aesthetic mappings of the two variables.</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(y = class, x = hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(y = class, x = hwy)) +
   geom_boxplot()</pre>
 <div class="cell-output-display">
 <p><img src="data-visualize_files/figure-html/unnamed-chunk-51-1.png" alt="Side-by-side box plots of highway fuel efficiency of cars. A separate box plot is drawn along the y-axis for cars in each level of class (2seater, compact, midsize, minivan, pickup, subcompact, and suv)." width="576"/></p>
@@ -722,7 +722,7 @@ ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
 <li>
 <p><code><a href="https://ggplot2.tidyverse.org/reference/coord_map.html">coord_quickmap()</a></code> sets the aspect ratio correctly for maps. This is very important if you’re plotting spatial data with ggplot2. We don’t have the space to discuss maps in this book, but you can learn more in the <a href="https://ggplot2-book.org/maps.html">Maps chapter</a> of <em>ggplot2: Elegant graphics for data analysis</em>.</p>
 <div>
-<pre data-type="programlisting" data-code-language="downlit">nz <- map_data("nz")
+<pre data-type="programlisting" data-code-language="r">nz <- map_data("nz")
 
 ggplot(nz, aes(long, lat, group = group)) +
   geom_polygon(fill = "white", color = "black")
@@ -745,7 +745,7 @@ ggplot(nz, aes(long, lat, group = group)) +
 <li>
 <p><code><a href="https://ggplot2.tidyverse.org/reference/coord_polar.html">coord_polar()</a></code> uses polar coordinates. Polar coordinates reveal an interesting connection between a bar chart and a Coxcomb chart.</p>
 <div>
-<pre data-type="programlisting" data-code-language="downlit">bar <- ggplot(data = diamonds) +
+<pre data-type="programlisting" data-code-language="r">bar <- ggplot(data = diamonds) +
   geom_bar(
     mapping = aes(x = cut, fill = cut),
     show.legend = FALSE,
@@ -778,7 +778,7 @@ Exercises</h2>
 <li>
 <p>What does the plot below tell you about the relationship between city and highway mpg? Why is <code><a href="https://ggplot2.tidyverse.org/reference/coord_fixed.html">coord_fixed()</a></code> important? What does <code><a href="https://ggplot2.tidyverse.org/reference/geom_abline.html">geom_abline()</a></code> do?</p>
 <div class="cell">
-<pre data-type="programlisting" data-code-language="downlit">ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
+<pre data-type="programlisting" data-code-language="r">ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
   geom_point() +
   geom_abline() +
   coord_fixed()</pre>