Fix figure manipulation
parent 78a1c12fe7
commit 89a854b7d0
@ -333,24 +333,24 @@ str(l$a)
<div class="cell">
<div class="cell-output-display">
<figure id="fig-pepper-3"><p><img src="images/pepper.jpg" style="width:25.0%" alt="A photo of a glass pepper shaker. Instead of the pepper shaker containing pepper, it contains many packets of pepper."/></p>
<figcaption>Figure 26.1: A pepper shaker that Hadley once found in his hotel room.</figcaption>
<figure id="fig-pepper-1"><p><img src="images/pepper.jpg" style="width:25.0%" alt="A photo of a glass pepper shaker. Instead of the pepper shaker containing pepper, it contains many packets of pepper."/></p>
<figcaption>A pepper shaker that Hadley once found in his hotel room.</figcaption>
</figure>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="images/pepper-1.jpg" style="width:25.0%" alt="A photo of the glass pepper shaker containing just one packet of pepper."/></p>
<figcaption class="figure-caption">Figure 26.2: <code>pepper[1]</code></figcaption>
<figure id="fig-pepper-2"><p><img src="images/pepper-1.jpg" style="width:25.0%" alt="A photo of the glass pepper shaker containing just one packet of pepper."/></p>
<figcaption>pepper[1]<code>pepper[1]</code></figcaption>
</figure>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="images/pepper-2.jpg" style="width:25.0%" alt="A photo of single packet of pepper."/></p>
<figcaption class="figure-caption">Figure 26.3: <code>pepper[[1]]</code></figcaption>
<figure id="fig-pepper-3"><p><img src="images/pepper-2.jpg" style="width:25.0%" alt="A photo of single packet of pepper."/></p>
<figcaption>pepper[[1]]<code>pepper[[1]]</code></figcaption>
</figure>
</div>
</div>
@ -204,8 +204,8 @@ ggplot(mpg, aes(displ, hwy)) +
<div class="cell">
<div class="cell-output-display">
<figure id="fig-themes"><p><img src="communicate-plots_files/figure-html/fig-just-1.png" style="width:60.0%"/></p>
<figcaption>Figure 28.1: All nine combinations of hjust and vjust.<code>hjust</code> and <code>vjust</code>.</figcaption>
<figure id="fig-just"><p><img src="communicate-plots_files/figure-html/fig-just-1.png" style="width:60.0%"/></p>
<figcaption>All nine combinations of hjust and vjust.<code>hjust</code> and <code>vjust</code>.</figcaption>
</figure>
</div>
</div>
@ -398,8 +398,8 @@ ggplot(mpg, aes(displ, hwy)) +
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="communicate-plots_files/figure-html/fig-brewer-1.png" width="576"/></p>
<figcaption class="figure-caption">Figure 28.2: All ColourBrewer scales.</figcaption>
<figure id="fig-brewer"><p><img src="communicate-plots_files/figure-html/fig-brewer-1.png" width="576"/></p>
<figcaption>All ColourBrewer scales.</figcaption>
</figure>
</div>
</div>
@ -591,8 +591,8 @@ Themes</h1>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="images/visualization-themes.png" alt="Eight barplots created with ggplot2, each with one of the eight built-in themes: theme_bw() - White background with grid lines, theme_light() - Light axes and grid lines, theme_classic() - Classic theme, axes but no grid lines, theme_linedraw() - Only black lines, theme_dark() - Dark background for contrast, theme_minimal() - Minimal theme, no background, theme_gray() - Gray background (default theme), theme_void() - Empty theme, only geoms are visible." width="1600"/></p>
<figcaption class="figure-caption">Figure 28.3: The eight themes built-in to ggplot2.</figcaption>
<figure id="fig-themes"><p><img src="images/visualization-themes.png" alt="Eight barplots created with ggplot2, each with one of the eight built-in themes: theme_bw() - White background with grid lines, theme_light() - Light axes and grid lines, theme_classic() - Classic theme, axes but no grid lines, theme_linedraw() - Only black lines, theme_dark() - Dark background for contrast, theme_minimal() - Minimal theme, no background, theme_gray() - Gray background (default theme), theme_void() - Empty theme, only geoms are visible." width="1600"/></p>
<figcaption>The eight themes built-in to ggplot2.</figcaption>
</figure>
</div>
</div>
@ -97,8 +97,8 @@ table4b # population
<div class="cell">
<div class="cell-output-display">
<figure id="fig-pivot-names-and-values"><p><img src="images/tidy-1.png" alt="Three panels, each representing a tidy data frame. The first panel shows that each variable is a column. The second panel shows that each observation is a row. The third panel shows that each value is a cell." width="683"/></p>
<figcaption>Figure 6.1: The following three rules make a dataset tidy: variables are columns, observations are rows, and values are cells.</figcaption>
<figure id="fig-tidy-structure"><p><img src="images/tidy-1.png" alt="Three panels, each representing a tidy data frame. The first panel shows that each variable is a column. The second panel shows that each observation is a row. The third panel shows that each value is a cell." width="683"/></p>
<figcaption>The following three rules make a dataset tidy: variables are columns, observations are rows, and values are cells.</figcaption>
</figure>
</div>
</div>
@ -274,8 +274,8 @@ billboard_tidy
scale_y_reverse()</pre>
<div class="cell-output-display">
<figure class="figure"><p><img src="data-tidy_files/figure-html/fig-billboard-ranks-1.png" alt="A line plot with week on the x-axis and rank on the y-axis, where each line represents a song. Most songs appear to start at a high rank, rapidly accelerate to a low rank, and then decay again. There are suprisingly few tracks in the region when week is >20 and rank is >50." width="576"/></p>
<figcaption class="figure-caption">Figure 6.2: A line plot showing how the rank of a song changes over time.</figcaption>
<figure id="fig-billboard-ranks"><p><img src="data-tidy_files/figure-html/fig-billboard-ranks-1.png" alt="A line plot with week on the x-axis and rank on the y-axis, where each line represents a song. Most songs appear to start at a high rank, rapidly accelerate to a low rank, and then decay again. There are suprisingly few tracks in the region when week is >20 and rank is >50." width="576"/></p>
<figcaption>A line plot showing how the rank of a song changes over time.</figcaption>
</figure>
</div>
</div>
@ -315,8 +315,8 @@ How does pivoting work?</h2>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/tidy-data/variables.png" alt="A diagram showing how `pivot_longer()` transforms a simple dataset, using color to highlight how the values in the `var` column ("A", "B", "C") are each repeated twice in the output because there are two columns being pivotted ("col1" and "col2")." width="469"/></p>
<figcaption class="figure-caption">Figure 6.3: Columns that are already variables need to be repeated, once for each column that is pivotted.</figcaption>
<figure id="fig-pivot-variables"><p><img src="diagrams/tidy-data/variables.png" alt="A diagram showing how `pivot_longer()` transforms a simple dataset, using color to highlight how the values in the `var` column ("A", "B", "C") are each repeated twice in the output because there are two columns being pivotted ("col1" and "col2")." width="469"/></p>
<figcaption>Columns that are already variables need to be repeated, once for each column that is pivotted.</figcaption>
</figure>
</div>
</div>
@ -324,8 +324,8 @@ How does pivoting work?</h2>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/tidy-data/column-names.png" alt="A diagram showing how `pivot_longer()` transforms a simple data set, using color to highlight how column names ("col1" and "col2") become the values in a new `var` column. They are repeated three times because there were three rows in the input." width="469"/></p>
<figcaption class="figure-caption">Figure 6.4: The column names of pivoted columns become a new column.</figcaption>
<figure id="fig-pivot-names"><p><img src="diagrams/tidy-data/column-names.png" alt="A diagram showing how `pivot_longer()` transforms a simple data set, using color to highlight how column names ("col1" and "col2") become the values in a new `var` column. They are repeated three times because there were three rows in the input." width="469"/></p>
<figcaption>The column names of pivoted columns become a new column.</figcaption>
</figure>
</div>
</div>
@ -333,8 +333,8 @@ How does pivoting work?</h2>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/tidy-data/cell-values.png" alt="A diagram showing how `pivot_longer()` transforms data, using color to highlight how the cell values (the numbers 1 to 6) become the values in a new `value` column. They are unwound row-by-row, so the original rows (1,2), then (3,4), then (5,6), become a column running from 1 to 6." width="469"/></p>
<figcaption class="figure-caption">Figure 6.5: The number of values is preserved (not repeated), but unwound row-by-row.</figcaption>
<figure id="fig-pivot-values"><p><img src="diagrams/tidy-data/cell-values.png" alt="A diagram showing how `pivot_longer()` transforms data, using color to highlight how the cell values (the numbers 1 to 6) become the values in a new `value` column. They are unwound row-by-row, so the original rows (1,2), then (3,4), then (5,6), become a column running from 1 to 6." width="469"/></p>
<figcaption>The number of values is preserved (not repeated), but unwound row-by-row.</figcaption>
</figure>
</div>
</div>
@ -389,8 +389,8 @@ Many variables in column names</h2>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/tidy-data/multiple-names.png" alt="A diagram that uses color to illustrate how supplying `names_sep` and multiple `names_to` creates multiple variables in the output. The input has variable names "x_1" and "y_2" which are split up by "_" to create name and number columns in the output. This is is similar case with a single `names_to`, but what would have been a single output variable is now separated into multiple variables." width="600"/></p>
<figcaption class="figure-caption">Figure 6.6: Pivotting with many variables in the column names means that each column name now fills in values in multiple output columns.</figcaption>
<figure id="fig-pivot-multiple-names"><p><img src="diagrams/tidy-data/multiple-names.png" alt="A diagram that uses color to illustrate how supplying `names_sep` and multiple `names_to` creates multiple variables in the output. The input has variable names "x_1" and "y_2" which are split up by "_" to create name and number columns in the output. This is is similar case with a single `names_to`, but what would have been a single output variable is now separated into multiple variables." width="600"/></p>
<figcaption>Pivotting with many variables in the column names means that each column name now fills in values in multiple output columns.</figcaption>
</figure>
</div>
</div>
@ -439,8 +439,8 @@ Data and variable names in the column headers</h2>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/tidy-data/names-and-values.png" alt="A diagram that uses color to illustrate how the special ".value" sentinel works. The input has names "x_1", "x_2", "y_1", and "y_2", and we want to use the first component ("x", "y") as a variable name and the second ("1", "2") as the value for a new "id" column." width="540"/></p>
<figcaption class="figure-caption">Figure 6.7: Pivoting with <code>names_to = c(".value", "id")</code> splits the column names into two components: the first part determines the output column name (<code>x</code> or <code>y</code>), and the second part determines the value of the <code>id</code> column.</figcaption>
<figure id="fig-pivot-names-and-values"><p><img src="diagrams/tidy-data/names-and-values.png" alt="A diagram that uses color to illustrate how the special ".value" sentinel works. The input has names "x_1", "x_2", "y_1", and "y_2", and we want to use the first component ("x", "y") as a variable name and the second ("1", "2") as the value for a new "id" column." width="540"/></p>
<figcaption>Pivoting with names_to = c(".value", "id") splits the column names into two components: the first part determines the output column name (x or y), and the second part determines the value of the id column.<code>names_to = c(".value", "id")</code> splits the column names into two components: the first part determines the output column name (<code>x</code> or <code>y</code>), and the second part determines the value of the <code>id</code> column.</figcaption>
</figure>
</div>
</div>
@ -177,8 +177,8 @@ ggplot(data = mpg) +
</ul><div class="cell" data-layout-align="center">
<div class="cell-output-display">
<figure id="fig-vis-stat-bar"><p><img src="data-visualize_files/figure-html/fig-shapes-1.png" alt="Mapping between shapes and the numbers that represent them: 0 - square, 1 - circle, 2 - triangle point up, 3 - plus, 4 - cross, 5 - diamond, 6 - triangle point down, 7 - square cross, 8 - star, 9 - diamond plus, 10 - circle plus, 11 - triangles up and down, 12 - square plus, 13 - circle cross, 14 - square and triangle down, 15 - filled square, 16 - filled circle, 17 - filled triangle point-up, 18 - filled diamond, 19 - solid circle, 20 - bullet (smaller circle), 21 - filled circle blue, 22 - filled square blue, 23 - filled diamond blue, 24 - filled triangle point-up blue, 25 - filled triangle point down blue." width="576"/></p>
<figcaption>Figure 2.1: R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the color and fill aesthetics. The hollow shapes (0–14) have a border determined by color; the solid shapes (15–20) are filled with color; the filled shapes (21–24) have a border of color and are filled with fill.<code>color</code> and <code>fill</code> aesthetics. The hollow shapes (0–14) have a border determined by <code>color</code>; the solid shapes (15–20) are filled with <code>color</code>; the filled shapes (21–24) have a border of <code>color</code> and are filled with <code>fill</code>.</figcaption>
<figure id="fig-shapes"><p><img src="data-visualize_files/figure-html/fig-shapes-1.png" alt="Mapping between shapes and the numbers that represent them: 0 - square, 1 - circle, 2 - triangle point up, 3 - plus, 4 - cross, 5 - diamond, 6 - triangle point down, 7 - square cross, 8 - star, 9 - diamond plus, 10 - circle plus, 11 - triangles up and down, 12 - square plus, 13 - circle cross, 14 - square and triangle down, 15 - filled square, 16 - filled circle, 17 - filled triangle point-up, 18 - filled diamond, 19 - solid circle, 20 - bullet (smaller circle), 21 - filled circle blue, 22 - filled square blue, 23 - filled diamond blue, 24 - filled triangle point-up blue, 25 - filled triangle point down blue." width="576"/></p>
<figcaption>R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the color and fill aesthetics. The hollow shapes (0–14) have a border determined by color; the solid shapes (15–20) are filled with color; the filled shapes (21–24) have a border of color and are filled with fill.<code>color</code> and <code>fill</code> aesthetics. The hollow shapes (0–14) have a border determined by <code>color</code>; the solid shapes (15–20) are filled with <code>color</code>; the filled shapes (21–24) have a border of <code>color</code> and are filled with <code>fill</code>.</figcaption>
</figure>
</div>
</div>
@ -499,8 +499,8 @@ Statistical transformations</h1>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="images/visualization-stat-bar.png" style="width:100.0%" alt="A figure demonstrating three steps of creating a bar chart. Step 1. geom_bar() begins with the diamonds data set. Step 2. geom_bar() transforms the data with the count stat, which returns a data set of cut values and counts. Step 3. geom_bar() uses the transformed data to build the plot. cut is mapped to the x-axis, count is mapped to the y-axis."/></p>
<figcaption class="figure-caption">Figure 2.2: When create a bar chart we first start with the raw data, then aggregate it to count the number of observations in each bar, and finally map those computed variables to plot aesthetics.</figcaption>
<figure id="fig-vis-stat-bar"><p><img src="images/visualization-stat-bar.png" style="width:100.0%" alt="A figure demonstrating three steps of creating a bar chart. Step 1. geom_bar() begins with the diamonds data set. Step 2. geom_bar() transforms the data with the count stat, which returns a data set of cut values and counts. Step 3. geom_bar() uses the transformed data to build the plot. cut is mapped to the x-axis, count is mapped to the y-axis."/></p>
<figcaption>When create a bar chart we first start with the raw data, then aggregate it to count the number of observations in each bar, and finally map those computed variables to plot aesthetics.</figcaption>
</figure>
</div>
</div>
@ -7,8 +7,8 @@ What you will learn</h1>
<div class="cell">
<div class="cell-output-display">
<figure id="fig-rstudio-console"><p><img src="diagrams/data-science/base.png" alt="A diagram displaying the data science cycle: Import -> Tidy -> Understand (which has the phases Transform -> Visualize -> Model in a cycle) -> Communicate. Surrounding all of these is Communicate. " width="535"/></p>
<figcaption>Figure 1.1: In our model of the data science process you start with data import and tidying. Next you understand your data with an iterative cycle of transforming, visualizing, and modeling. You finish the process by communicating your results to other humans.</figcaption>
<figure id="fig-ds-diagram"><p><img src="diagrams/data-science/base.png" alt="A diagram displaying the data science cycle: Import -> Tidy -> Understand (which has the phases Transform -> Visualize -> Model in a cycle) -> Communicate. Surrounding all of these is Communicate. " width="535"/></p>
<figcaption>In our model of the data science process you start with data import and tidying. Next you understand your data with an iterative cycle of transforming, visualizing, and modeling. You finish the process by communicating your results to other humans.</figcaption>
</figure>
</div>
</div>
@ -79,8 +79,8 @@ RStudio</h2>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/rstudio/console.png" alt="The RStudio IDE with the panes Console and Output highlighted." width="520"/></p>
<figcaption class="figure-caption">Figure 1.2: The RStudio IDE has two key regions: type R code in the console pane on the left, and look for plots in the output pane on the right.</figcaption>
<figure id="fig-rstudio-console"><p><img src="diagrams/rstudio/console.png" alt="The RStudio IDE with the panes Console and Output highlighted." width="520"/></p>
<figcaption>The RStudio IDE has two key regions: type R code in the console pane on the left, and look for plots in the output pane on the right.</figcaption>
</figure>
</div>
</div>
@ -116,8 +116,8 @@ Primary and foreign keys</h2>
<div class="cell">
<div class="cell-output-display">
<figure id="fig-join-closest"><p><img src="diagrams/relational.png" alt="The relationships between airports, planes, flights, weather, and airlines datasets from the nycflights13 package. airports$faa connected to the flights$origin and flights$dest. planes$tailnum is connected to the flights$tailnum. weather$time_hour and weather$origin are jointly connected to flights$time_hour and flights$origin. airlines$carrier is connected to flights$carrier. There are no direct connections between airports, planes, airlines, and weather data frames." width="502"/></p>
<figcaption>Figure 19.1: Connections between all five data frames in the nycflights13 package. Variables making up a primary key are coloured grey, and are connected to their corresponding foreign keys with arrows.</figcaption>
<figure id="fig-flights-relationships"><p><img src="diagrams/relational.png" alt="The relationships between airports, planes, flights, weather, and airlines datasets from the nycflights13 package. airports$faa connected to the flights$origin and flights$dest. planes$tailnum is connected to the flights$tailnum. weather$time_hour and weather$origin are jointly connected to flights$time_hour and flights$origin. airlines$carrier is connected to flights$carrier. There are no direct connections between airports, planes, airlines, and weather data frames." width="502"/></p>
<figcaption>Connections between all five data frames in the nycflights13 package. Variables making up a primary key are coloured grey, and are connected to their corresponding foreign keys with arrows.</figcaption>
</figure>
</div>
</div>
@ -500,8 +500,8 @@ y <- tribble(
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/join/setup.png" alt="x and y are two data frames with 2 columns and 3 rows, with contents as described in the text. The values of the keys are coloured: 1 is green, 2 is purple, 3 is orange, and 4 is yellow." width="160"/></p>
<figcaption class="figure-caption">Figure 19.2: Graphical representation of two simple tables. The coloured <code>key</code> columns map background colour to key value. The grey columns represent the “value” columns that are carried along for the ride.</figcaption>
<figure id="fig-join-setup"><p><img src="diagrams/join/setup.png" alt="x and y are two data frames with 2 columns and 3 rows, with contents as described in the text. The values of the keys are coloured: 1 is green, 2 is purple, 3 is orange, and 4 is yellow." width="160"/></p>
<figcaption>Graphical representation of two simple tables. The coloured key columns map background colour to key value. The grey columns represent the “value” columns that are carried along for the ride.<code>key</code> columns map background colour to key value. The grey columns represent the “value” columns that are carried along for the ride.</figcaption>
</figure>
</div>
</div>
@ -509,8 +509,8 @@ y <- tribble(
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/join/setup2.png" alt="x and y are placed at right-angles, with horizonal lines extending from x and vertical lines extending from y. There are 3 rows in x and 3 rows in y, which leads to nine intersections representing nine potential matches." width="170"/></p>
<figcaption class="figure-caption">Figure 19.3: To understand how joins work, it’s useful to think of every possible match. Here we show that with a grid of connecting lines.</figcaption>
<figure id="fig-join-setup2"><p><img src="diagrams/join/setup2.png" alt="x and y are placed at right-angles, with horizonal lines extending from x and vertical lines extending from y. There are 3 rows in x and 3 rows in y, which leads to nine intersections representing nine potential matches." width="170"/></p>
<figcaption>To understand how joins work, it’s useful to think of every possible match. Here we show that with a grid of connecting lines.</figcaption>
</figure>
</div>
</div>
@ -518,8 +518,8 @@ y <- tribble(
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/join/inner.png" alt="x and y are placed at right-angles with lines forming a grid of potential matches. Keys 1 and 2 appear in both x and y, so we get a match, indicated by a dot. Each dot corresponds to a row in the output, so the resulting joined data frame has two rows." width="363"/></p>
<figcaption class="figure-caption">Figure 19.4: An inner join matches each row in <code>x</code> to the row in <code>y</code> that has the same value of <code>key</code>. Each match becomes a row in the output.</figcaption>
<figure id="fig-join-inner"><p><img src="diagrams/join/inner.png" alt="x and y are placed at right-angles with lines forming a grid of potential matches. Keys 1 and 2 appear in both x and y, so we get a match, indicated by a dot. Each dot corresponds to a row in the output, so the resulting joined data frame has two rows." width="363"/></p>
<figcaption>An inner join matches each row in x to the row in y that has the same value of key. Each match becomes a row in the output.<code>x</code> to the row in <code>y</code> that has the same value of <code>key</code>. Each match becomes a row in the output.</figcaption>
</figure>
</div>
</div>
@ -529,8 +529,8 @@ y <- tribble(
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/join/left.png" alt="Compared to the previous diagram showing an inner join, the y table gets a new virtual row containin NA that will match any row in x that didn't otherwise match. This means that the output now has three rows. For key = 3, which matches this virtual row, val_y takes value NA." width="385"/></p>
<figcaption class="figure-caption">Figure 19.5: A visual representation of the left join where every row in <code>x</code> appears in the output.</figcaption>
<figure id="fig-join-left"><p><img src="diagrams/join/left.png" alt="Compared to the previous diagram showing an inner join, the y table gets a new virtual row containin NA that will match any row in x that didn't otherwise match. This means that the output now has three rows. For key = 3, which matches this virtual row, val_y takes value NA." width="385"/></p>
<figcaption>A visual representation of the left join where every row in x appears in the output.<code>x</code> appears in the output.</figcaption>
</figure>
</div>
</div>
@ -540,8 +540,8 @@ y <- tribble(
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/join/right.png" alt="Compared to the previous diagram showing an left join, the x table now gains a virtual row so that every row in y gets a match in x. val_x contains NA for the row in y that didn't match x." width="380"/></p>
<figcaption class="figure-caption">Figure 19.6: A visual representation of the right join where every row of <code>y</code> appears in the output.</figcaption>
<figure id="fig-join-right"><p><img src="diagrams/join/right.png" alt="Compared to the previous diagram showing an left join, the x table now gains a virtual row so that every row in y gets a match in x. val_x contains NA for the row in y that didn't match x." width="380"/></p>
<figcaption>A visual representation of the right join where every row of y appears in the output.<code>y</code> appears in the output.</figcaption>
</figure>
</div>
</div>
@ -551,8 +551,8 @@ y <- tribble(
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/join/full.png" alt="Now both x and y have a virtual row that always matches. The result has 4 rows: keys 1, 2, 3, and 4 with all values from val_x and val_y, however key 2, val_y and key 4, val_x are NAs since those keys don't have a match in the other data frames." width="388"/></p>
<figcaption class="figure-caption">Figure 19.7: A visual representation of the full join where every row in <code>x</code> and <code>y</code> appears in the output.</figcaption>
<figure id="fig-join-full"><p><img src="diagrams/join/full.png" alt="Now both x and y have a virtual row that always matches. The result has 4 rows: keys 1, 2, 3, and 4 with all values from val_x and val_y, however key 2, val_y and key 4, val_x are NAs since those keys don't have a match in the other data frames." width="388"/></p>
<figcaption>A visual representation of the full join where every row in x and y appears in the output.<code>x</code> and <code>y</code> appears in the output.</figcaption>
</figure>
</div>
</div>
@ -561,8 +561,8 @@ y <- tribble(
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/join/venn.png" alt="Venn diagrams for inner, full, left, and right joins. Each join represented with two intersecting circles representing data frames x and y, with x on the right and y on the left. Shading indicates the result of the join." width="385"/></p>
<figcaption class="figure-caption">Figure 19.8: Venn diagrams showing the difference between inner, left, right, and full joins.</figcaption>
<figure id="fig-join-venn"><p><img src="diagrams/join/venn.png" alt="Venn diagrams for inner, full, left, and right joins. Each join represented with two intersecting circles representing data frames x and y, with x on the right and y on the left. Shading indicates the result of the join." width="385"/></p>
<figcaption>Venn diagrams showing the difference between inner, left, right, and full joins.</figcaption>
</figure>
</div>
</div>
@ -574,8 +574,8 @@ Row matching</h2>
<div class="cell">
<div class="cell-output-display">
<figure class="figure"><p><img src="diagrams/join/match-types.png" alt="A join diagram where x has key values 1, 2, and 3, and y has key values 1, 2, 2. The output has three rows because key 1 matches one row, key 2 matches two rows, and key 3 matches zero rows." width="348"/></p>
<figcaption class="figure-caption">Figure 19.9: The three ways a row in <code>x</code> can match. <code>x1</code> matches one row in <code>y</code>, <code>x2</code> matches two rows in <code>y</code>, <code>x3</code> matches zero rows in y. Note that while there are three rows in <code>x</code> and three rows in the output, there isn’t a direct correspondence between the rows.</figcaption>
<figure id="fig-join-match-types"><p><img src="diagrams/join/match-types.png" alt="A join diagram where x has key values 1, 2, and 3, and y has key values 1, 2, 2. The output has three rows because key 1 matches one row, key 2 matches two rows, and key 3 matches zero rows." width="348"/></p>
<figcaption>The three ways a row in x can match. x1 matches one row in y, x2 matches two rows in y, x3 matches zero rows in y. Note that while there are three rows in x and three rows in the output, there isn’t a direct correspondence between the rows.<code>x</code> can match. <code>x1</code> matches one row in <code>y</code>, <code>x2</code> matches two rows in <code>y</code>, <code>x3</code> matches zero rows in y. Note that while there are three rows in <code>x</code> and three rows in the output, there isn’t a direct correspondence between the rows.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -683,16 +683,16 @@ Filtering joins</h2>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="diagrams/join/semi.png" alt="A join diagram with old friends x and y. In a semi join, only the presence of a match matters so the output contains the same columns as x." width="318"/></p>
|
||||
<figcaption class="figure-caption">Figure 19.10: In a semi-join it only matters that there is a match; otherwise values in <code>y</code> don’t affect the output.</figcaption>
|
||||
<figure id="fig-join-semi"><p><img src="diagrams/join/semi.png" alt="A join diagram with old friends x and y. In a semi join, only the presence of a match matters so the output contains the same columns as x." width="318"/></p>
|
||||
<figcaption>In a semi-join it only matters that there is a match; otherwise values in <code>y</code> don’t affect the output.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="diagrams/join/anti.png" alt="An anti-join is the inverse of a semi-join so matches are drawn with red lines indicating that they will be dropped from the output." width="317"/></p>
|
||||
<figcaption class="figure-caption">Figure 19.11: An anti-join is the inverse of a semi-join, dropping rows from <code>x</code> that have a match in <code>y</code>.</figcaption>
|
||||
<figure id="fig-join-anti"><p><img src="diagrams/join/anti.png" alt="An anti-join is the inverse of a semi-join so matches are drawn with red lines indicating that they will be dropped from the output." width="317"/></p>
|
||||
<figcaption>An anti-join is the inverse of a semi-join, dropping rows from <code>x</code> that have a match in <code>y</code>.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -716,8 +716,8 @@ Non-equi joins</h1>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="diagrams/join/inner-both.png" alt="A join diagram showing an inner join between x and y. The result now includes four columns: key.x, val_x, key.y, and val_y. The values of key.x and key.y are identical, which is why we usually only show one. " width="415"/></p>
|
||||
<figcaption class="figure-caption">Figure 19.12: An inner join showing both <code>x</code> and <code>y</code> keys in the output.</figcaption>
|
||||
<figure id="fig-inner-both"><p><img src="diagrams/join/inner-both.png" alt="A join diagram showing an inner join between x and y. The result now includes four columns: key.x, val_x, key.y, and val_y. The values of key.x and key.y are identical, which is why we usually only show one. " width="415"/></p>
|
||||
<figcaption>An inner join showing both <code>x</code> and <code>y</code> keys in the output.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -725,8 +725,8 @@ Non-equi joins</h1>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="diagrams/join/gte.png" alt="A join diagram illustrating join_by(key >= key). The first row of x matches one row of y and the second and third rows each match two rows. This means the output has five rows containing each of the following (key.x, key.y) pairs: (1, 1), (2, 1), (2, 2), (3, 1), (3, 2)." width="385"/></p>
|
||||
<figcaption class="figure-caption">Figure 19.13: A non-equi join where the <code>x</code> key must be greater than or equal to the <code>y</code> key. Many rows generate multiple matches.</figcaption>
|
||||
<figure id="fig-join-gte"><p><img src="diagrams/join/gte.png" alt="A join diagram illustrating join_by(key >= key). The first row of x matches one row of y and the second and third rows each match two rows. This means the output has five rows containing each of the following (key.x, key.y) pairs: (1, 1), (2, 1), (2, 2), (3, 1), (3, 2)." width="385"/></p>
|
||||
<figcaption>A non-equi join where the <code>x</code> key must be greater than or equal to the <code>y</code> key. Many rows generate multiple matches.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -748,8 +748,8 @@ Cross joins</h2>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="diagrams/join/cross.png" alt="A join diagram showing a dot for every combination of x and y." width="155"/></p>
|
||||
<figcaption class="figure-caption">Figure 19.14: A cross join matches each row in <code>x</code> with every row in <code>y</code>.</figcaption>
|
||||
<figure id="fig-join-cross"><p><img src="diagrams/join/cross.png" alt="A join diagram showing a dot for every combination of x and y." width="155"/></p>
|
||||
<figcaption>A cross join matches each row in <code>x</code> with every row in <code>y</code>.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -777,8 +777,8 @@ Inequality joins</h2>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="diagrams/join/lt.png" width="185"/></p>
|
||||
<figcaption class="figure-caption">Figure 19.15: An inequality join where <code>x</code> is joined to <code>y</code> on rows where the key of <code>x</code> is less than the key of <code>y</code>. This makes a triangular shape in the top-left corner.</figcaption>
|
||||
<figure id="fig-join-lt"><p><img src="diagrams/join/lt.png" width="185"/></p>
|
||||
<figcaption>An inequality join where <code>x</code> is joined to <code>y</code> on rows where the key of <code>x</code> is less than the key of <code>y</code>. This makes a triangular shape in the top-left corner.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -807,8 +807,8 @@ Rolling joins</h2>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="diagrams/join/closest.png" alt="A rolling join is a subset of an inequality join so some matches are grayed out indicating that they're not used because they're not the &quot;closest&quot;." width="262"/></p>
|
||||
<figcaption class="figure-caption">Figure 19.16: A following join is similar to a greater-than-or-equal inequality join but only matches the first value.</figcaption>
|
||||
<figure id="fig-join-closest"><p><img src="diagrams/join/closest.png" alt="A rolling join is a subset of an inequality join so some matches are grayed out indicating that they're not used because they're not the &quot;closest&quot;." width="262"/></p>
|
||||
<figcaption>A following join is similar to a greater-than-or-equal inequality join but only matches the first value.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
|
|
@ -252,8 +252,8 @@ Boolean algebra</h1>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure id="fig-prop-delayed-dist"><p><img src="diagrams/transform.png" alt="Six Venn diagrams, each explaining a given logical operator. The circles (sets) in each of the Venn diagrams represent x and y. 1. y & !x is y but none of x; x & y is the intersection of x and y; x & !y is x but none of y; x is all of x none of y; xor(x, y) is everything except the intersection of x and y; y is all of y and none of x; and x | y is everything." width="395"/></p>
|
||||
<figcaption>Figure 12.1: The complete set of boolean operations. <code>x</code> is the left-hand circle, <code>y</code> is the right-hand circle, and the shaded regions show which parts each operator selects.</figcaption>
|
||||
<figure id="fig-bool-ops"><p><img src="diagrams/transform.png" alt="Six Venn diagrams, each explaining a given logical operator. The circles (sets) in each of the Venn diagrams represent x and y. 1. y & !x is y but none of x; x & y is the intersection of x and y; x & !y is x but none of y; x is all of x none of y; xor(x, y) is everything except the intersection of x and y; y is all of y and none of x; and x | y is everything." width="395"/></p>
|
||||
<figcaption>The complete set of boolean operations. <code>x</code> is the left-hand circle, <code>y</code> is the right-hand circle, and the shaded regions show which parts each operator selects.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -427,8 +427,8 @@ Numeric summaries of logical vectors</h2>
|
|||
geom_histogram(binwidth = 0.05)</pre>
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="logicals_files/figure-html/fig-prop-delayed-dist-1.png" alt="The distribution is unimodal and mildly right skewed. The distribution peaks around 30% delayed flights." width="576"/></p>
|
||||
<figcaption class="figure-caption">Figure 12.2: A histogram showing the proportion of delayed flights each day.</figcaption>
|
||||
<figure id="fig-prop-delayed-dist"><p><img src="logicals_files/figure-html/fig-prop-delayed-dist-1.png" alt="The distribution is unimodal and mildly right skewed. The distribution peaks around 30% delayed flights." width="576"/></p>
|
||||
<figcaption>A histogram showing the proportion of delayed flights each day.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
|
|
@ -315,8 +315,8 @@ Modular arithmetic</h2>
|
|||
geom_point(aes(size = n))</pre>
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure id="fig-flights-dist-daily"><p><img src="numbers_files/figure-html/fig-prop-cancelled-1.png" alt="A line plot showing how proportion of cancelled flights changes over the course of the day. The proportion starts low at around 0.5% at 6am, then steadily increases over the course of the day until peaking at 4% at 7pm. The proportion of cancelled flights then drops rapidly getting down to around 1% by midnight." width="576"/></p>
|
||||
<figcaption>Figure 13.1: A line plot with scheduled departure hour on the x-axis, and proportion of cancelled flights on the y-axis. Cancellations seem to accumulate over the course of the day until 8pm; very late flights are much less likely to be cancelled.</figcaption>
|
||||
<figure id="fig-prop-cancelled"><p><img src="numbers_files/figure-html/fig-prop-cancelled-1.png" alt="A line plot showing how proportion of cancelled flights changes over the course of the day. The proportion starts low at around 0.5% at 6am, then steadily increases over the course of the day until peaking at 4% at 7pm. The proportion of cancelled flights then drops rapidly getting down to around 1% by midnight." width="576"/></p>
|
||||
<figcaption>A line plot with scheduled departure hour on the x-axis, and proportion of cancelled flights on the y-axis. Cancellations seem to accumulate over the course of the day until 8pm; very late flights are much less likely to be cancelled.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -641,8 +641,8 @@ Center</h2>
|
|||
#> ℹ Please use `linewidth` instead.</pre>
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="numbers_files/figure-html/fig-mean-vs-median-1.png" alt="All points fall below a 45° line, meaning that the median delay is always less than the mean delay. Most points are clustered in a dense region of mean [0, 20] and median [0, 5]. As the mean delay increases, the spread of the median also increases. There are two outlying points with mean ~60, median ~50, and mean ~85, median ~55." width="576"/></p>
|
||||
<figcaption class="figure-caption">Figure 13.2: A scatterplot showing the differences of summarising hourly departure delay with median instead of mean.</figcaption>
|
||||
<figure id="fig-mean-vs-median"><p><img src="numbers_files/figure-html/fig-mean-vs-median-1.png" alt="All points fall below a 45° line, meaning that the median delay is always less than the mean delay. Most points are clustered in a dense region of mean [0, 20] and median [0, 5]. As the mean delay increases, the spread of the median also increases. There are two outlying points with mean ~60, median ~50, and mean ~85, median ~55." width="576"/></p>
|
||||
<figcaption>A scatterplot showing the differences of summarising hourly departure delay with median instead of mean.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -716,18 +716,18 @@ flights |>
|
|||
<figure class="figure"><div class="quarto-layout-row quarto-layout-valign-top">
|
||||
<div class="cell-output-display quarto-layout-cell quarto-layout-cell-subref" style="flex-basis: 50.0%;justify-content: center;">
|
||||
|
||||
<figure class="figure"><p><img src="numbers_files/figure-html/fig-flights-dist-1.png" alt="Two histograms of `dep_delay`. On the left, it's very hard to see any pattern except that there's a very large spike around zero, the bars rapidly decay in height, and for most of the plot, you can't see any bars because they are too short to see. On the right, where we've discarded delays of greater than two hours, we can see that the spike occurs slightly below zero (i.e. most flights leave a couple of minutes early), but there's still a very steep decay after that. " data-ref-parent="fig-flights-dist" width="384"/></p>
|
||||
<figcaption class="figure-caption">(a) Histogram shows the full range of delays.</figcaption>
|
||||
<figure id="fig-flights-dist-1"><p><img src="numbers_files/figure-html/fig-flights-dist-1.png" alt="Two histograms of `dep_delay`. On the left, it's very hard to see any pattern except that there's a very large spike around zero, the bars rapidly decay in height, and for most of the plot, you can't see any bars because they are too short to see. On the right, where we've discarded delays of greater than two hours, we can see that the spike occurs slightly below zero (i.e. most flights leave a couple of minutes early), but there's still a very steep decay after that. " data-ref-parent="fig-flights-dist" width="384"/></p>
|
||||
<figcaption>(a) Histogram shows the full range of delays.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
<div class="cell-output-display quarto-layout-cell quarto-layout-cell-subref" style="flex-basis: 50.0%;justify-content: center;">
|
||||
|
||||
<figure class="figure"><p><img src="numbers_files/figure-html/fig-flights-dist-2.png" alt="Two histograms of `dep_delay`. On the left, it's very hard to see any pattern except that there's a very large spike around zero, the bars rapidly decay in height, and for most of the plot, you can't see any bars because they are too short to see. On the right, where we've discarded delays of greater than two hours, we can see that the spike occurs slightly below zero (i.e. most flights leave a couple of minutes early), but there's still a very steep decay after that. " data-ref-parent="fig-flights-dist" width="384"/></p>
|
||||
<figcaption class="figure-caption">(b) Histogram is zoomed in to show delays less than 2 hours.</figcaption>
|
||||
<figure id="fig-flights-dist-2"><p><img src="numbers_files/figure-html/fig-flights-dist-2.png" alt="Two histograms of `dep_delay`. On the left, it's very hard to see any pattern except that there's a very large spike around zero, the bars rapidly decay in height, and for most of the plot, you can't see any bars because they are too short to see. On the right, where we've discarded delays of greater than two hours, we can see that the spike occurs slightly below zero (i.e. most flights leave a couple of minutes early), but there's still a very steep decay after that. " data-ref-parent="fig-flights-dist" width="384"/></p>
|
||||
<figcaption>(b) Histogram is zoomed in to show delays less than 2 hours.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
<figcaption class="figure-caption">Figure 13.3: The distribution of <code>dep_delay</code> appears highly skewed to the right in both histograms.</figcaption>
|
||||
<p/><figcaption class="figure-caption">Figure 13.3: The distribution of <code>dep_delay</code> appears highly skewed to the right in both histograms.</figcaption><p/>
|
||||
</figure></div>
|
||||
</div>
|
||||
<p>It’s also a good idea to check that distributions for subgroups resemble the whole. <a href="#fig-flights-dist-daily" data-type="xref">#fig-flights-dist-daily</a> overlays a frequency polygon for each day. The distributions seem to follow a common pattern, suggesting it’s fine to use the same summary for each day.</p>
|
||||
|
@ -738,8 +738,8 @@ flights |>
|
|||
geom_freqpoly(binwidth = 5, alpha = 1/5)</pre>
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="numbers_files/figure-html/fig-flights-dist-daily-1.png" alt="The distribution of `dep_delay` is highly right skewed with a strong peak slightly less than 0. The 365 frequency polygons are mostly overlapping forming a thick black band." width="576"/></p>
|
||||
<figcaption class="figure-caption">Figure 13.4: 365 frequency polygons of <code>dep_delay</code>, one for each day. The frequency polygons appear to have the same shape, suggesting that it’s reasonable to compare days by looking at just a few summary statistics.</figcaption>
|
||||
<figure id="fig-flights-dist-daily"><p><img src="numbers_files/figure-html/fig-flights-dist-daily-1.png" alt="The distribution of `dep_delay` is highly right skewed with a strong peak slightly less than 0. The 365 frequency polygons are mostly overlapping forming a thick black band." width="576"/></p>
|
||||
<figcaption>365 frequency polygons of <code>dep_delay</code>, one for each day. The frequency polygons appear to have the same shape, suggesting that it’s reasonable to compare days by looking at just a few summary statistics.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
|
|
@ -133,24 +133,24 @@ str(x5)
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure id="fig-view-expand-2"><p><img src="screenshots/View-1.png" alt="A screenshot of RStudio showing the list-viewer. It shows the two children of x4: the first child is a double vector and the second child is a list. A rightward facing triangle indicates that the second child itself has children but you can't see them. " width="689"/></p>
|
||||
<figcaption>Figure 22.1: The RStudio view lets you interactively explore a complex list. The viewer opens showing only the top level of the list.</figcaption>
|
||||
<figure id="fig-view-collapsed"><p><img src="screenshots/View-1.png" alt="A screenshot of RStudio showing the list-viewer. It shows the two children of x4: the first child is a double vector and the second child is a list. A rightward facing triangle indicates that the second child itself has children but you can't see them. " width="689"/></p>
|
||||
<figcaption>The RStudio view lets you interactively explore a complex list. The viewer opens showing only the top level of the list.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="screenshots/View-2.png" alt="Another screenshot of the list-viewer having expanded the second child of x2. It also has two children, a double vector and another list. " width="689"/></p>
|
||||
<figcaption class="figure-caption">Figure 22.2: Clicking on the rightward facing triangle expands that component of the list so that you can also see its children.</figcaption>
|
||||
<figure id="fig-view-expand-1"><p><img src="screenshots/View-2.png" alt="Another screenshot of the list-viewer having expanded the second child of x2. It also has two children, a double vector and another list. " width="689"/></p>
|
||||
<figcaption>Clicking on the rightward facing triangle expands that component of the list so that you can also see its children.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="screenshots/View-3.png" alt="Another screenshot, having expanded the grandchild of x4 to see its two children, again a double vector and a list. " width="689"/></p>
|
||||
<figcaption class="figure-caption">Figure 22.3: You can repeat this operation as many times as needed to get to the data you’re interested in. Note the bottom-left corner: if you click an element of the list, RStudio will give you the subsetting code needed to access it, in this case <code>x4[[2]][[2]][[2]]</code>.</figcaption>
|
||||
<figure id="fig-view-expand-2"><p><img src="screenshots/View-3.png" alt="Another screenshot, having expanded the grandchild of x4 to see its two children, again a double vector and a list. " width="689"/></p>
|
||||
<figcaption>You can repeat this operation as many times as needed to get to the data you’re interested in. Note the bottom-left corner: if you click an element of the list, RStudio will give you the subsetting code needed to access it, in this case <code>x4[[2]][[2]][[2]]</code>.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
|
|
@ -198,7 +198,7 @@ Detect matches</h2>
|
|||
<div class="cell-output-display">
|
||||
|
||||
<figure id="fig-x-names"><p><img src="regexps_files/figure-html/fig-x-names-1.png" alt="A timeseries showing the proportion of baby names that contain the letter x. The proportion declines gradually from 8 per 1000 in 1880 to 4 per 1000 in 1980, then increases rapidly to 16 per 1000 in 2019." width="576"/></p>
|
||||
<figcaption>Figure 15.1: A time series showing the proportion of baby names that contain a lower case “x”.</figcaption>
|
||||
<figcaption>A time series showing the proportion of baby names that contain a lower case “x”.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
|
|
@ -50,8 +50,8 @@ Reading spreadsheets</h2>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure id="fig-penguins-species"><p><img src="images/import-spreadsheets-students.png" alt="A look at the students spreadsheet in Excel. The spreadsheet contains information on 6 students, their ID, full name, favourite food, meal plan, and age." width="1200"/></p>
|
||||
<figcaption>Figure 20.1: Spreadsheet called students.xlsx in Excel.</figcaption>
|
||||
<figure id="fig-students-excel"><p><img src="images/import-spreadsheets-students.png" alt="A look at the students spreadsheet in Excel. The spreadsheet contains information on 6 students, their ID, full name, favourite food, meal plan, and age." width="1200"/></p>
|
||||
<figcaption>Spreadsheet called students.xlsx in Excel.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -188,8 +188,8 @@ Reading individual sheets</h2>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="images/import-spreadsheets-penguins-islands.png" alt="A look at the penguins spreadsheet in Excel. The spreadsheet has three sheets: Torgersen Island, Biscoe Island, and Dream Island." width="1514"/></p>
|
||||
<figcaption class="figure-caption">Figure 20.2: Spreadsheet called penguins.xlsx in Excel.</figcaption>
|
||||
<figure id="fig-penguins-islands"><p><img src="images/import-spreadsheets-penguins-islands.png" alt="A look at the penguins spreadsheet in Excel. The spreadsheet has three sheets: Torgersen Island, Biscoe Island, and Dream Island." width="1514"/></p>
|
||||
<figcaption>Spreadsheet called penguins.xlsx in Excel.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -270,8 +270,8 @@ Reading part of a sheet</h2>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="images/import-spreadsheets-deaths.png" alt="A look at the deaths spreadsheet in Excel. The spreadsheet has four rows on top that contain non-data information; the text 'For the sake of consistency in the data layout, which is really a beautiful thing, I will keep making notes up here.' is spread across cells in these top four rows. Then, there is a data frame that includes information on deaths of 10 famous people, including their names, professions, ages, whether they have kids or not, date of birth and death. At the bottom, there are four more rows of non-data information; the text 'This has been really fun, but we're signing off now!' is spread across cells in these bottom four rows." width="1614"/></p>
|
||||
<figcaption class="figure-caption">Figure 20.3: Spreadsheet called deaths.xlsx in Excel.</figcaption>
|
||||
<figure id="fig-deaths-excel"><p><img src="images/import-spreadsheets-deaths.png" alt="A look at the deaths spreadsheet in Excel. The spreadsheet has four rows on top that contain non-data information; the text 'For the sake of consistency in the data layout, which is really a beautiful thing, I will keep making notes up here.' is spread across cells in these top four rows. Then, there is a data frame that includes information on deaths of 10 famous people, including their names, professions, ages, whether they have kids or not, date of birth and death. At the bottom, there are four more rows of non-data information; the text 'This has been really fun, but we're signing off now!' is spread across cells in these bottom four rows." width="1614"/></p>
|
||||
<figcaption>Spreadsheet called deaths.xlsx in Excel.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -398,8 +398,8 @@ write_xlsx(bake_sale, path = "data/bake-sale.xlsx")</pre>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="images/import-spreadsheets-bake-sale.png" alt="Bake sale data frame created earlier in Excel." width="917"/></p>
|
||||
<figcaption class="figure-caption">Figure 20.4: Spreadsheet called bake_sale.xlsx in Excel.</figcaption>
|
||||
<figure id="fig-bake-sale-excel"><p><img src="images/import-spreadsheets-bake-sale.png" alt="Bake sale data frame created earlier in Excel." width="917"/></p>
|
||||
<figcaption>Spreadsheet called bake_sale.xlsx in Excel.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -477,8 +477,8 @@ writeDataTable(
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="images/import-spreadsheets-penguins-species.png" alt="A look at the penguins spreadsheet in Excel. The spreadsheet has three sheets: Torgersen Island, Biscoe Island, and Dream Island." width="1106"/></p>
|
||||
<figcaption class="figure-caption">Figure 20.5: Spreadsheet called penguins.xlsx in Excel.</figcaption>
|
||||
<figure id="fig-penguins-species"><p><img src="images/import-spreadsheets-penguins-species.png" alt="A look at the penguins spreadsheet in Excel. The spreadsheet has three sheets: Torgersen Island, Biscoe Island, and Dream Island." width="1106"/></p>
|
||||
<figcaption>Spreadsheet called penguins.xlsx in Excel.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
|
|
@ -11,7 +11,7 @@
|
|||
<div class="cell-output-display">
|
||||
|
||||
<figure id="fig-pipe-options"><p><img src="screenshots/rstudio-pipe-options.png" alt="Screenshot showing the &quot;Use native pipe operator&quot; option which can be found on the &quot;Editing&quot; panel of the &quot;Code&quot; options." width="616"/></p>
|
||||
<figcaption>Figure 5.1: To insert <code>|></code>, make sure the “Use native pipe operator” option is checked.</figcaption>
|
||||
<figcaption>To insert <code>|></code>, make sure the “Use native pipe operator” option is checked.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
|
|
@ -17,8 +17,8 @@ Scripts</h1>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure id="fig-new-project-3"><p><img src="diagrams/rstudio/script.png" alt="RStudio IDE with Editor, Console, and Output highlighted." width="521"/></p>
|
||||
<figcaption>Figure 9.1: Opening the script editor adds a new pane at the top-left of the IDE.</figcaption>
|
||||
<figure id="fig-rstudio-script"><p><img src="diagrams/rstudio/script.png" alt="RStudio IDE with Editor, Console, and Output highlighted." width="521"/></p>
|
||||
<figcaption>Opening the script editor adds a new pane at the top-left of the IDE.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -115,8 +115,8 @@ What is the source of truth?</h2>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="diagrams/rstudio/clean-slate.png" alt="RStudio preferences window where the option Restore .RData into workspace at startup is not checked. Also, the option Save workspace to .RData on exit is set to Never. " width="523"/></p>
|
||||
<figcaption class="figure-caption">Figure 9.2: Copy these options in your RStudio options to always start your RStudio session with a clean slate.</figcaption>
|
||||
<figure id="fig-blank-slate"><p><img src="diagrams/rstudio/clean-slate.png" alt="RStudio preferences window where the option Restore .RData into workspace at startup is not checked. Also, the option Save workspace to .RData on exit is set to Never. " width="523"/></p>
|
||||
<figcaption>Copy these options in your RStudio options to always start your RStudio session with a clean slate.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
@ -159,22 +159,22 @@ Where does your analysis live?</h2>
|
|||
RStudio projects</h2>
|
||||
<p>Keeping all the files associated with a given project (input data, R scripts, analytical results, and figures) together in one directory is such a wise and common practice that RStudio has built-in support for this via <strong>projects</strong>. Let’s make a project for you to use while you’re working through the rest of this book. Click File > New Project, then follow the steps shown in <a href="#fig-new-project" data-type="xref">#fig-new-project</a>.</p>
|
||||
|
||||
<figure class="figure"><div class="cell-output-display">
|
||||
<figure id="fig-new-project"><div class="cell-output-display">
|
||||
<div class="quarto-figure quarto-figure-center anchored">
|
||||
<figure class="figure"><p><img src="screenshots/rstudio-project-1.png" alt="Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown and the directory name is given as r4ds and the project is being created as subdirectory of the Desktop. " data-ref-parent="fig-new-project" width="542"/></p>
|
||||
<figcaption class="figure-caption">(a) First click New Directory.</figcaption>
|
||||
<figure id="fig-new-project-1"><p><img src="screenshots/rstudio-project-1.png" alt="Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown and the directory name is given as r4ds and the project is being created as subdirectory of the Desktop. " data-ref-parent="fig-new-project" width="542"/></p>
|
||||
<figcaption>(a) First click New Directory.</figcaption>
|
||||
</figure></div>
|
||||
</div>
|
||||
<div class="cell-output-display">
|
||||
<div class="quarto-figure quarto-figure-center anchored">
|
||||
<figure class="figure"><p><img src="screenshots/rstudio-project-2.png" alt="Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown and the directory name is given as r4ds and the project is being created as subdirectory of the Desktop. " data-ref-parent="fig-new-project" width="545"/></p>
|
||||
<figcaption class="figure-caption">(b) Then click New Project.</figcaption>
|
||||
<figure id="fig-new-project-2"><p><img src="screenshots/rstudio-project-2.png" alt="Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown and the directory name is given as r4ds and the project is being created as subdirectory of the Desktop. " data-ref-parent="fig-new-project" width="545"/></p>
|
||||
<figcaption>(b) Then click New Project.</figcaption>
|
||||
</figure></div>
|
||||
</div>
|
||||
<div class="cell-output-display">
|
||||
<div class="quarto-figure quarto-figure-center anchored">
|
||||
<figure class="figure"><p><img src="screenshots/rstudio-project-3.png" alt="Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown and the directory name is given as r4ds and the project is being created as subdirectory of the Desktop. " data-ref-parent="fig-new-project" width="548"/></p>
|
||||
<figcaption class="figure-caption">(c) Finally, fill in the directory (project) name, choose a good subdirectory for its home and click Create Project.</figcaption>
|
||||
<figure id="fig-new-project-3"><p><img src="screenshots/rstudio-project-3.png" alt="Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown and the directory name is given as r4ds and the project is being created as subdirectory of the Desktop. " data-ref-parent="fig-new-project" width="548"/></p>
|
||||
<figcaption>Opening the script editor adds a new pane at the top-left of the IDE.</figcaption>
|
||||
</figure></div>
|
||||
</div>
|
||||
<figcaption class="figure-caption">Figure 9.3: Create a new project by following these three steps.</figcaption>
|
||||
|
|
|
@ -10,8 +10,8 @@
|
|||
<p>Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. Even as a very new programmer it’s a good idea to work on your code style. Using a consistent style makes it easier for others (including future-you!) to read your work, and is particularly important if you need to get help from someone else. This chapter will introduce you to the most important points of the <a href="https://style.tidyverse.org">tidyverse style guide</a>, which is used throughout this book.</p><p>Styling your code will feel a bit tedious to start with, but if you practice it, it will soon become second nature. Additionally, there are some great tools to quickly restyle existing code, like the <a href="https://styler.r-lib.org">styler</a> package by Lorenz Walthert. Once you’ve installed it with <code>install.packages("styler")</code>, an easy way to use it is via RStudio’s <strong>command palette</strong>. The command palette lets you use any built-in RStudio command, as well as many addins provided by packages. Open the palette by pressing Cmd/Ctrl + Shift + P, then type “styler” to see all the shortcuts provided by styler. <a href="#fig-styler" data-type="xref">#fig-styler</a> shows the results.</p><div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure id="fig-rstudio-sections"><p><img src="screenshots/rstudio-palette.png" alt="A screenshot showing the command palette after typing "styler", showing the four styling tool provided by the package." width="638"/></p>
|
||||
<figcaption>Figure 7.1: RStudio’s command palette makes it easy to access every RStudio command using only the keyboard.</figcaption>
|
||||
<figure id="fig-styler"><p><img src="screenshots/rstudio-palette.png" alt="A screenshot showing the command palette after typing "styler", showing the four styling tool provided by the package." width="638"/></p>
|
||||
<figcaption>RStudio’s command palette makes it easy to access every RStudio command using only the keyboard.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div><div class="cell">
|
||||
|
@ -180,8 +180,8 @@ Sectioning comments</h1>
|
|||
<div class="cell">
|
||||
<div class="cell-output-display">
|
||||
|
||||
<figure class="figure"><p><img src="screenshots/rstudio-nav.png" width="125"/></p>
|
||||
<figcaption class="figure-caption">Figure 7.2: After adding sectioning comments to your script, you can easily navigate to them using the code navigation tool in the bottom-left of the script editor.</figcaption>
|
||||
<figure id="fig-rstudio-sections"><p><img src="screenshots/rstudio-nav.png" width="125"/></p>
|
||||
<figcaption>After adding sectioning comments to your script, you can easily navigate to them using the code navigation tool in the bottom-left of the script editor.</figcaption>
|
||||
</figure>
|
||||
</div>
|
||||
</div>
|
||||
|
|
|
@ -22,4 +22,3 @@ Welcome to the second edition of "R for Data Science".
|
|||
## Acknowledgements {.unnumbered}
|
||||
|
||||
*TO DO: Add acknowledgements.*
|
||||
|
||||
|
|