Don't transform non-crossref links
@@ -15,7 +15,7 @@ diamonds_db <- tbl(con, in_catalog("north_america", "sales", "diamonds"))</pr
|
||||
<pre data-type="programlisting" data-code-language="downlit">diamonds_db <- tbl(con, sql("SELECT * FROM diamonds"))</pre>
|
||||
</div>
|
||||
|
||||
<p>Note that while SQL is a standard, it is extremely complex and no database follows it exactly. While the main components that we’ll focus on in this book are very similar between DBMSs, there are many minor variations. Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases. It’s not perfect, but it’s continually improving, and if you hit a problem you can file an issue <a href="#chp-https://github.com/tidyverse/dbplyr/issues/" data-type="xref">#chp-https://github.com/tidyverse/dbplyr/issues/</a> to help us do better.</p>
|
||||
<p>Note that while SQL is a standard, it is extremely complex and no database follows it exactly. While the main components that we’ll focus on in this book are very similar between DBMSs, there are many minor variations. Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases. It’s not perfect, but it’s continually improving, and if you hit a problem you can file an issue <a href="https://github.com/tidyverse/dbplyr/issues/">on GitHub</a> to help us do better.</p>
|
||||
|
||||
<p>In the examples above note that <code>"year"</code> and <code>"type"</code> are wrapped in double quotes. That’s because these are <strong>reserved words</strong> in duckdb, so dbplyr quotes them to avoid any potential confusion between column/table names and SQL operators.</p><p>When working with other databases you’re likely to see every variable name quoted: only a handful of client packages, like duckdb, know what all the reserved words are, so the others quote everything to be safe.</p><pre data-type="programlisting" data-code-language="sql">SELECT "tailnum", "type", "manufacturer", "model", "year"
|
||||
FROM "planes"</pre><p>Some other database systems use backticks instead of quotes:</p><pre data-type="programlisting" data-code-language="sql">SELECT `tailnum`, `type`, `manufacturer`, `model`, `year`
|
||||
@@ -62,7 +62,7 @@ Connecting to a database</h1>
|
||||
<ul><li><p>You’ll always use DBI (<strong>d</strong>ata<strong>b</strong>ase <strong>i</strong>nterface) because it provides a set of generic functions that connect to the database, upload data, run SQL queries, etc.</p></li>
|
||||
<li><p>You’ll also use a package tailored for the DBMS you’re connecting to. This package translates the generic DBI commands into the specifics needed for a given DBMS. There’s usually one package for each DBMS, e.g. RPostgres for Postgres and RMariaDB for MySQL.</p></li>
|
||||
</ul><p>If you can’t find a specific package for your DBMS, you can usually use the odbc package instead. This uses the ODBC protocol supported by many DBMSs. odbc requires a little more setup because you’ll also need to install an ODBC driver and tell the odbc package where to find it.</p>
|
||||
<p>Concretely, you create a database connection using <code><a href="#chp-https://dbi.r-dbi.org/reference/dbConnect" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbConnect</a></code>. The first argument selects the DBMS<span data-type="footnote">Typically, this is the only function you’ll use from the client package, so we recommend using <code>::</code> to pull out that one function, rather than loading the complete package with <code><a href="#chp-https://rdrr.io/r/base/library" data-type="xref">#chp-https://rdrr.io/r/base/library</a></code>.</span>, then the second and subsequent arguments describe how to connect to it (i.e. where it lives and the credentials that you need to access it). The following code shows a couple of typical examples:</p>
|
||||
<p>Concretely, you create a database connection using <code><a href="https://dbi.r-dbi.org/reference/dbConnect.html">DBI::dbConnect()</a></code>. The first argument selects the DBMS<span data-type="footnote">Typically, this is the only function you’ll use from the client package, so we recommend using <code>::</code> to pull out that one function, rather than loading the complete package with <code><a href="https://rdrr.io/r/base/library.html">library()</a></code>.</span>, then the second and subsequent arguments describe how to connect to it (i.e. where it lives and the credentials that you need to access it). The following code shows a couple of typical examples:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">con <- DBI::dbConnect(
|
||||
RMariaDB::MariaDB(),
|
||||
@@ -93,7 +93,7 @@ In this book</h2>
|
||||
<section id="sec-load-data" data-type="sect2">
|
||||
<h2>
|
||||
Load some data</h2>
|
||||
<p>Since this is a new database, we need to start by adding some data. Here we’ll add <code>mpg</code> and <code>diamonds</code> datasets from ggplot2 using <code><a href="#chp-https://dbi.r-dbi.org/reference/dbWriteTable" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbWriteTable</a></code>. The simplest usage of <code><a href="#chp-https://dbi.r-dbi.org/reference/dbWriteTable" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbWriteTable</a></code> needs three arguments: a database connection, the name of the table to create in the database, and a data frame of data.</p>
|
||||
<p>Since this is a new database, we need to start by adding some data. Here we’ll add <code>mpg</code> and <code>diamonds</code> datasets from ggplot2 using <code><a href="https://dbi.r-dbi.org/reference/dbWriteTable.html">DBI::dbWriteTable()</a></code>. The simplest usage of <code><a href="https://dbi.r-dbi.org/reference/dbWriteTable.html">dbWriteTable()</a></code> needs three arguments: a database connection, the name of the table to create in the database, and a data frame of data.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">dbWriteTable(con, "mpg", ggplot2::mpg)
|
||||
dbWriteTable(con, "diamonds", ggplot2::diamonds)</pre>
|
||||
@@ -123,7 +123,7 @@ dbExistsTable(con, "foo")
|
||||
<section id="extract-some-data" data-type="sect2">
|
||||
<h2>
|
||||
Extract some data</h2>
|
||||
<p>Once you’ve determined a table exists, you can retrieve it with <code><a href="#chp-https://dbi.r-dbi.org/reference/dbReadTable" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbReadTable</a></code>:</p>
|
||||
<p>Once you’ve determined a table exists, you can retrieve it with <code><a href="https://dbi.r-dbi.org/reference/dbReadTable.html">dbReadTable()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">con |>
|
||||
dbReadTable("diamonds") |>
|
||||
@@ -139,14 +139,14 @@ Extract some data</h2>
|
||||
#> 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
|
||||
#> # … with 53,934 more rows</pre>
|
||||
</div>
|
||||
<p><code><a href="#chp-https://dbi.r-dbi.org/reference/dbReadTable" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbReadTable</a></code> returns a <code>data.frame</code> so we use <code><a href="#chp-https://tibble.tidyverse.org/reference/as_tibble" data-type="xref">#chp-https://tibble.tidyverse.org/reference/as_tibble</a></code> to convert it into a tibble so that it prints nicely.</p>
|
||||
<p>In real life, it’s rare that you’ll use <code><a href="#chp-https://dbi.r-dbi.org/reference/dbReadTable" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbReadTable</a></code> because often database tables are too big to fit in memory, and you want to bring back only a subset of the rows and columns.</p>
|
||||
<p><code><a href="https://dbi.r-dbi.org/reference/dbReadTable.html">dbReadTable()</a></code> returns a <code>data.frame</code> so we use <code><a href="https://tibble.tidyverse.org/reference/as_tibble.html">as_tibble()</a></code> to convert it into a tibble so that it prints nicely.</p>
|
||||
<p>In real life, it’s rare that you’ll use <code><a href="https://dbi.r-dbi.org/reference/dbReadTable.html">dbReadTable()</a></code> because often database tables are too big to fit in memory, and you want to bring back only a subset of the rows and columns.</p>
|
||||
</section>
|
||||
|
||||
<section id="sec-dbGetQuery" data-type="sect2">
|
||||
<h2>
|
||||
Run a query</h2>
|
||||
<p>The way you’ll usually retrieve data is with <code><a href="#chp-https://dbi.r-dbi.org/reference/dbGetQuery" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbGetQuery</a></code>. It takes a database connection and some SQL code and returns a data frame:</p>
|
||||
<p>The way you’ll usually retrieve data is with <code><a href="https://dbi.r-dbi.org/reference/dbGetQuery.html">dbGetQuery()</a></code>. It takes a database connection and some SQL code and returns a data frame:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">sql <- "
|
||||
SELECT carat, cut, clarity, color, price
|
||||
@@ -166,21 +166,21 @@ as_tibble(dbGetQuery(con, sql))
|
||||
#> # … with 1,649 more rows</pre>
|
||||
</div>
|
||||
<p>Don’t worry if you’ve never seen SQL before; you’ll learn more about it shortly. But if you read it carefully, you might guess that it selects five columns of the diamonds dataset and all the rows where <code>price</code> is greater than 15,000.</p>
|
||||
<p>You’ll need to be a little careful with <code><a href="#chp-https://dbi.r-dbi.org/reference/dbGetQuery" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbGetQuery</a></code> since it can potentially return more data than you have memory. We won’t discuss it further here, but if you’re dealing with very large datasets it’s possible to deal with a “page” of data at a time by using <code><a href="#chp-https://dbi.r-dbi.org/reference/dbSendQuery" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbSendQuery</a></code> to get a “result set” which you can page through by calling <code><a href="#chp-https://dbi.r-dbi.org/reference/dbFetch" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbFetch</a></code> until <code><a href="#chp-https://dbi.r-dbi.org/reference/dbHasCompleted" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbHasCompleted</a></code> returns <code>TRUE</code>.</p>
|
||||
<p>You’ll need to be a little careful with <code><a href="https://dbi.r-dbi.org/reference/dbGetQuery.html">dbGetQuery()</a></code> since it can potentially return more data than you have memory. We won’t discuss it further here, but if you’re dealing with very large datasets it’s possible to deal with a “page” of data at a time by using <code><a href="https://dbi.r-dbi.org/reference/dbSendQuery.html">dbSendQuery()</a></code> to get a “result set” which you can page through by calling <code><a href="https://dbi.r-dbi.org/reference/dbFetch.html">dbFetch()</a></code> until <code><a href="https://dbi.r-dbi.org/reference/dbHasCompleted.html">dbHasCompleted()</a></code> returns <code>TRUE</code>.</p>
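<p>As a rough sketch of the paging approach just described, the loop might look something like the following. It assumes a throwaway in-memory duckdb database and a page size of 500 rows, both chosen only for illustration, and it also calls <code>dbClearResult()</code>, a DBI function not mentioned above, to release the result set once paging is finished:</p>

```r
library(DBI)

# Assumption: a temporary in-memory duckdb database holding ggplot2's diamonds
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "diamonds", ggplot2::diamonds)

# Send the query without pulling any rows back yet
res <- dbSendQuery(con, "SELECT carat, cut, price FROM diamonds WHERE price > 15000")

n_rows <- 0
while (!dbHasCompleted(res)) {
  page <- dbFetch(res, n = 500)  # fetch at most 500 rows (illustrative page size)
  n_rows <- n_rows + nrow(page)  # stand-in for real per-page work
}

# Release the result set, then close the connection
dbClearResult(res)
dbDisconnect(con, shutdown = TRUE)
```

<p>Only one page is held in memory at a time, so peak memory depends on the page size rather than on the size of the full result.</p>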
|
||||
</section>
|
||||
|
||||
<section id="other-functions" data-type="sect2">
|
||||
<h2>
|
||||
Other functions</h2>
|
||||
<p>There are lots of other functions in DBI that you might find useful if you’re managing your own data (like <code><a href="#chp-https://dbi.r-dbi.org/reference/dbWriteTable" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbWriteTable</a></code> which we used in <a href="#sec-load-data" data-type="xref">#sec-load-data</a>), but we’re going to skip past them in the interest of staying focused on working with data that already lives in a database.</p>
|
||||
<p>There are lots of other functions in DBI that you might find useful if you’re managing your own data (like <code><a href="https://dbi.r-dbi.org/reference/dbWriteTable.html">dbWriteTable()</a></code> which we used in <a href="#sec-load-data" data-type="xref">#sec-load-data</a>), but we’re going to skip past them in the interest of staying focused on working with data that already lives in a database.</p>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
<section id="dbplyr-basics" data-type="sect1">
|
||||
<h1>
|
||||
dbplyr basics</h1>
|
||||
<p>Now that you’ve learned the low-level basics for connecting to a database and running a query, we’re going to switch it up a bit and learn a bit about dbplyr. dbplyr is a dplyr <strong>backend</strong>, which means that you keep writing dplyr code but the backend executes it differently. In this case, dbplyr translates to SQL; other backends include <a href="#chp-https://dtplyr.tidyverse" data-type="xref">#chp-https://dtplyr.tidyverse</a> which translates to <a href="#chp-https://r-datatable" data-type="xref">#chp-https://r-datatable</a>, and <a href="#chp-https://multidplyr.tidyverse" data-type="xref">#chp-https://multidplyr.tidyverse</a> which executes your code on multiple cores.</p>
|
||||
<p>To use dbplyr, you must first use <code><a href="#chp-https://dplyr.tidyverse.org/reference/tbl" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/tbl</a></code> to create an object that represents a database table:</p>
|
||||
<p>Now that you’ve learned the low-level basics for connecting to a database and running a query, we’re going to switch it up a bit and learn a bit about dbplyr. dbplyr is a dplyr <strong>backend</strong>, which means that you keep writing dplyr code but the backend executes it differently. In this case, dbplyr translates to SQL; other backends include <a href="https://dtplyr.tidyverse.org">dtplyr</a> which translates to <a href="https://r-datatable.com">data.table</a>, and <a href="https://multidplyr.tidyverse.org">multidplyr</a> which executes your code on multiple cores.</p>
|
||||
<p>To use dbplyr, you must first use <code><a href="https://dplyr.tidyverse.org/reference/tbl.html">tbl()</a></code> to create an object that represents a database table:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">diamonds_db <- tbl(con, "diamonds")
|
||||
diamonds_db
|
||||
@@ -212,7 +212,7 @@ diamonds_db <- tbl(con, in_catalog("north_america", "sales", "diamonds"))</pr
|
||||
<pre data-type="programlisting" data-code-language="downlit">diamonds_db <- tbl(con, sql("SELECT * FROM diamonds"))</pre>
|
||||
</div>
|
||||
|
||||
<p>Note that while SQL is a standard, it is extremely complex and no database follows it exactly. While the main components that we’ll focus on in this book are very similar between DBMSs, there are many minor variations. Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases. It’s not perfect, but it’s continually improving, and if you hit a problem you can file an issue <a href="#chp-https://github.com/tidyverse/dbplyr/issues/" data-type="xref">#chp-https://github.com/tidyverse/dbplyr/issues/</a> to help us do better.</p>
|
||||
<p>Note that while SQL is a standard, it is extremely complex and no database follows it exactly. While the main components that we’ll focus on in this book are very similar between DBMSs, there are many minor variations. Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases. It’s not perfect, but it’s continually improving, and if you hit a problem you can file an issue <a href="https://github.com/tidyverse/dbplyr/issues/">on GitHub</a> to help us do better.</p>
|
||||
|
||||
<p>In the examples above note that <code>"year"</code> and <code>"type"</code> are wrapped in double quotes. That’s because these are <strong>reserved words</strong> in duckdb, so dbplyr quotes them to avoid any potential confusion between column/table names and SQL operators.</p><p>When working with other databases you’re likely to see every variable name quoted: only a handful of client packages, like duckdb, know what all the reserved words are, so the others quote everything to be safe.</p><pre data-type="programlisting" data-code-language="sql">SELECT "tailnum", "type", "manufacturer", "model", "year"
|
||||
FROM "planes"</pre><p>Some other database systems use backticks instead of quotes:</p><pre data-type="programlisting" data-code-language="sql">SELECT `tailnum`, `type`, `manufacturer`, `model`, `year`
|
||||
@@ -238,7 +238,7 @@ big_diamonds_db
|
||||
#> # … with more rows</pre>
|
||||
</div>
|
||||
<p>You can tell this object represents a database query because it prints the DBMS name at the top, and while it tells you the number of columns, it typically doesn’t know the number of rows. This is because finding the total number of rows usually requires executing the complete query, something we’re trying to avoid.</p>
|
||||
<p>You can see the SQL code generated by the dbplyr function <code><a href="#chp-https://dplyr.tidyverse.org/reference/explain" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/explain</a></code>:</p>
|
||||
<p>You can see the SQL code generated by the dbplyr function <code><a href="https://dplyr.tidyverse.org/reference/explain.html">show_query()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">big_diamonds_db |>
|
||||
show_query()
|
||||
@@ -247,7 +247,7 @@ big_diamonds_db
|
||||
#> FROM diamonds
|
||||
#> WHERE (price > 15000.0)</pre>
|
||||
</div>
|
||||
<p>To get all the data back into R, you call <code><a href="#chp-https://dplyr.tidyverse.org/reference/compute" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/compute</a></code>. Behind the scenes, this generates the SQL, calls <code><a href="#chp-https://dbi.r-dbi.org/reference/dbGetQuery" data-type="xref">#chp-https://dbi.r-dbi.org/reference/dbGetQuery</a></code> to get the data, then turns the result into a tibble:</p>
|
||||
<p>To get all the data back into R, you call <code><a href="https://dplyr.tidyverse.org/reference/compute.html">collect()</a></code>. Behind the scenes, this generates the SQL, calls <code><a href="https://dbi.r-dbi.org/reference/dbGetQuery.html">dbGetQuery()</a></code> to get the data, then turns the result into a tibble:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">big_diamonds <- big_diamonds_db |>
|
||||
collect()
|
||||
@@ -263,7 +263,7 @@ big_diamonds
|
||||
#> 6 1.73 Very Good G VS1 15014
|
||||
#> # … with 1,649 more rows</pre>
|
||||
</div>
|
||||
<p>Typically, you’ll use dbplyr to select the data you want from the database, performing basic filtering and aggregation using the translations described below. Then, once you’re ready to analyse the data with functions that are unique to R, you’ll <code><a href="#chp-https://dplyr.tidyverse.org/reference/compute" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/compute</a></code> the data to get an in-memory tibble, and continue your work with pure R code.</p>
|
||||
<p>Typically, you’ll use dbplyr to select the data you want from the database, performing basic filtering and aggregation using the translations described below. Then, once you’re ready to analyse the data with functions that are unique to R, you’ll <code><a href="https://dplyr.tidyverse.org/reference/compute.html">collect()</a></code> the data to get an in-memory tibble, and continue your work with pure R code.</p>
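<p>To make that workflow concrete, here is a minimal sketch. It assumes an in-memory duckdb database loaded with ggplot2’s <code>diamonds</code>, and it uses <code>lm()</code> purely as an example of an R function with no SQL translation:</p>

```r
library(DBI)
library(dplyr, warn.conflicts = FALSE)

# Assumption: a temporary in-memory duckdb database holding ggplot2's diamonds
con <- dbConnect(duckdb::duckdb())
dbWriteTable(con, "diamonds", ggplot2::diamonds)
diamonds_db <- tbl(con, "diamonds")

# Runs in the database: keep only the rows and columns we need,
# then collect() the (much smaller) result into an in-memory tibble
big_diamonds <- diamonds_db |>
  filter(price > 15000) |>
  select(carat, price) |>
  collect()

# Runs in R: model fitting is not something dbplyr translates to SQL
fit <- lm(price ~ carat, data = big_diamonds)

dbDisconnect(con, shutdown = TRUE)
```

<p>The filtering happens before any data leaves the database, so only the rows with <code>price</code> above 15,000 are ever loaded into R.</p>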
|
||||
</section>
|
||||
|
||||
<section id="sql" data-type="sect1">
|
||||
@@ -343,7 +343,7 @@ diamonds_db <- tbl(con, in_catalog("north_america", "sales", "diamonds"))</pr
|
||||
<pre data-type="programlisting" data-code-language="downlit">diamonds_db <- tbl(con, sql("SELECT * FROM diamonds"))</pre>
|
||||
</div>
|
||||
|
||||
<p>Note that while SQL is a standard, it is extremely complex and no database follows it exactly. While the main components that we’ll focus on in this book are very similar between DBMSs, there are many minor variations. Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases. It’s not perfect, but it’s continually improving, and if you hit a problem you can file an issue <a href="#chp-https://github.com/tidyverse/dbplyr/issues/" data-type="xref">#chp-https://github.com/tidyverse/dbplyr/issues/</a> to help us do better.</p>
|
||||
<p>Note that while SQL is a standard, it is extremely complex and no database follows it exactly. While the main components that we’ll focus on in this book are very similar between DBMSs, there are many minor variations. Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases. It’s not perfect, but it’s continually improving, and if you hit a problem you can file an issue <a href="https://github.com/tidyverse/dbplyr/issues/">on GitHub</a> to help us do better.</p>
|
||||
|
||||
<p>In the examples above note that <code>"year"</code> and <code>"type"</code> are wrapped in double quotes. That’s because these are <strong>reserved words</strong> in duckdb, so dbplyr quotes them to avoid any potential confusion between column/table names and SQL operators.</p><p>When working with other databases you’re likely to see every variable name quoted: only a handful of client packages, like duckdb, know what all the reserved words are, so the others quote everything to be safe.</p><pre data-type="programlisting" data-code-language="sql">SELECT "tailnum", "type", "manufacturer", "model", "year"
|
||||
FROM "planes"</pre><p>Some other database systems use backticks instead of quotes:</p><pre data-type="programlisting" data-code-language="sql">SELECT `tailnum`, `type`, `manufacturer`, `model`, `year`
|
||||
@@ -354,8 +354,8 @@ FROM `planes`</pre></div>
|
||||
<section id="select" data-type="sect2">
|
||||
<h2>
|
||||
SELECT</h2>
|
||||
<p>The <code>SELECT</code> clause is the workhorse of queries and performs the same job as <code><a href="#chp-https://dplyr.tidyverse.org/reference/select" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/select</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/rename" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/rename</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/relocate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/relocate</a></code>, and, as you’ll learn in the next section, <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code>.</p>
|
||||
<p><code><a href="#chp-https://dplyr.tidyverse.org/reference/select" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/select</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/rename" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/rename</a></code>, and <code><a href="#chp-https://dplyr.tidyverse.org/reference/relocate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/relocate</a></code> have very direct translations to <code>SELECT</code> as they just affect where a column appears (if at all) along with its name:</p>
|
||||
<p>The <code>SELECT</code> clause is the workhorse of queries and performs the same job as <code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/relocate.html">relocate()</a></code>, and, as you’ll learn in the next section, <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code>.</p>
|
||||
<p><code><a href="https://dplyr.tidyverse.org/reference/select.html">select()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/rename.html">rename()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/relocate.html">relocate()</a></code> have very direct translations to <code>SELECT</code> as they just affect where a column appears (if at all) along with its name:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">planes |>
|
||||
select(tailnum, type, manufacturer, model, year) |>
|
||||
@@ -380,7 +380,7 @@ planes |>
|
||||
#> SELECT tailnum, manufacturer, model, "type", "year"
|
||||
#> FROM planes</pre>
|
||||
</div>
|
||||
<p>This example also shows you how SQL does renaming. In SQL terminology renaming is called <strong>aliasing</strong> and is done with <code>AS</code>. Note that unlike <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code>, the old name is on the left and the new name is on the right.</p>
|
||||
<p>This example also shows you how SQL does renaming. In SQL terminology renaming is called <strong>aliasing</strong> and is done with <code>AS</code>. Note that unlike <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code>, the old name is on the left and the new name is on the right.</p>
|
||||
<div data-type="note"><div class="callout-body d-flex">
|
||||
<div class="callout-icon-container">
|
||||
<i class="callout-icon"/>
|
||||
@@ -397,13 +397,13 @@ diamonds_db <- tbl(con, in_catalog("north_america", "sales", "diamonds"))</pr
|
||||
<pre data-type="programlisting" data-code-language="downlit">diamonds_db <- tbl(con, sql("SELECT * FROM diamonds"))</pre>
|
||||
</div>
|
||||
|
||||
<p>Note that while SQL is a standard, it is extremely complex and no database follows it exactly. While the main components that we’ll focus on in this book are very similar between DBMSs, there are many minor variations. Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases. It’s not perfect, but it’s continually improving, and if you hit a problem you can file an issue <a href="#chp-https://github.com/tidyverse/dbplyr/issues/" data-type="xref">#chp-https://github.com/tidyverse/dbplyr/issues/</a> to help us do better.</p>
|
||||
<p>Note that while SQL is a standard, it is extremely complex and no database follows it exactly. While the main components that we’ll focus on in this book are very similar between DBMSs, there are many minor variations. Fortunately, dbplyr is designed to handle this problem and generates different translations for different databases. It’s not perfect, but it’s continually improving, and if you hit a problem you can file an issue <a href="https://github.com/tidyverse/dbplyr/issues/">on GitHub</a> to help us do better.</p>
|
||||
|
||||
<p>In the examples above note that <code>"year"</code> and <code>"type"</code> are wrapped in double quotes. That’s because these are <strong>reserved words</strong> in duckdb, so dbplyr quotes them to avoid any potential confusion between column/table names and SQL operators.</p><p>When working with other databases you’re likely to see every variable name quoted: only a handful of client packages, like duckdb, know what all the reserved words are, so the others quote everything to be safe.</p><pre data-type="programlisting" data-code-language="sql">SELECT "tailnum", "type", "manufacturer", "model", "year"
|
||||
FROM "planes"</pre><p>Some other database systems use backticks instead of quotes:</p><pre data-type="programlisting" data-code-language="sql">SELECT `tailnum`, `type`, `manufacturer`, `model`, `year`
|
||||
FROM `planes`</pre></div>
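<p>If you’d like to see these quoting differences without connecting to a real database, one option is dbplyr’s simulated backends. The sketch below is only illustrative: the two lazy frames are hypothetical stand-ins for the <code>planes</code> table, built with <code>lazy_frame()</code>, and <code>sql_render()</code> is used to capture the generated SQL as text:</p>

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical stand-ins for the planes table; no real database is involved
planes_pg    <- lazy_frame(tailnum = "", type = "", year = 0L, con = simulate_postgres())
planes_mysql <- lazy_frame(tailnum = "", type = "", year = 0L, con = simulate_mysql())

# The same dplyr code, rendered for two different simulated backends
sql_pg    <- planes_pg |> select(tailnum, type) |> sql_render()
sql_mysql <- planes_mysql |> select(tailnum, type) |> sql_render()

sql_pg     # identifiers wrapped in double quotes
sql_mysql  # identifiers wrapped in backticks
```

<p>Because no connection is needed, this can be a convenient way to check how a pipeline would be translated for a DBMS you don’t have installed.</p>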
|
||||
|
||||
<p>The translations for <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code> are similarly straightforward: each variable becomes a new expression in <code>SELECT</code>:</p>
|
||||
<p>The translations for <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> are similarly straightforward: each variable becomes a new expression in <code>SELECT</code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
mutate(
|
||||
@@ -426,7 +426,7 @@ FROM</h2>
|
||||
<section id="group-by" data-type="sect2">
|
||||
<h2>
|
||||
GROUP BY</h2>
|
||||
<p><code><a href="#chp-https://dplyr.tidyverse.org/reference/group_by" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/group_by</a></code> is translated to the <code>GROUP BY</code><span data-type="footnote">This is no coincidence: the dplyr function name was inspired by the SQL clause.</span> clause and <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code> is translated to the <code>SELECT</code> clause:</p>
|
||||
<p><code><a href="https://dplyr.tidyverse.org/reference/group_by.html">group_by()</a></code> is translated to the <code>GROUP BY</code><span data-type="footnote">This is no coincidence: the dplyr function name was inspired by the SQL clause.</span> clause and <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> is translated to the <code>SELECT</code> clause:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">diamonds_db |>
|
||||
group_by(cut) |>
|
||||
@@ -440,13 +440,13 @@ GROUP BY</h2>
|
||||
#> FROM diamonds
|
||||
#> GROUP BY cut</pre>
|
||||
</div>
|
||||
<p>We’ll come back to what’s happening with the translation of <code><a href="#chp-https://dplyr.tidyverse.org/reference/context" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/context</a></code> and <code><a href="#chp-https://rdrr.io/r/base/mean" data-type="xref">#chp-https://rdrr.io/r/base/mean</a></code> in <a href="#sec-sql-expressions" data-type="xref">#sec-sql-expressions</a>.</p>
|
||||
<p>We’ll come back to what’s happening with the translation of <code><a href="https://dplyr.tidyverse.org/reference/context.html">n()</a></code> and <code><a href="https://rdrr.io/r/base/mean.html">mean()</a></code> in <a href="#sec-sql-expressions" data-type="xref">#sec-sql-expressions</a>.</p>
|
||||
</section>
|
||||
|
||||
<section id="where" data-type="sect2">
|
||||
<h2>
|
||||
WHERE</h2>
|
||||
<p><code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code> is translated to the <code>WHERE</code> clause:</p>
|
||||
<p><code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> is translated to the <code>WHERE</code> clause:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
filter(dest == "IAH" | dest == "HOU") |>
|
||||
@@ -499,7 +499,7 @@ flights |>
|
||||
#> 6 LAX 0.547
|
||||
#> # … with more rows</pre>
|
||||
</div>
|
||||
<p>If you want to learn more about how NULLs work, you might enjoy “<a href="#chp-https://modern-sql.com/concept/three-valued-logic" data-type="xref">#chp-https://modern-sql.com/concept/three-valued-logic</a>” by Markus Winand.</p>
|
||||
<p>If you want to learn more about how NULLs work, you might enjoy “<a href="https://modern-sql.com/concept/three-valued-logic"><em>Three valued logic</em></a>” by Markus Winand.</p>
|
||||
<p>In general, you can work with <code>NULL</code>s using the functions you’d use for <code>NA</code>s in R:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
@@ -512,7 +512,7 @@ flights |>
|
||||
</div>
|
||||
<p>This SQL query illustrates one of the drawbacks of dbplyr: while the SQL is correct, it isn’t as simple as you might write by hand. In this case, you could drop the parentheses and use a special operator that’s easier to read:</p>
|
||||
<pre data-type="programlisting" data-code-language="sql">WHERE "dep_delay" IS NOT NULL</pre>
|
||||
<p>Note that if you <code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code> a variable that you created using a summarize, dbplyr will generate a <code>HAVING</code> clause, rather than a <code>WHERE</code> clause. This is one of the idiosyncrasies of SQL: <code>WHERE</code> is evaluated before <code>SELECT</code> and <code>GROUP BY</code>, so SQL needs another clause that’s evaluated afterwards.</p>
|
||||
<p>Note that if you <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> a variable that you created using a summarize, dbplyr will generate a <code>HAVING</code> clause, rather than a <code>WHERE</code> clause. This is one of the idiosyncrasies of SQL: <code>WHERE</code> is evaluated before <code>SELECT</code> and <code>GROUP BY</code>, so SQL needs another clause that’s evaluated afterwards.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">diamonds_db |>
|
||||
group_by(cut) |>
|
||||
@@ -530,7 +530,7 @@ flights |>
|
||||
<section id="order-by" data-type="sect2">
|
||||
<h2>
|
||||
ORDER BY</h2>
|
||||
<p>Ordering rows involves a straightforward translation from <code><a href="#chp-https://dplyr.tidyverse.org/reference/arrange" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/arrange</a></code> to the <code>ORDER BY</code> clause:</p>
|
||||
<p>Ordering rows involves a straightforward translation from <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code> to the <code>ORDER BY</code> clause:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
arrange(year, month, day, desc(dep_delay)) |>
|
||||
@@ -540,7 +540,7 @@ ORDER BY</h2>
|
||||
#> FROM flights
|
||||
#> ORDER BY "year", "month", "day", dep_delay DESC</pre>
|
||||
</div>
|
||||
<p>Notice how <code><a href="#chp-https://dplyr.tidyverse.org/reference/desc" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/desc</a></code> is translated to <code>DESC</code>: this is one of the many dplyr functions whose name was directly inspired by SQL.</p>
|
||||
<p>Notice how <code><a href="https://dplyr.tidyverse.org/reference/desc.html">desc()</a></code> is translated to <code>DESC</code>: this is one of the many dplyr functions whose name was directly inspired by SQL.</p>
|
||||
</section>
|
||||
|
||||
<section id="subqueries" data-type="sect2">
|
||||
@@ -562,7 +562,7 @@ Subqueries</h2>
|
||||
#> FROM flights
|
||||
#> ) q01</pre>
|
||||
</div>
|
||||
<p>You’ll also see this if you attempt to <code><a href="#chp-https://dplyr.tidyverse.org/reference/filter" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/filter</a></code> a variable that you just created. Remember, even though <code>WHERE</code> is written after <code>SELECT</code>, it’s evaluated before it, so we need a subquery in this (silly) example:</p>
|
||||
<p>You’ll also see this if you attempt to <code><a href="https://dplyr.tidyverse.org/reference/filter.html">filter()</a></code> a variable that you just created. Remember, even though <code>WHERE</code> is written after <code>SELECT</code>, it’s evaluated before it, so we need a subquery in this (silly) example:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
mutate(year1 = year + 1) |>
|
||||
@@ -603,7 +603,7 @@ Joins</h2>
|
||||
#> ON (flights.tailnum = planes.tailnum)</pre>
|
||||
</div>
|
||||
<p>The main thing to notice here is the syntax: SQL joins use sub-clauses of the <code>FROM</code> clause to bring in additional tables, using <code>ON</code> to define how the tables are related.</p>
|
||||
<p>dplyr’s names for these functions are so closely connected to SQL that you can easily guess the equivalent SQL for <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate-joins" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate-joins</a></code>, <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate-joins" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate-joins</a></code>, and <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate-joins" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate-joins</a></code>:</p>
|
||||
<p>dplyr’s names for these functions are so closely connected to SQL that you can easily guess the equivalent SQL for <code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html">inner_join()</a></code>, <code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html">right_join()</a></code>, and <code><a href="https://dplyr.tidyverse.org/reference/mutate-joins.html">full_join()</a></code>:</p>
|
||||
<pre data-type="programlisting" data-code-language="sql">SELECT flights.*, "type", manufacturer, model, engines, seats, speed
|
||||
FROM flights
|
||||
INNER JOIN planes ON (flights.tailnum = planes.tailnum)
|
||||
@@ -615,19 +615,19 @@ RIGHT JOIN planes ON (flights.tailnum = planes.tailnum)
|
||||
SELECT flights.*, "type", manufacturer, model, engines, seats, speed
|
||||
FROM flights
|
||||
FULL JOIN planes ON (flights.tailnum = planes.tailnum)</pre>
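<p>If you would rather check your guesses than take them on trust, one possible approach is to render each join against simulated tables. This is a minimal sketch: <code>flights_lazy</code> and <code>planes_lazy</code> are hypothetical two-column stand-ins created with <code>lazy_frame()</code>, so the column lists in the output will be shorter than in the queries above, and formatting may vary by dbplyr version:</p>

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical stand-ins for the two tables; only the key column matters here.
flights_lazy <- lazy_frame(tailnum = "N1", dep_delay = 1, .name = "flights")
planes_lazy <- lazy_frame(tailnum = "N1", seats = 100, .name = "planes")

# The dplyr join verbs whose SQL we want to compare.
join_verbs <- list(inner = inner_join, right = right_join, full = full_join)

# Render each join to a SQL string without running anything.
join_sql <- lapply(join_verbs, function(join) {
  flights_lazy |>
    join(planes_lazy, by = "tailnum") |>
    sql_render() |>
    as.character()
})

# Print the three queries one after another.
for (sql in join_sql) cat(sql, "\n\n")
```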
|
||||
<p>You’re likely to need many joins when working with data from a database. That’s because database tables are often stored in a highly normalized form, where each “fact” is stored in a single place, so to assemble a complete dataset for analysis you need to navigate a complex network of tables connected by primary and foreign keys. If you hit this scenario, the <a href="#chp-https://cynkra.github.io/dm/" data-type="xref">#chp-https://cynkra.github.io/dm/</a>, by Tobias Schieferdecker, Kirill Müller, and Darko Bergant, is a lifesaver. It can automatically determine the connections between tables using the constraints that DBAs often supply, visualize the connections so you can see what’s going on, and generate the joins you need to connect one table to another.</p>
|
||||
<p>You’re likely to need many joins when working with data from a database. That’s because database tables are often stored in a highly normalized form, where each “fact” is stored in a single place, so to assemble a complete dataset for analysis you need to navigate a complex network of tables connected by primary and foreign keys. If you hit this scenario, the <a href="https://cynkra.github.io/dm/">dm package</a>, by Tobias Schieferdecker, Kirill Müller, and Darko Bergant, is a lifesaver. It can automatically determine the connections between tables using the constraints that DBAs often supply, visualize the connections so you can see what’s going on, and generate the joins you need to connect one table to another.</p>
|
||||
</section>
|
||||
|
||||
<section id="other-verbs" data-type="sect2">
|
||||
<h2>
|
||||
Other verbs</h2>
|
||||
<p>dbplyr also translates other verbs like <code><a href="#chp-https://dplyr.tidyverse.org/reference/distinct" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/distinct</a></code>, <code>slice_*()</code>, and <code><a href="#chp-https://generics.r-lib.org/reference/setops" data-type="xref">#chp-https://generics.r-lib.org/reference/setops</a></code>, and a growing selection of tidyr functions like <code><a href="#chp-https://tidyr.tidyverse.org/reference/pivot_longer" data-type="xref">#chp-https://tidyr.tidyverse.org/reference/pivot_longer</a></code> and <code><a href="#chp-https://tidyr.tidyverse.org/reference/pivot_wider" data-type="xref">#chp-https://tidyr.tidyverse.org/reference/pivot_wider</a></code>. The easiest way to see the full set of what’s currently available is to visit the dbplyr website: <a href="https://dbplyr.tidyverse.org/reference/" class="uri">https://dbplyr.tidyverse.org/reference/</a>.</p>
|
||||
<p>dbplyr also translates other verbs like <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code>, <code>slice_*()</code>, and <code><a href="https://generics.r-lib.org/reference/setops.html">intersect()</a></code>, and a growing selection of tidyr functions like <code><a href="https://tidyr.tidyverse.org/reference/pivot_longer.html">pivot_longer()</a></code> and <code><a href="https://tidyr.tidyverse.org/reference/pivot_wider.html">pivot_wider()</a></code>. The easiest way to see the full set of what’s currently available is to visit the dbplyr website: <a href="https://dbplyr.tidyverse.org/reference/" class="uri">https://dbplyr.tidyverse.org/reference/</a>.</p>
|
||||
</section>
|
||||
|
||||
<section id="exercises" data-type="sect2">
|
||||
<h2>
|
||||
Exercises</h2>
|
||||
<ol type="1"><li><p>What is <code><a href="#chp-https://dplyr.tidyverse.org/reference/distinct" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/distinct</a></code> translated to? How about <code><a href="#chp-https://rdrr.io/r/utils/head" data-type="xref">#chp-https://rdrr.io/r/utils/head</a></code>?</p></li>
|
||||
<ol type="1"><li><p>What is <code><a href="https://dplyr.tidyverse.org/reference/distinct.html">distinct()</a></code> translated to? How about <code><a href="https://rdrr.io/r/utils/head.html">head()</a></code>?</p></li>
|
||||
<li>
|
||||
<p>Explain what each of the following SQL queries does and try to recreate them using dbplyr.</p>
|
||||
<pre data-type="programlisting" data-code-language="sql">SELECT *
|
||||
@@ -643,8 +643,8 @@ FROM flights</pre>
|
||||
<section id="sec-sql-expressions" data-type="sect1">
|
||||
<h1>
|
||||
Function translations</h1>
|
||||
<p>So far we’ve focused on the big picture of how dplyr verbs are translated to the clauses of a query. Now we’re going to zoom in a little and talk about the translation of the R functions that work with individual columns, e.g. what happens when you use <code>mean(x)</code> in a <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code>?</p>
|
||||
<p>To help see what’s going on, we’ll use a couple of little helper functions that run a <code><a href="#chp-https://dplyr.tidyverse.org/reference/summarise" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/summarise</a></code> or <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code> and show the generated SQL. That will make it a little easier to explore a few variations and see how summaries and transformations can differ.</p>
|
||||
<p>So far we’ve focused on the big picture of how dplyr verbs are translated to the clauses of a query. Now we’re going to zoom in a little and talk about the translation of the R functions that work with individual columns, e.g. what happens when you use <code>mean(x)</code> in a <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code>?</p>
|
||||
<p>To help see what’s going on, we’ll use a couple of little helper functions that run a <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarise()</a></code> or <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> and show the generated SQL. That will make it a little easier to explore a few variations and see how summaries and transformations can differ.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">summarize_query <- function(df, ...) {
|
||||
df |>
|
||||
@@ -657,7 +657,7 @@ mutate_query <- function(df, ...) {
|
||||
show_query()
|
||||
}</pre>
|
||||
</div>
|
||||
<p>Let’s dive in with some summaries! Looking at the code below you’ll notice that some summary functions, like <code><a href="#chp-https://rdrr.io/r/base/mean" data-type="xref">#chp-https://rdrr.io/r/base/mean</a></code>, have a relatively simple translation while others, like <code><a href="#chp-https://rdrr.io/r/stats/median" data-type="xref">#chp-https://rdrr.io/r/stats/median</a></code>, are much more complex. The complexity is typically higher for operations that are common in statistics but less common in databases.</p>
|
||||
<p>Let’s dive in with some summaries! Looking at the code below you’ll notice that some summary functions, like <code><a href="https://rdrr.io/r/base/mean.html">mean()</a></code>, have a relatively simple translation while others, like <code><a href="https://rdrr.io/r/stats/median.html">median()</a></code>, are much more complex. The complexity is typically higher for operations that are common in statistics but less common in databases.</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
group_by(year, month, day) |>
|
||||
@@ -677,7 +677,7 @@ mutate_query <- function(df, ...) {
|
||||
#> FROM flights
|
||||
#> GROUP BY "year", "month", "day"</pre>
|
||||
</div>
|
||||
<p>The translation of summary functions becomes more complicated when you use them inside a <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code> because they have to turn into a window function. In SQL, you turn an ordinary aggregation function into a window function by adding <code>OVER</code> after it:</p>
|
||||
<p>The translation of summary functions becomes more complicated when you use them inside a <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">mutate()</a></code> because they have to turn into a window function. In SQL, you turn an ordinary aggregation function into a window function by adding <code>OVER</code> after it:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
group_by(year, month, day) |>
|
||||
@@ -693,7 +693,7 @@ mutate_query <- function(df, ...) {
|
||||
#> FROM flights</pre>
|
||||
</div>
|
||||
<p>In SQL, the <code>GROUP BY</code> clause is used exclusively for summary so here you can see that the grouping has moved to the <code>PARTITION BY</code> argument to <code>OVER</code>.</p>
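<p>To see the difference between the two contexts in isolation, one option is dbplyr’s <code>translate_sql()</code>, which translates a single R expression. The sketch below assumes the generic simulated backend <code>simulate_dbi()</code>, so identifier quoting will not match the duckdb output shown in this chapter:</p>

```r
library(dbplyr, warn.conflicts = FALSE)

con <- simulate_dbi()  # generic simulated backend; no real database needed

# Summary context (what summarize() needs): a plain aggregate.
as_summary <- translate_sql(
  mean(dep_delay, na.rm = TRUE),
  window = FALSE,
  con = con
)

# Window context (what mutate() needs): the same aggregate gains OVER.
as_window <- translate_sql(mean(dep_delay, na.rm = TRUE), con = con)

# With grouping variables supplied, the groups show up as PARTITION BY.
as_grouped_window <- translate_sql(
  mean(dep_delay, na.rm = TRUE),
  vars_group = c("year", "month", "day"),
  con = con
)

print(as_summary)
print(as_window)
print(as_grouped_window)
```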
|
||||
<p>Window functions include all functions that look forward or backwards, like <code><a href="#chp-https://dplyr.tidyverse.org/reference/lead-lag" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/lead-lag</a></code> and <code><a href="#chp-https://dplyr.tidyverse.org/reference/lead-lag" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/lead-lag</a></code>:</p>
|
||||
<p>Window functions include all functions that look forward or backwards, like <code><a href="https://dplyr.tidyverse.org/reference/lead-lag.html">lead()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/lead-lag.html">lag()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
group_by(dest) |>
|
||||
@@ -710,8 +710,8 @@ mutate_query <- function(df, ...) {
|
||||
#> FROM flights
|
||||
#> ORDER BY time_hour</pre>
|
||||
</div>
|
||||
<p>Here it’s important to <code><a href="#chp-https://dplyr.tidyverse.org/reference/arrange" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/arrange</a></code> the data, because SQL tables have no intrinsic order. In fact, if you don’t use <code><a href="#chp-https://dplyr.tidyverse.org/reference/arrange" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/arrange</a></code> you might get the rows back in a different order every time! Notice that for window functions the ordering information is repeated: the <code>ORDER BY</code> clause of the main query doesn’t automatically apply to window functions.</p>
|
||||
<p>Another important SQL function is <code>CASE WHEN</code>. It’s used as the translation of <code><a href="#chp-https://dplyr.tidyverse.org/reference/if_else" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/if_else</a></code> and <code><a href="#chp-https://dplyr.tidyverse.org/reference/case_when" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/case_when</a></code>, the dplyr function that it directly inspired. Here’s a couple of simple examples:</p>
|
||||
<p>Here it’s important to <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code> the data, because SQL tables have no intrinsic order. In fact, if you don’t use <code><a href="https://dplyr.tidyverse.org/reference/arrange.html">arrange()</a></code> you might get the rows back in a different order every time! Notice that for window functions the ordering information is repeated: the <code>ORDER BY</code> clause of the main query doesn’t automatically apply to window functions.</p>
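<p>The way ordering is carried inside the window itself can be sketched with <code>translate_sql()</code> and its <code>vars_group</code> and <code>vars_order</code> arguments. This is only an illustration on the generic simulated backend (<code>simulate_dbi()</code>); the column names are borrowed from the example above and no data is involved:</p>

```r
library(dbplyr, warn.conflicts = FALSE)

con <- simulate_dbi()  # generic simulated backend; no real database needed

# lag() within each destination, ordered by time_hour: the ordering is
# written inside OVER (...), independently of any ORDER BY on the main query.
lag_sql <- translate_sql(
  lag(arr_delay),
  vars_group = "dest",
  vars_order = "time_hour",
  con = con
)

print(lag_sql)
```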
|
||||
<p>Another important SQL function is <code>CASE WHEN</code>. It’s used as the translation of <code><a href="https://dplyr.tidyverse.org/reference/if_else.html">if_else()</a></code> and <code><a href="https://dplyr.tidyverse.org/reference/case_when.html">case_when()</a></code>, the dplyr function that it directly inspired. Here’s a couple of simple examples:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
mutate_query(
|
||||
@@ -737,7 +737,7 @@ flights |>
|
||||
#> END AS description
|
||||
#> FROM flights</pre>
|
||||
</div>
|
||||
<p><code>CASE WHEN</code> is also used for some other functions that don’t have a direct translation from R to SQL. A good example of this is <code><a href="#chp-https://rdrr.io/r/base/cut" data-type="xref">#chp-https://rdrr.io/r/base/cut</a></code>:</p>
|
||||
<p><code>CASE WHEN</code> is also used for some other functions that don’t have a direct translation from R to SQL. A good example of this is <code><a href="https://rdrr.io/r/base/cut.html">cut()</a></code>:</p>
|
||||
<div class="cell">
|
||||
<pre data-type="programlisting" data-code-language="downlit">flights |>
|
||||
mutate_query(
|
||||
@@ -755,16 +755,16 @@ flights |>
|
||||
#> END AS description
|
||||
#> FROM flights</pre>
|
||||
</div>
|
||||
<p>dbplyr also translates common string and date-time manipulation functions, which you can learn about in <code><a href="#chp-https://dbplyr.tidyverse.org/articles/translation-function" data-type="xref">#chp-https://dbplyr.tidyverse.org/articles/translation-function</a></code>. dbplyr’s translations are certainly not perfect, and there are many R functions that aren’t translated yet, but dbplyr does a surprisingly good job covering the functions that you’ll use most of the time.</p>
|
||||
<p>dbplyr also translates common string and date-time manipulation functions, which you can learn about in <code><a href="https://dbplyr.tidyverse.org/articles/translation-function.html">vignette("translation-function", package = "dbplyr")</a></code>. dbplyr’s translations are certainly not perfect, and there are many R functions that aren’t translated yet, but dbplyr does a surprisingly good job covering the functions that you’ll use most of the time.</p>
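<p>One way to check whether a particular function is translated is to pass it to <code>translate_sql()</code>. In this sketch <code>foofify()</code> is a deliberately made-up name, used to show that dbplyr leaves functions it doesn’t recognise as they are, so the database will be asked to run them under that name; the backend is the generic <code>simulate_dbi()</code>:</p>

```r
library(dbplyr, warn.conflicts = FALSE)

con <- simulate_dbi()  # generic simulated backend; no real database needed

# A base R function that dbplyr knows about: translated to its SQL equivalent.
known <- translate_sql(toupper(carrier), con = con)

# foofify() does not exist anywhere; dbplyr passes the call through untouched
# and it would only fail later, when a real database tries to run it.
unknown <- translate_sql(foofify(carrier), con = con)

print(known)
print(unknown)
```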
|
||||
|
||||
<section id="learning-more" data-type="sect2">
|
||||
<h2>
|
||||
Learning more</h2>
|
||||
<p>If you’ve finished this chapter and would like to learn more about SQL, we have two recommendations:</p>
|
||||
<ul><li>
|
||||
<a href="#chp-https://sqlfordatascientists" data-type="xref">#chp-https://sqlfordatascientists</a> by Renée M. P. Teate is an introduction to SQL designed specifically for the needs of data scientists, and includes examples of the sort of highly interconnected data you’re likely to encounter in real organisations.</li>
|
||||
<a href="https://sqlfordatascientists.com"><em>SQL for Data Scientists</em></a> by Renée M. P. Teate is an introduction to SQL designed specifically for the needs of data scientists, and includes examples of the sort of highly interconnected data you’re likely to encounter in real organisations.</li>
|
||||
<li>
|
||||
<a href="#chp-https://www.practicalsql" data-type="xref">#chp-https://www.practicalsql</a> by Anthony DeBarros is written from the perspective of a data journalist (a data scientist specialized in telling compelling stories) and goes into more detail about getting your data into a database and running your own DBMS.</li>
|
||||
<a href="https://www.practicalsql.com"><em>Practical SQL</em></a> by Anthony DeBarros is written from the perspective of a data journalist (a data scientist specialized in telling compelling stories) and goes into more detail about getting your data into a database and running your own DBMS.</li>
|
||||
</ul></section>
|
||||
</section>
|
||||
</section>
|
||||
|
||||