Don't transform non-crossref links

Hadley Wickham
2022-11-18 10:30:32 -06:00
parent 4caea5281b
commit 78a1c12fe7
32 changed files with 693 additions and 693 deletions

@@ -38,15 +38,15 @@ What you won't learn</h1>
<h2>
Modeling</h2>
<!--# TO DO: Say a few sentences about modelling. -->
<p>To learn more about modeling, we highly recommend <a href="#chp-https://www.tmwr" data-type="xref">#chp-https://www.tmwr</a>, by our colleagues Max Kuhn and Julia Silge. This book will teach you the tidymodels family of packages, which, as you might guess from the name, share many conventions with the tidyverse packages we use in this book.</p>
<p>To learn more about modeling, we highly recommend <a href="https://www.tmwr.org">Tidy Modeling with R</a>, by our colleagues Max Kuhn and Julia Silge. This book will teach you the tidymodels family of packages, which, as you might guess from the name, share many conventions with the tidyverse packages we use in this book.</p>
</section>
<section id="big-data" data-type="sect2">
<h2>
Big data</h2>
<p>This book proudly focuses on small, in-memory datasets. This is the right place to start because you can't tackle big data unless you have experience with small data. The tools you learn in this book will easily handle hundreds of megabytes of data, and with a little care, you can typically use them to work with 1-2 Gb of data. If you're routinely working with larger data (10-100 Gb, say), you should learn more about <a href="#chp-https://github.com/Rdatatable/data" data-type="xref">#chp-https://github.com/Rdatatable/data</a>. This book doesn't teach data.table because it has a very concise interface that offers fewer linguistic cues, which makes it harder to learn. However, if you're working with large data, the performance payoff is well worth the effort required to learn it.</p>
<p>This book proudly focuses on small, in-memory datasets. This is the right place to start because you can't tackle big data unless you have experience with small data. The tools you learn in this book will easily handle hundreds of megabytes of data, and with a little care, you can typically use them to work with 1-2 Gb of data. If you're routinely working with larger data (10-100 Gb, say), you should learn more about <a href="https://github.com/Rdatatable/data.table">data.table</a>. This book doesn't teach data.table because it has a very concise interface that offers fewer linguistic cues, which makes it harder to learn. However, if you're working with large data, the performance payoff is well worth the effort required to learn it.</p>
<p>If your data is bigger than this, carefully consider whether your big data problem is actually a small data problem in disguise. While the complete data set might be big, often the data needed to answer a specific question is small. You might be able to find a subset, subsample, or summary that fits in memory and still allows you to answer the question that you're interested in. The challenge here is finding the right small data, which often requires a lot of iteration.</p>
<p>Another possibility is that your big data problem is actually a large number of small data problems in disguise. Each individual problem might fit in memory, but you have millions of them. For example, you might want to fit a model to each person in your dataset. This would be trivial if you had just 10 or 100 people, but instead you have a million. Fortunately, each problem is independent of the others (a setup that is sometimes called embarrassingly parallel), so you just need a system (like <a href="#chp-https://hadoop.apache.org/" data-type="xref">#chp-https://hadoop.apache.org/</a> or <a href="#chp-https://spark.apache.org/" data-type="xref">#chp-https://spark.apache.org/</a>) that allows you to send different datasets to different computers for processing. Once you've figured out how to answer your question for a single subset using the tools described in this book, you can learn new tools like <strong>sparklyr</strong> to solve it for the full dataset.</p>
<p>Another possibility is that your big data problem is actually a large number of small data problems in disguise. Each individual problem might fit in memory, but you have millions of them. For example, you might want to fit a model to each person in your dataset. This would be trivial if you had just 10 or 100 people, but instead you have a million. Fortunately, each problem is independent of the others (a setup that is sometimes called embarrassingly parallel), so you just need a system (like <a href="https://hadoop.apache.org/">Hadoop</a> or <a href="https://spark.apache.org/">Spark</a>) that allows you to send different datasets to different computers for processing. Once you've figured out how to answer your question for a single subset using the tools described in this book, you can learn new tools like <strong>sparklyr</strong> to solve it for the full dataset.</p>
</section>
<section id="python-julia-and-friends" data-type="sect2">
@@ -61,7 +61,7 @@ Python, Julia, and friends</h2>
<section id="prerequisites" data-type="sect1">
<h1>
Prerequisites</h1>
<p>We've made a few assumptions about what you already know in order to get the most out of this book. You should be generally numerically literate, and it's helpful if you have some programming experience already. If you've never programmed before, you might find <a href="#chp-https://rstudio-education.github.io/hopr/" data-type="xref">#chp-https://rstudio-education.github.io/hopr/</a> by Garrett to be a useful adjunct to this book.</p>
<p>We've made a few assumptions about what you already know in order to get the most out of this book. You should be generally numerically literate, and it's helpful if you have some programming experience already. If you've never programmed before, you might find <a href="https://rstudio-education.github.io/hopr/">Hands on Programming with R</a> by Garrett to be a useful adjunct to this book.</p>
<p>There are four things you need to run the code in this book: R, RStudio, a collection of R packages called the <strong>tidyverse</strong>, and a handful of other packages. Packages are the fundamental units of reproducible R code. They include reusable functions, the documentation that describes how to use them, and sample data.</p>
<section id="r" data-type="sect2">
@@ -95,7 +95,7 @@ The tidyverse</h2>
<pre data-type="programlisting" data-code-language="downlit">install.packages("tidyverse")</pre>
</div>
<p>On your own computer, type that line of code in the console, and then press enter to run it. R will download the packages from CRAN and install them on to your computer. If you have problems installing, make sure that you are connected to the internet, and that <a href="https://cloud.r-project.org/" class="uri">https://cloud.r-project.org/</a> isn't blocked by your firewall or proxy.</p>
<p>You will not be able to use the functions, objects, or help files in a package until you load it with <code><a href="#chp-https://rdrr.io/r/base/library" data-type="xref">#chp-https://rdrr.io/r/base/library</a></code>. Once you have installed a package, you can load it using the <code><a href="#chp-https://rdrr.io/r/base/library" data-type="xref">#chp-https://rdrr.io/r/base/library</a></code> function:</p>
<p>You will not be able to use the functions, objects, or help files in a package until you load it with <code><a href="https://rdrr.io/r/base/library.html">library()</a></code>. Once you have installed a package, you can load it using the <code><a href="https://rdrr.io/r/base/library.html">library()</a></code> function:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">library(tidyverse)
#&gt; ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
@@ -108,7 +108,7 @@ The tidyverse</h2>
#&gt; ✖ dplyr::lag() masks stats::lag()</pre>
</div>
<p>This tells you that tidyverse is loading eight packages: ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, and forcats packages. These are considered to be the <strong>core</strong> of the tidyverse because you'll use them in almost every analysis.</p>
<p>Packages in the tidyverse change fairly frequently. You can check whether updates are available, and optionally install them, by running <code><a href="#chp-https://tidyverse.tidyverse.org/reference/tidyverse_update" data-type="xref">#chp-https://tidyverse.tidyverse.org/reference/tidyverse_update</a></code>.</p>
<p>Packages in the tidyverse change fairly frequently. You can check whether updates are available, and optionally install them, by running <code><a href="https://tidyverse.tidyverse.org/reference/tidyverse_update.html">tidyverse_update()</a></code>.</p>
</section>
<section id="other-packages" data-type="sect2">
@@ -136,9 +136,9 @@ Running R code</h1>
[1] 3</code></pre>
<p>There are two main differences. In your console, you type after the <code>&gt;</code>, called the <strong>prompt</strong>; we don't show the prompt in the book. In the book, output is commented out with <code>#&gt;</code>; in your console it appears directly after your code. These two differences mean that if you're working with an electronic version of the book, you can easily copy code out of the book and into the console.</p>
<p>Throughout the book, we use a consistent set of conventions to refer to code:</p>
<ul><li><p>Functions are displayed in a code font and followed by parentheses, like <code><a href="#chp-https://rdrr.io/r/base/sum" data-type="xref">#chp-https://rdrr.io/r/base/sum</a></code>, or <code><a href="#chp-https://rdrr.io/r/base/mean" data-type="xref">#chp-https://rdrr.io/r/base/mean</a></code>.</p></li>
<ul><li><p>Functions are displayed in a code font and followed by parentheses, like <code><a href="https://rdrr.io/r/base/sum.html">sum()</a></code>, or <code><a href="https://rdrr.io/r/base/mean.html">mean()</a></code>.</p></li>
<li><p>Other R objects (such as data or function arguments) are in a code font, without parentheses, like <code>flights</code> or <code>x</code>.</p></li>
<li><p>Sometimes, to make it clear which package an object comes from, we'll use we'll use the package name followed by two colons, like <code><a href="#chp-https://dplyr.tidyverse.org/reference/mutate" data-type="xref">#chp-https://dplyr.tidyverse.org/reference/mutate</a></code>, or<br/><code><a href="#chp-https://rdrr.io/pkg/nycflights13/man/flights" data-type="xref">#chp-https://rdrr.io/pkg/nycflights13/man/flights</a></code>. This is also valid R code.</p></li>
<li><p>Sometimes, to make it clear which package an object comes from, we'll use we'll use the package name followed by two colons, like <code><a href="https://dplyr.tidyverse.org/reference/mutate.html">dplyr::mutate()</a></code>, or<br/><code><a href="https://rdrr.io/pkg/nycflights13/man/flights.html">nycflights13::flights</a></code>. This is also valid R code.</p></li>
</ul></section>
<section id="acknowledgements" data-type="sect1">
@@ -147,7 +147,7 @@ Acknowledgements</h1>
<p>This book isn't just the product of Hadley, Mine, and Garrett, but is the result of many conversations (in person and online) that we've had with many people in the R community. There are a few people we'd like to thank in particular, because they have spent many hours answering our questions and helping us to better think about data science:</p>
<ul><li><p>Jenny Bryan and Lionel Henry for many helpful discussions around working with lists and list-columns.</p></li>
<li><p>The three chapters on workflow were adapted (with permission), from <a href="https://stat545.com/block002_hello-r-workspace-wd-project.html" class="uri">https://stat545.com/block002_hello-r-workspace-wd-project.html</a> by Jenny Bryan.</p></li>
<li><p>Yihui Xie for his work on the <a href="#chp-https://github.com/rstudio/bookdown" data-type="xref">#chp-https://github.com/rstudio/bookdown</a> package, and for tirelessly responding to my feature requests.</p></li>
<li><p>Yihui Xie for his work on the <a href="https://github.com/rstudio/bookdown">bookdown</a> package, and for tirelessly responding to my feature requests.</p></li>
<li><p>Bill Behrman for his thoughtful reading of the entire book, and for trying it out with his data science class at Stanford.</p></li>
<li><p>The #rstats Twitter community who reviewed all of the draft chapters and provided tons of useful feedback.</p></li>
</ul><p>This book was written in the open, and many people contributed pull requests to fix minor problems. Special thanks goes to everyone who contributed via GitHub:</p>
@@ -160,7 +160,7 @@ Acknowledgements</h1>
<section id="colophon" data-type="sect1">
<h1>
Colophon</h1>
<p>An online version of this book is available at <a href="https://r4ds.hadley.nz" class="uri">https://r4ds.hadley.nz</a>. It will continue to evolve in between reprints of the physical book. The source of the book is available at <a href="https://github.com/hadley/r4ds" class="uri">https://github.com/hadley/r4ds</a>. The book is powered by <a href="#chp-https://quarto" data-type="xref">#chp-https://quarto</a> which makes it easy to write books that combine text and executable code.</p>
<p>An online version of this book is available at <a href="https://r4ds.hadley.nz" class="uri">https://r4ds.hadley.nz</a>. It will continue to evolve in between reprints of the physical book. The source of the book is available at <a href="https://github.com/hadley/r4ds" class="uri">https://github.com/hadley/r4ds</a>. The book is powered by <a href="https://quarto.org">Quarto</a> which makes it easy to write books that combine text and executable code.</p>
<p>This book was built with:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="downlit">sessioninfo::session_info(c("tidyverse"))