Minor updates to intro

parent 5ec12ac2f6
commit 027848d806

@@ -88,7 +88,7 @@ This is the right place to start because you can't tackle big data unless you ha
 The tools you learn in this book will easily handle hundreds of megabytes of data, and with a little care, you can typically use them to work with 1-2 Gb of data.
 If you're routinely working with larger data (10-100 Gb, say), you should learn more about [data.table](https://github.com/Rdatatable/data.table).
 This book doesn't teach data.table because it has a very concise interface that offers fewer linguistic cues, which makes it harder to learn.
-However, if you're working with large data, the performance payoff is worth the extra effort required to learn it.
+However, if you're working with large data, the performance payoff is well worth the effort required to learn it.
 
 If your data is bigger than this, carefully consider whether your big data problem is actually a small data problem in disguise.
 While the complete data set might be big, often the data needed to answer a specific question is small.
@@ -100,7 +100,7 @@ Each individual problem might fit in memory, but you have millions of them.
 For example, you might want to fit a model to each person in your dataset.
 This would be trivial if you had just 10 or 100 people, but instead you have a million.
 Fortunately, each problem is independent of the others (a setup that is sometimes called embarrassingly parallel), so you just need a system (like [Hadoop](https://hadoop.apache.org/) or [Spark](https://spark.apache.org/)) that allows you to send different datasets to different computers for processing.
-Once you've figured out how to answer your question for a single subset using the tools described in this book, you can learn new tools like **sparklyr**, **rhipe**, and **ddr** to solve it for the full dataset.
+Once you've figured out how to answer your question for a single subset using the tools described in this book, you can learn new tools like **sparklyr** to solve it for the full dataset.
 
 ### Python, Julia, and friends
 
@@ -148,7 +148,7 @@ Download and install it from <http://www.rstudio.com/download>.
 RStudio is updated a couple of times a year.
 When a new version is available, RStudio will let you know.
 It's a good idea to upgrade regularly so you can take advantage of the latest and greatest features.
-For this book, make sure you have at least RStudio 1.6.0.
+For this book, make sure you have at least RStudio 2022.02.0.
 
 When you start RStudio, you'll see two key regions in the interface: the console pane, and the output pane.
 