Minor updates to intro
@@ -88,7 +88,7 @@ This is the right place to start because you can't tackle big data unless you ha
 The tools you learn in this book will easily handle hundreds of megabytes of data, and with a little care, you can typically use them to work with 1-2 Gb of data.
 If you're routinely working with larger data (10-100 Gb, say), you should learn more about [data.table](https://github.com/Rdatatable/data.table).
 This book doesn't teach data.table because it has a very concise interface that offers fewer linguistic cues, which makes it harder to learn.
-However, if you're working with large data, the performance payoff is worth the extra effort required to learn it.
+However, if you're working with large data, the performance payoff is well worth the effort required to learn it.
 
 If your data is bigger than this, carefully consider whether your big data problem is actually a small data problem in disguise.
 While the complete data set might be big, often the data needed to answer a specific question is small.
@@ -100,7 +100,7 @@ Each individual problem might fit in memory, but you have millions of them.
 For example, you might want to fit a model to each person in your dataset.
 This would be trivial if you had just 10 or 100 people, but instead you have a million.
 Fortunately, each problem is independent of the others (a setup that is sometimes called embarrassingly parallel), so you just need a system (like [Hadoop](https://hadoop.apache.org/) or [Spark](https://spark.apache.org/)) that allows you to send different datasets to different computers for processing.
-Once you've figured out how to answer your question for a single subset using the tools described in this book, you can learn new tools like **sparklyr**, **rhipe**, and **ddr** to solve it for the full dataset.
+Once you've figured out how to answer your question for a single subset using the tools described in this book, you can learn new tools like **sparklyr** to solve it for the full dataset.
 
 ### Python, Julia, and friends
 
@@ -148,7 +148,7 @@ Download and install it from <http://www.rstudio.com/download>.
 RStudio is updated a couple of times a year.
 When a new version is available, RStudio will let you know.
 It's a good idea to upgrade regularly so you can take advantage of the latest and greatest features.
-For this book, make sure you have at least RStudio 1.6.0.
+For this book, make sure you have at least RStudio 2022.02.0.
 
 When you start RStudio, you'll see two key regions in the interface: the console pane, and the output pane.
 