In this part of the book, you'll improve your programming skills.
Programming is a cross-cutting skill needed for all data science work: you must use a computer to do data science; you cannot do it in your head, or with pencil and paper.
In the following three chapters, you'll learn skills that will allow you both to tackle new programs and to solve existing problems with greater clarity and ease:
1. Rather than copying and pasting code, in [Chapter -@sec-functions] you'll learn how to write **functions**, which let you extract repeated code so that it can be easily reused.
2. As you start to write more powerful functions, you'll need a solid grounding in R's **data structures**, provided by vectors, which we discuss in [Chapter -@sec-vectors].
You'll need to master the four common atomic vectors and the three important S3 classes built on top of them, and to understand the mysteries of lists and data frames.
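To give a rough feel for what these data structures look like, here is a minimal base R sketch. It assumes that the four common atomic vectors are logical, integer, double, and character, and that the S3 classes in question are factors, dates, and date-times; the variable names and values are made up for illustration.

```r
# Four common atomic vector types (assumed: logical, integer, double, character).
lgl <- c(TRUE, FALSE, NA)
int <- c(1L, 2L, 3L)
dbl <- c(1.5, 2.5, 3.5)
chr <- c("a", "b", "c")
sapply(list(lgl, int, dbl, chr), typeof)

# S3 classes built on top of atomic vectors: the class attribute changes
# how the object behaves, but the underlying storage is still atomic.
fct <- factor(c("low", "high", "low"), levels = c("low", "high"))
dt <- as.Date("2024-01-15")
dttm <- as.POSIXct("2024-01-15 12:00:00", tz = "UTC")
typeof(fct)   # a factor is stored as an integer vector
typeof(dt)    # a Date is stored as a double vector
typeof(dttm)  # a POSIXct date-time is also stored as a double vector

# A list can hold elements of different types;
# a data frame is a list of equal-length columns.
lst <- list(lgl, chr, fct)
df <- data.frame(x = int, y = chr)
typeof(df)
```

The point of the sketch is the contrast between `typeof()`, which reports the underlying storage, and `class()`, which reports the S3 behaviour layered on top.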
A common theme throughout these chapters is the idea of reducing duplication in your code.
Reducing code duplication has three main benefits:
1. It's easier to see the intent of your code, because your eyes are drawn to what's different, not what stays the same.
2. It's easier to respond to changes in requirements.
As your needs change, you only need to make changes in one place, rather than remembering to change every place that you copied-and-pasted the code.
3. You're likely to have fewer bugs because each line of code is used in more places.
One tool for reducing duplication is functions, which identify repeated patterns of code and extract them into independent pieces that can be easily reused and updated.
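As an illustration of that idea, here is a minimal base R sketch that compares copy-and-pasted code with a function. The data frame and the helper name `rescale01()` are hypothetical; the function simply rescales a numeric vector to the range 0 to 1.

```r
# Hypothetical example data: a small data frame with three numeric columns.
df <- data.frame(
  x = c(1, 5, 3),
  y = c(10, 20, 30),
  z = c(2, 2, 4)
)

# Copy-and-paste version: the same rescaling logic written out three times.
# A typo in any one copy (e.g. min(df$x) left inside the y line) would
# silently produce wrong results.
df_copy <- df
df_copy$x <- (df$x - min(df$x)) / (max(df$x) - min(df$x))
df_copy$y <- (df$y - min(df$y)) / (max(df$y) - min(df$y))
df_copy$z <- (df$z - min(df$z)) / (max(df$z) - min(df$z))

# Function version: the repeated pattern is written once, in one place.
# range() returns c(min, max) of its input.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df_fun <- df
df_fun$x <- rescale01(df$x)
df_fun$y <- rescale01(df$y)
df_fun$z <- rescale01(df$z)

# Both approaches give the same result.
identical(df_copy, df_fun)
```

If the rescaling rule ever changes (say, to ignore missing values differently), the function version needs an edit in exactly one place.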
Another tool for reducing duplication is **iteration**, which helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets.
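A minimal base R sketch of iteration over columns follows. It defines a hypothetical helper `rescale01()` and applies it to every column of a made-up data frame, once with a `for` loop and once with `lapply()`, so that the transformation itself is written only once regardless of how many columns there are.

```r
# Hypothetical helper: rescale a numeric vector to the range 0 to 1.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical example data.
df <- data.frame(x = c(1, 5, 3), y = c(10, 20, 30), z = c(2, 2, 4))

# Iteration with a for loop: apply the same operation to every column by name.
df_loop <- df
for (col in names(df_loop)) {
  df_loop[[col]] <- rescale01(df_loop[[col]])
}

# The same iteration with lapply(); assigning into df_lapply[] keeps
# the result a data frame rather than turning it into a plain list.
df_lapply <- df
df_lapply[] <- lapply(df_lapply, rescale01)

df_loop
df_lapply
```

Adding a fourth column to `df` requires no change to either loop, which is the practical payoff of iteration.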
The goal of these chapters is to teach you the minimum about programming that you need to practice data science, which turns out to be a reasonable amount.
Learning more about programming is a long-term investment: it won't pay off immediately, but in the long term it will allow you to solve new problems more quickly, and let you reuse your insights from previous problems in new scenarios.
To learn more, you need to study R as a programming language, not just as an interactive environment for data science.
We have written two books that will help you do so: