From 3b86fb5b3d7b3bb03f29f01c4fba39bed28ce99f Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Fri, 12 Aug 2016 20:57:39 +0200 Subject: [PATCH 01/19] Update wrangle.Rmd (#247) Typo --- wrangle.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/wrangle.Rmd b/wrangle.Rmd index 4c15a76..b04dde7 100644 --- a/wrangle.Rmd +++ b/wrangle.Rmd @@ -23,7 +23,7 @@ This part of the book proceeds as follows: You'll learn the underlying principles, and how to get your data into a tidy form. -Data wrangling also encompasses data transformation, which you've already learn a little about. Now we'll focus new skills for three specific types of data you will frequently encounter in practice: +Data wrangling also encompasses data transformation, which you've already learn a little about. Now we'll focus on new skills for three specific types of data you will frequently encounter in practice: * [Dates and times] will give you the key tools for working with dates and date-times. From 509f70902de86a356a1026b0282f42cf9ec65a82 Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Fri, 12 Aug 2016 21:28:53 +0200 Subject: [PATCH 02/19] Update tibble.Rmd (#248) Typo --- tibble.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tibble.Rmd b/tibble.Rmd index fc5fce1..4d942ad 100644 --- a/tibble.Rmd +++ b/tibble.Rmd @@ -16,7 +16,7 @@ library(tibble) ## Creating tibbles {#tibbles} -The almost all of the functions that you'll use in this book produce tibbles as using tibbles is one of the common features of packages in the tidyverse. Most other R packages use regular data frames, so you might want to coerce a data frame to a tibble. You can do that with `as_tibble()`: +Almost all of the functions that you'll use in this book produce tibbles, as using tibbles is one of the common features of packages in the tidyverse. Most other R packages use regular data frames, so you might want to coerce a data frame to a tibble.
You can do that with `as_tibble()`: ```{r} as_tibble(iris) From b3aa1ff9e790489a202698c04a39c22743ce0825 Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Fri, 12 Aug 2016 22:04:31 +0200 Subject: [PATCH 03/19] Update tibble.Rmd (#249) Typo --- tibble.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tibble.Rmd b/tibble.Rmd index 4d942ad..dd1815e 100644 --- a/tibble.Rmd +++ b/tibble.Rmd @@ -67,7 +67,7 @@ I often add a comment (the line starting with `#`), to make it really clear wher ## Tibbles vs. data frames -There are two main differences in the usage of a data frame vs a tibble: printing, and subsetting. +There are two main differences in the usage of a data frame vs a tibble: printing and subsetting. ### Printing @@ -83,7 +83,7 @@ tibble( ) ``` -Tibbles are designed so that you don't accidentally overwhelm your console when you print large dataframes. But sometimes you need more output than the default display. There are a few options that can help. +Tibbles are designed so that you don't accidentally overwhelm your console when you print large data frames. But sometimes you need more output than the default display. There are a few options that can help. First, you can explicitly `print()` the data frame and control the number of rows (`n`) and the `width` of the display. `width = Inf` will display all columns: @@ -112,7 +112,7 @@ nycflights13::flights %>% ### Subsetting -So far all the tools you've learned have worked with complete dataframes. If you want to pull out a single variable, you need some new tools, `$` and `[[`. `[[` can extract by name or position; `$` only extracts by name but is a little less typing. +So far all the tools you've learned have worked with complete data frames. If you want to pull out a single variable, you need some new tools, `$` and `[[`. `[[` can extract by name or position; `$` only extracts by name but is a little less typing. 
```{r} df <- tibble( From 2c5a19e5ffd863df32dbaf8b9ac264989cee85d7 Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Fri, 12 Aug 2016 22:40:24 +0200 Subject: [PATCH 04/19] Update import.Rmd (#250) Typo --- import.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/import.Rmd b/import.Rmd index 99b2b8e..62594b1 100644 --- a/import.Rmd +++ b/import.Rmd @@ -2,7 +2,7 @@ ## Introduction -Working with data provided by R packages is a great way to learn the tools of data science, but at some point you want to stop learning and start working with your own data. In this chapter, you'll learn how to read plain-text rectangular files into R. Here, we'll only scratch surface of data import, but many of the principles will translate to the other forms of data. We'll finish with a few pointers to packages that useful for other types of data. +Working with data provided by R packages is a great way to learn the tools of data science, but at some point you want to stop learning and start working with your own data. In this chapter, you'll learn how to read plain-text rectangular files into R. Here, we'll only scratch the surface of data import, but many of the principles will translate to the other forms of data. We'll finish with a few pointers to packages that are useful for other types of data. ### Prerequisites @@ -30,7 +30,7 @@ Most of readr's functions are concerned with turning flat files into data frames [webreadr](https://github.com/Ironholds/webreadr) which is built on top of `read_log()` and provides many more helpful tools.) -These functions all have similar syntax: once you've mastered one, you can use the others with ease. For the rest of this chapter we'll focus on `read_csv()`. Not onl are csv files one of the most common forms of data storage, but once you understand `read_csv()`, you can easily apply your knowledge to all the other functions in readr.
+These functions all have similar syntax: once you've mastered one, you can use the others with ease. For the rest of this chapter we'll focus on `read_csv()`. Not only are csv files one of the most common forms of data storage, but once you understand `read_csv()`, you can easily apply your knowledge to all the other functions in readr. The first argument to `read_csv()` is the most important: it's the path to the file to read. From d67c997d218bb6af844c8e3ffe353edad16db15f Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Sat, 13 Aug 2016 16:02:48 +0200 Subject: [PATCH 05/19] Update import.Rmd (#251) Typo --- import.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/import.Rmd b/import.Rmd index 62594b1..353a16a 100644 --- a/import.Rmd +++ b/import.Rmd @@ -250,7 +250,7 @@ charToRaw("Hadley") Each hexadecimal number represents a byte of information: `48` is H, `61` is a, and so on. The mapping from hexadecimal number to character is called the encoding, and in this case the encoding is called ASCII. ASCII does a great job of representing English characters, because it's the __American__ Standard Code for Information Interchange. -Things get more complicated for languages other than English. In the early days of computing there were many competing standards for encoding non-English characters, and to correctly interpret a string you need to know both the values and the encoding. For example, two common encodings are Latin1 (aka ISO-8859-1, used for Western European languages) and Latin2 (aka ISO-8859-2, used for Eastern European languages). In Latin1, the byte `b1` is "±", but in Latin2, it's "ą"! Fortunately, today there is one standard that is supported almost everywhere: UTF-8. UTF-8 can encode just about every character used by humans today, as well as many extra symbols (like emoji!). +Things get more complicated for languages other than English. 
In the early days of computing there were many competing standards for encoding non-English characters, and to correctly interpret a string you needed to know both the values and the encoding. For example, two common encodings are Latin1 (aka ISO-8859-1, used for Western European languages) and Latin2 (aka ISO-8859-2, used for Eastern European languages). In Latin1, the byte `b1` is "±", but in Latin2, it's "ą"! Fortunately, today there is one standard that is supported almost everywhere: UTF-8. UTF-8 can encode just about every character used by humans today, as well as many extra symbols (like emoji!). readr uses UTF-8 everywhere: it assumes your data is UTF-8 encoded when you read it, and always uses it when writing. This is a good default, but will fail for data produced by older systems that don't understand UTF-8. If this happens to you, your strings will look weird when you print them. Sometimes just one or two characters might be messed up; other times you'll get complete gibberish. For example: @@ -340,7 +340,7 @@ Time : `%M` minutes. : `%S` integer seconds. : `%OS` real seconds. -: `%Z` Time zone (as name, e.g. `America/Chicago`). Beware abbreviations: +: `%Z` Time zone (as name, e.g. `America/Chicago`). Beware of abbreviations: if you're American, note that "EST" is a Canadian time zone that does not have daylight savings time. It is \emph{not} Eastern Standard Time! We'll come back to this [time zones]. @@ -628,6 +628,6 @@ To get other types of data into R, we recommend starting with the tidyverse pack __RSQLite__, __RPostgreSQL__ etc) allows you to run SQL queries against a database and return a data frame. -For hierarchical data: use __jsonlite__ (by Jeroen Ooms) for json, and __xml2__ for XML. whichYou will need to convert them to data frames using the tools on [handling hierarchy]. +For hierarchical data: use __jsonlite__ (by Jeroen Ooms) for json, and __xml2__ for XML. 
You will need to convert them to data frames using the tools in [handling hierarchy]. For other file types, try the [R data import/export manual](https://cran.r-project.org/doc/manuals/r-release/R-data.html) and the [__rio__](https://github.com/leeper/rio) package. From 010ea3b0c833c672217de9f9058d3b0c9f6e6137 Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Sat, 13 Aug 2016 16:03:31 +0200 Subject: [PATCH 06/19] Update tidy.Rmd (#252) Typo --- tidy.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidy.Rmd b/tidy.Rmd index de5f07c..be4c3e5 100644 --- a/tidy.Rmd +++ b/tidy.Rmd @@ -295,7 +295,7 @@ table3 %>% You can also pass a vector of integers to `sep`. `separate()` will interpret the integers as positions to split at. Positive values start at 1 on the far-left of the strings; negative value start at -1 on the far-right of the strings. When using integers to separate strings, the length of `sep` should be one less than the number of names in `into`. -You can use this arrangement to separate the last two digits of each year. This make this data lesss tidy, but is useful in other cases, as you'll see in a little bit. +You can use this arrangement to separate the last two digits of each year. This makes this data less tidy, but is useful in other cases, as you'll see in a little bit. ```{r} table3 %>% From 2c754bac83efcfac0ad7eb3162f285fc1b38194f Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Sat, 13 Aug 2016 16:04:06 +0200 Subject: [PATCH 07/19] Update wrangle.Rmd (#253) Changed the order to reflect the ordering of the chapters --- wrangle.Rmd | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/wrangle.Rmd b/wrangle.Rmd index b04dde7..d15ec3d 100644 --- a/wrangle.Rmd +++ b/wrangle.Rmd @@ -25,11 +25,11 @@ This part of the book proceeds as follows: Data wrangling also encompasses data transformation, which you've already learn a little about.
Now we'll focus on new skills for three specific types of data you will frequently encounter in practice: -* [Dates and times] will give you the key tools for working with - dates and date-times. +* [Relational data] will give you tools for working with multiple + interrelated datasets. * [Strings] will introduce regular expressions, a powerful tool for manipulating strings. -* [Relational data] will give you tools for working with multiple - interrelated datasets. +* [Dates and times] will give you the key tools for working with + dates and date-times. From 3404a00283ecc66b0fc904ad58da4605fba5cdca Mon Sep 17 00:00:00 2001 From: harrismcgehee Date: Mon, 15 Aug 2016 08:31:48 -0400 Subject: [PATCH 08/19] Missing an s (#260) --- tidy.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidy.Rmd b/tidy.Rmd index be4c3e5..b4ae17e 100644 --- a/tidy.Rmd +++ b/tidy.Rmd @@ -440,7 +440,7 @@ The best place to start is almost always to gathering together the columns that * We don't know what all the other columns are yet, but given the structure in the variable names (e.g. `new_sp_m014`, `new_ep_m014`, `new_ep_f014`) - these are likely to be values, not variable. + these are likely to be values, not variables. So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells repesent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present. 
From 84d3ab2a2647ce34a0e25bc00c2b99e962a9a3ce Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Mon, 15 Aug 2016 14:31:59 +0200 Subject: [PATCH 09/19] Update relational-data.Rmd (#261) Typo --- relational-data.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/relational-data.Rmd b/relational-data.Rmd index 747ca82..68757d0 100644 --- a/relational-data.Rmd +++ b/relational-data.Rmd @@ -472,7 +472,7 @@ The inverse of a semi-join is an anti-join. An anti-join keeps the rows that _do knitr::include_graphics("diagrams/join-anti.png") ``` -Anti-joins are are useful for diagnosing join mismatches. For example, when connecting `flights` and `planes`, you might be interested to know that there are many `flights` that don't have a match in `planes`: +Anti-joins are useful for diagnosing join mismatches. For example, when connecting `flights` and `planes`, you might be interested to know that there are many `flights` that don't have a match in `planes`: ```{r} flights %>% From b0d830b18195d415560b61cbf0d54d3025b2a900 Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Mon, 15 Aug 2016 14:32:17 +0200 Subject: [PATCH 10/19] Update strings.Rmd (#262) Typo --- strings.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/strings.Rmd b/strings.Rmd index 7156e48..4a321d5 100644 --- a/strings.Rmd +++ b/strings.Rmd @@ -279,7 +279,7 @@ You can also match the boundary between words with `\b`. I don't often use this ### Character classes and alternatives -There are number of special patterns that match more than one character. You've already seen `.`, which matches any character apart from a newline. There are four other useful tools: +There are a number of special patterns that match more than one character. You've already seen `.`, which matches any character apart from a newline. There are four other useful tools: * `\d`: matches any digit. * `\s`: matches any whitespace (e.g. space, tab, newline). 
From 6c5eb2a45299941e0eb80cf700cab72fb1f27266 Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Mon, 15 Aug 2016 14:32:32 +0200 Subject: [PATCH 11/19] Update strings.Rmd (#263) Typo. In some exercises the spacing between the numbered items seems inconsistent. I tried to fix one here. --- strings.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/strings.Rmd b/strings.Rmd index 4a321d5..76e4b3b 100644 --- a/strings.Rmd +++ b/strings.Rmd @@ -366,7 +366,7 @@ str_view(x, 'C[LX]+?') 1. `"\\{.+\\}"` 1. `\d{4}-\d{2}-\d{2}` 1. `"\\\\{4}"` - + 1. Create regular expressions to find all words that: 1. Start with three consonants. @@ -378,7 +378,7 @@ str_view(x, 'C[LX]+?') ### Grouping and backreferences -Earlier, you learned about parentheses as a way to disambiguate complex expressions. They also definie "groups" that you can refer to with _backreferences_, like `\1`, `\2` etc. For example, the following regular expression finds all fruits that have a repeated pair of letters. +Earlier, you learned about parentheses as a way to disambiguate complex expressions. They also define "groups" that you can refer to with _backreferences_, like `\1`, `\2` etc. For example, the following regular expression finds all fruits that have a repeated pair of letters. ```{r} str_view(fruit, "(..)\\1", match = TRUE) @@ -401,7 +401,7 @@ str_view(fruit, "(..)\\1", match = TRUE) 1. Start and end with the same character. 1. Contain a repeated pair of letters - (e.g. "church" contains "ch" repeated twice) + (e.g. "church" contains "ch" repeated twice.) 1. Contain one letter repeated in at least three places (e.g. "eleven" contains three "e"s.)
From 86613070143ed59eda2171e267398da7411d660d Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Mon, 15 Aug 2016 14:32:45 +0200 Subject: [PATCH 12/19] Update datetimes.Rmd (#264) Typo --- datetimes.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/datetimes.Rmd b/datetimes.Rmd index 2a44801..a5a4d4a 100644 --- a/datetimes.Rmd +++ b/datetimes.Rmd @@ -2,7 +2,7 @@ ## Introduction -This chapter will show you how to work with dates and times in R. At first glance, dates and times seem simple. You use them all the time in your regular life, and they don't seem to cause much confusion. However, the more you learn about dates and times, the more complicated they seem to get. To warm yp, trying these three seemingly simple questions: +This chapter will show you how to work with dates and times in R. At first glance, dates and times seem simple. You use them all the time in your regular life, and they don't seem to cause much confusion. However, the more you learn about dates and times, the more complicated they seem to get. To warm up, try these three seemingly simple questions: * Does every year have 365 days? * Does every day have 24 hours? @@ -10,7 +10,7 @@ This chapter will show you how to work with dates and times in R. At first glanc I'm sure you know that not every year has 365 days, but do you know the full rule for determining if a year is a leap year? (It has three parts.) You might have remembered that many parts of the world use daylight savings time (DST), so that some days have 23 hours, and others have 25. You probably didn't know that some minutes have 61 seconds because every now and then leap seconds are added because the Earth's rotation is gradually slowing down. -Dates and times are hard because they have to reconcile two physical phenomenon (the rotation of the Earth and its orbit around the sun) with a whole raft of geopolitical phenonmeon including months, time zones, and DST. 
This chapter won't teach you every last detail about dates and times, but it will give you a solid grounding of practical skills that will help you with common data analysis challenges. +Dates and times are hard because they have to reconcile two physical phenomena (the rotation of the Earth and its orbit around the sun) with a whole raft of geopolitical phenomena including months, time zones, and DST. This chapter won't teach you every last detail about dates and times, but it will give you a solid grounding of practical skills that will help you with common data analysis challenges. ### Prerequisites @@ -69,7 +69,7 @@ mdy("January 31st, 2017") dmy("31-Jan-2017") ``` -These functions also take unquoted numbers. This is the most concise way to create a single date/time object, as you might need when filtering date/time data. `ymd()` is short and ununambiguous: +These functions also take unquoted numbers. This is the most concise way to create a single date/time object, as you might need when filtering date/time data. `ymd()` is short and unambiguous: ```{r} ymd(20170131) From f9901e3e549bc45f81e20d2c9e9401bbf8e22406 Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Mon, 15 Aug 2016 14:32:53 +0200 Subject: [PATCH 13/19] Update datetimes.Rmd (#265) Typo --- datetimes.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/datetimes.Rmd b/datetimes.Rmd index a5a4d4a..1c2c252 100644 --- a/datetimes.Rmd +++ b/datetimes.Rmd @@ -149,7 +149,7 @@ Note the two tricks I needed to create these plots: means 1 day. 1. R doesn't like to compare date-times with dates, so you can force - `ymd()` to geneate a date-time by supplying a `tz` argument. + `ymd()` to generate a date-time by supplying a `tz` argument. ### From other types @@ -322,7 +322,7 @@ Setting larger components of a date to a constant is a powerful technique that a 1. What makes the distribution of `diamonds$carat` and `flights$sched_dep_time` similar? -1.
Confirm my hypthosis that the early departures of flights in minutes +1. Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early. Hint: create a binary variable that tells you whether or not a flight was delayed. From 2c0c6a8be5361978684f8ba73cc9aeba76fc5757 Mon Sep 17 00:00:00 2001 From: harrismcgehee Date: Mon, 15 Aug 2016 08:33:05 -0400 Subject: [PATCH 14/19] Spell check suggestions (#259) --- tidy.Rmd | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/tidy.Rmd b/tidy.Rmd index b4ae17e..7c469e9 100644 --- a/tidy.Rmd +++ b/tidy.Rmd @@ -119,7 +119,7 @@ The second step is to resolve one of two common problems: 1. One variable might be spread across multiple columns. -1. One observation might be scattered across mutliple rows. +1. One observation might be scattered across multiple rows. Typically a dataset will only suffer from one of these problems; it'll only suffer from both if you're really unlucky! To fix these problems, you'll need the two most important functions in tidyr: `gather()` and `spread()`. @@ -185,10 +185,10 @@ To tidy this up, we first analyse the representation in similar way to `gather() * The column that contains variable names, the `key` column. Here, it's `type`. -* The column that contains values froms multiple variables, the `value` +* The column that contains values from multiple variables, the `value` column. Here it's `count`. -Once we've figured that out, we can use `spread()`, as shown progammatically below, and visually in Figure \@ref(fig:tidy-spread). +Once we've figured that out, we can use `spread()`, as shown programmatically below, and visually in Figure \@ref(fig:tidy-spread). ```{r} spread(table2, key = type, value = count) @@ -317,7 +317,7 @@ table5 %>% unite(new, century, year) ``` -In this case we also need to use the `sep` arguent. The default will place an underscore (`_`) between the values from different columns. Here we don't want any separator so we use `""`: +In this case we also need to use the `sep` argument. The default will place an underscore (`_`) between the values from different columns.
Here we don't want any separator so we use `""`: +In this case we also need to use the `sep` argument. The default will place an underscore (`_`) between the values from different columns. Here we don't want any separator so we use `""`: ```{r} table5 %>% @@ -345,7 +345,7 @@ table5 %>% ## Missing values -Changing the representation of a dataset brings up an important subtlety of missing values. Suprisingly, a value can be missing in one of two possible ways: +Changing the representation of a dataset brings up an important subtlety of missing values. Surprisingly, a value can be missing in one of two possible ways: * __Explicitly__, i.e. flagged with `NA`. * __Implicitly__, i.e. simply not present in the data. @@ -442,7 +442,7 @@ The best place to start is almost always to gathering together the columns that in the variable names (e.g. `new_sp_m014`, `new_ep_m014`, `new_ep_f014`) these are likely to be values, not variables. -So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells repesent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present. +So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present. ```{r} who1 <- who %>% @@ -550,7 +550,7 @@ who %>% ## Non-tidy data -Before we continue on to other topics, it's worth talking briefly about non-tidy data. Earlier in the chapter, I used the perjorative term "messy" to refer to non-tidy data. 
That's an oversimplification: there are lots of useful and well-founded data structures that are not tidy data. There are two main reasons to use other data structures: * Alternative representations may have substantial performance or space advantages. From 6b1ac1f40bc9c9dfc37a12c90343546793877c1d Mon Sep 17 00:00:00 2001 From: harrismcgehee Date: Mon, 15 Aug 2016 08:33:12 -0400 Subject: [PATCH 15/19] Suggest word insertion (#258) --- tidy.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidy.Rmd b/tidy.Rmd index 7c469e9..0a69c48 100644 --- a/tidy.Rmd +++ b/tidy.Rmd @@ -49,7 +49,7 @@ Figure \@ref(fig:tidy-structure) shows the rules visually. knitr::include_graphics("images/tidy-1.png") ``` -These three rules are interrelated because it's impossible to only satisfy two of the three. That interrelationship leads to even simpler set of practical instructions: +These three rules are interrelated because it's impossible to only satisfy two of the three. That interrelationship leads to an even simpler set of practical instructions: 1. Put each dataset in a tibble. 1. Put each variable in a column. From 7773637bc9e5c527cb0484169fb4337fa47484e9 Mon Sep 17 00:00:00 2001 From: harrismcgehee Date: Mon, 15 Aug 2016 08:33:19 -0400 Subject: [PATCH 16/19] Fix typo (#257) --- tidy.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidy.Rmd b/tidy.Rmd index 0a69c48..cd4ccb2 100644 --- a/tidy.Rmd +++ b/tidy.Rmd @@ -41,7 +41,7 @@ There are three interrelated rules which make a dataset tidy: 1. Each variable must have its own column. 1. Each observation must have its own row. -1.
Each value much have its own cell. +1. Each value must have its own cell. Figure \@ref(fig:tidy-structure) shows the rules visually. From 7c9c28c3637b93b5419639770f5353f8571b2d87 Mon Sep 17 00:00:00 2001 From: harrismcgehee Date: Mon, 15 Aug 2016 08:33:25 -0400 Subject: [PATCH 17/19] Fix typo (#256) --- tidy.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidy.Rmd b/tidy.Rmd index cd4ccb2..3e1189f 100644 --- a/tidy.Rmd +++ b/tidy.Rmd @@ -8,7 +8,7 @@ > "Tidy datasets are all alike, but every messy dataset is messy in its > own way." --– Hadley Wickham -In this chapter, you will learn a consistent way to organise your data in R, a organisation called __tidy data__. Getting your data into this format requires some upfront work, but that work pays off in the long-term. Once you have tidy data and the tidy tools provided by packages in the tidyverse, you will spend much less time munging data from one representation to another, allowing you to spend more time on the analytic questions at hand. +In this chapter, you will learn a consistent way to organise your data in R, an organisation called __tidy data__. Getting your data into this format requires some upfront work, but that work pays off in the long-term. Once you have tidy data and the tidy tools provided by packages in the tidyverse, you will spend much less time munging data from one representation to another, allowing you to spend more time on the analytic questions at hand. This chapter will give you a practical introduction to tidy data and the accompanying tools in the __tidyr__ package. If you'd like to learn more about the underlying theory, you might enjoy the *Tidy Data* paper published in the Journal of Statistical Software, . 
From 8bc81d71b116f9054a9e2d8a1799e58ddbd16b0c Mon Sep 17 00:00:00 2001 From: S'busiso Mkhondwane Date: Mon, 15 Aug 2016 14:33:36 +0200 Subject: [PATCH 18/19] Update relational-data.Rmd (#255) Typo --- relational-data.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/relational-data.Rmd b/relational-data.Rmd index 68757d0..998b96a 100644 --- a/relational-data.Rmd +++ b/relational-data.Rmd @@ -16,7 +16,7 @@ To work with relational data you need verbs that work with pairs of tables. Ther * __Set operations__, which treat observations as if they were set elements. -The most common place to find relational data is in a _relational_ database management system (or RDBMS), a term that encompasses almost all modern databases. If you've used a database before, you've almost certainly used SQL. If so, you should find the concepts in this chapter familiar, although their expression in dplyr is a little different. Generally, dplyr is a little easier to use than SQL because dplyr is specialised to data analysis: it makes common data analysis operations easier, at the expense of making it more difficult to do other things that don't commonly need for data analysis. +The most common place to find relational data is in a _relational_ database management system (or RDBMS), a term that encompasses almost all modern databases. If you've used a database before, you've almost certainly used SQL. If so, you should find the concepts in this chapter familiar, although their expression in dplyr is a little different. Generally, dplyr is a little easier to use than SQL because dplyr is specialised to do data analysis: it makes common data analysis operations easier, at the expense of making it more difficult to do other things that aren't commonly needed for data analysis. ### Prerequisites @@ -176,7 +176,7 @@ flights2 <- flights %>% flights2 ``` -(Remember, when you're in RStudio, you can also use `View()` to avoid this problem).
+(Remember, when you're in RStudio, you can also use `View()` to avoid this problem.) Imagine you want to add the full airline name to the `flights2` data. You can combine the `airlines` and `flights2` data frames with `left_join()`: @@ -186,7 +186,7 @@ flights2 %>% left_join(airlines, by = "carrier") ``` -The result of joining airlines to flights is an additional variable: `name`. This is why I call this type of join a mutating join. In this case, you could have got to the same place using `mutate()` and R's base subsetting: +The result of joining airlines to flights2 is an additional variable: `name`. This is why I call this type of join a mutating join. In this case, you could have got to the same place using `mutate()` and R's base subsetting: ```{r} flights2 %>% From 3c0f712b62a60d18b5c08bb0e5fd95b9cfa46473 Mon Sep 17 00:00:00 2001 From: harrismcgehee Date: Mon, 15 Aug 2016 08:33:44 -0400 Subject: [PATCH 19/19] Fix a typo (#254) --- tibble.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tibble.Rmd b/tibble.Rmd index dd1815e..89842db 100644 --- a/tibble.Rmd +++ b/tibble.Rmd @@ -147,7 +147,7 @@ Some older functions don't work with tibbles. If you encounter one of these func class(as.data.frame(tb)) ``` -The main reason that some older functions don't work with tibble is the `[` function. We don't use `[` much in this book much because `dplyr::filter()` and `dplyr::select()` allow you to solve the same problems with clearer code (but you will learn a little about it in [vector subsetting](#vector-subsetting). With base R data frames, `[` sometimes returns a data frame, and sometimes returns a vector. With tibbles, `[` always returns a nother tibble. +The main reason that some older functions don't work with tibbles is the `[` function.
We don't use `[` much in this book because `dplyr::filter()` and `dplyr::select()` allow you to solve the same problems with clearer code (but you will learn a little about it in [vector subsetting](#vector-subsetting)). With base R data frames, `[` sometimes returns a data frame, and sometimes returns a vector. With tibbles, `[` always returns another tibble. ## Exercises