committed by
					
						 Hadley Wickham
					
				
			
			
				
	
			
			
			
						parent
						
							2afd79c1f6
						
					
				
				
					commit
					0f956d64db
				
			
							
								
								
									
										14
									
								
								factors.Rmd
									
									
									
									
									
								
							
							
						
						
									
							| @@ -144,7 +144,7 @@ When working with factors, the two most common operations are changing the order | |||||||
| It's often useful to change the order of the factor levels in a visualisation. For example, imagine you want to explore the average number of hours spent watching TV per day across religions: | It's often useful to change the order of the factor levels in a visualisation. For example, imagine you want to explore the average number of hours spent watching TV per day across religions: | ||||||
|  |  | ||||||
| ```{r} | ```{r} | ||||||
| relig <- gss_cat %>% | relig_summary <- gss_cat %>% | ||||||
|   group_by(relig) %>% |   group_by(relig) %>% | ||||||
|   summarise( |   summarise( | ||||||
|     age = mean(age, na.rm = TRUE), |     age = mean(age, na.rm = TRUE), | ||||||
| @@ -152,7 +152,7 @@ relig <- gss_cat %>% | |||||||
|     n = n() |     n = n() | ||||||
|   ) |   ) | ||||||
|  |  | ||||||
| ggplot(relig, aes(tvhours, relig)) + geom_point() | ggplot(relig_summary, aes(tvhours, relig)) + geom_point() | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| It is difficult to interpret this plot because there's no overall pattern. We can improve it by reordering the levels of `relig` using `fct_reorder()`. `fct_reorder()` takes three arguments: | It is difficult to interpret this plot because there's no overall pattern. We can improve it by reordering the levels of `relig` using `fct_reorder()`. `fct_reorder()` takes three arguments: | ||||||
| @@ -163,7 +163,7 @@ It is difficult to interpret this plot because there's no overall pattern. We ca | |||||||
|   `x` for each value of `f`. The default value is `median`. |   `x` for each value of `f`. The default value is `median`. | ||||||
|  |  | ||||||
| ```{r} | ```{r} | ||||||
| ggplot(relig, aes(tvhours, fct_reorder(relig, tvhours))) + | ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) + | ||||||
|   geom_point() |   geom_point() | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| @@ -172,7 +172,7 @@ Reordering religion makes it much easier to see that people in the "Don't know" | |||||||
| As you start making more complicated transformations, I'd recommend moving them out of `aes()` and into a separate `mutate()` step. For example, you could rewrite the plot above as: | As you start making more complicated transformations, I'd recommend moving them out of `aes()` and into a separate `mutate()` step. For example, you could rewrite the plot above as: | ||||||
|  |  | ||||||
| ```{r, eval = FALSE} | ```{r, eval = FALSE} | ||||||
| relig %>% | relig_summary %>% | ||||||
|   mutate(relig = fct_reorder(relig, tvhours)) %>% |   mutate(relig = fct_reorder(relig, tvhours)) %>% | ||||||
|   ggplot(aes(tvhours, relig)) + |   ggplot(aes(tvhours, relig)) + | ||||||
|     geom_point() |     geom_point() | ||||||
| @@ -180,7 +180,7 @@ relig %>% | |||||||
| What if we create a similar plot looking at how average age varies across reported income level? | What if we create a similar plot looking at how average age varies across reported income level? | ||||||
|  |  | ||||||
| ```{r} | ```{r} | ||||||
| rincome <- gss_cat %>% | rincome_summary <- gss_cat %>% | ||||||
|   group_by(rincome) %>% |   group_by(rincome) %>% | ||||||
|   summarise( |   summarise( | ||||||
|     age = mean(age, na.rm = TRUE), |     age = mean(age, na.rm = TRUE), | ||||||
| @@ -188,7 +188,7 @@ rincome <- gss_cat %>% | |||||||
|     n = n() |     n = n() | ||||||
|   ) |   ) | ||||||
|  |  | ||||||
| ggplot(rincome, aes(age, fct_reorder(rincome, age))) + geom_point() | ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) + geom_point() | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| Here, arbitrarily reordering the levels isn't a good idea! That's because `rincome` already has a principled order that we shouldn't mess with. Reserve `fct_reorder()` for factors whose levels are arbitrarily ordered. | Here, arbitrarily reordering the levels isn't a good idea! That's because `rincome` already has a principled order that we shouldn't mess with. Reserve `fct_reorder()` for factors whose levels are arbitrarily ordered. | ||||||
| @@ -196,7 +196,7 @@ Here, arbitrarily reordering the levels isn't a good idea! That's because `rinco | |||||||
| However, it does make sense to pull "Not applicable" to the front with the other special levels. You can use `fct_relevel()`. It takes a factor, `f`, and then any number of levels that you want to move to the front of the line. | However, it does make sense to pull "Not applicable" to the front with the other special levels. You can use `fct_relevel()`. It takes a factor, `f`, and then any number of levels that you want to move to the front of the line. | ||||||
|  |  | ||||||
| ```{r} | ```{r} | ||||||
| ggplot(rincome, aes(age, fct_relevel(rincome, "Not applicable"))) + | ggplot(rincome_summary, aes(age, fct_relevel(rincome, "Not applicable"))) + | ||||||
|   geom_point() |   geom_point() | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
|   | |||||||
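
The hunks above call `fct_reorder()` and `fct_relevel()` on the renamed summary tables. As a minimal sketch of what those two calls do to the level order, assuming the forcats package is installed: the vectors `f` and `x` below are hypothetical toy data, not taken from `gss_cat`.

```r
library(forcats)

# Hypothetical toy data: a factor with three levels and a numeric value per row
f <- factor(c("a", "b", "c", "a", "b", "c"))
x <- c(3, 1, 2, 5, 1, 2)

# Reorder the levels of f by the median of x within each level
# (median is the default summary function)
by_median <- fct_reorder(f, x)
levels(by_median)

# The same ordering, reversed
by_median_desc <- fct_reorder(f, x, .desc = TRUE)
levels(by_median_desc)

# Move a single level to the front, leaving the others in their existing order
c_first <- fct_relevel(f, "c")
levels(c_first)
```

Neither function changes the values of the factor, only the order of its levels. That is why the plots in the diff show the same points, with the rows of the axis rearranged.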