committed by
					
						
						Hadley Wickham
					
				
			
			
				
	
			
			
			
						parent
						
							8828f80842
						
					
				
				
					commit
					c3a2688b56
				
			
							
								
								
									
										4
									
								
								EDA.Rmd
									
									
									
									
									
								
							
							
						
						
									
										4
									
								
								EDA.Rmd
									
									
									
									
									
								
							@@ -163,7 +163,7 @@ Clusters of similar values suggest that subgroups exist in your data. To underst
 | 
			
		||||
 | 
			
		||||
* Why might the appearance of clusters be misleading?
 | 
			
		||||
 | 
			
		||||
The histogram shows the length (in minutes) of 272 eruptions of the Old Faithful Geyser in Yellowstone National Park. Eruption times appear to be clustered in to two groups: there are short eruptions (of around 2 minutes) and long eruption (4-5 minutes), but little in between.
 | 
			
		||||
The histogram shows the length (in minutes) of 272 eruptions of the Old Faithful Geyser in Yellowstone National Park. Eruption times appear to be clustered into two groups: there are short eruptions (of around 2 minutes) and long eruptions (4-5 minutes), but little in between.
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
ggplot(data = faithful, mapping = aes(x = eruptions)) + 
 | 
			
		||||
@@ -174,7 +174,7 @@ Many of the questions above will prompt you to explore a relationship *between*
 | 
			
		||||
 | 
			
		||||
### Unusual values
 | 
			
		||||
 | 
			
		||||
Outliers are observations that are unusual; data points that are don't seem to fit the pattern. Sometimes outliers are data entry errors; other times outliers suggest important new science. When you have a lot of data, outliers are sometimes difficult to see in a histogram.  For example, take the distribution of the `x` variable from the diamonds dataset. The only evidence of outliers is the unusually wide limits on the x-axis.
 | 
			
		||||
Outliers are observations that are unusual; data points that don't seem to fit the pattern. Sometimes outliers are data entry errors; other times outliers suggest important new science. When you have a lot of data, outliers are sometimes difficult to see in a histogram.  For example, take the distribution of the `x` variable from the diamonds dataset. The only evidence of outliers is the unusually wide limits on the x-axis.
 | 
			
		||||
 | 
			
		||||
```{r}
 | 
			
		||||
ggplot(diamonds) + 
 | 
			
		||||
 
 | 
			
		||||
		Reference in New Issue
	
	Block a user