For this example we will use the HTWT data. In particular, we will look at the weights (the Weight column) of the 20 individuals in the data set.
htwt = read.csv("http://facweb1.redlands.edu/fac/jim_bentley/downloads/math111/htwt.csv")
head(htwt)
## Height Weight Group
## 1 64 159 1
## 2 63 155 2
## 3 67 157 2
## 4 60 125 1
## 5 52 103 2
## 6 58 122 2
The raw weight data for the sample data set can be seen by asking for the proper column. Three equivalent ways of accessing the second column, Weight, are given below.
htwt$Weight
## [1] 159 155 157 125 103 122 101 82 228 199 195 110 191 151 119 119 112 87 190
## [20] 87
htwt[,"Weight"]
## [1] 159 155 157 125 103 122 101 82 228 199 195 110 191 151 119 119 112 87 190
## [20] 87
htwt[,2]
## [1] 159 155 157 125 103 122 101 82 228 199 195 110 191 151 119 119 112 87 190
## [20] 87
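As a small self-contained check of the claim that the three forms are equivalent, the sketch below rebuilds only the first six rows shown by head(htwt) and compares the results. The names htwt.head, by.name, by.label, and by.position are made up for this illustration. For a data frame, selecting a single column with [ , ] drops to a plain vector by default, which is why all three agree.

```r
# First six rows of the data, copied from the head(htwt) output above
htwt.head = data.frame(
  Height = c(64, 63, 67, 60, 52, 58),
  Weight = c(159, 155, 157, 125, 103, 122),
  Group  = c(1, 2, 2, 1, 2, 2)
)

by.name     = htwt.head$Weight        # by column name with $
by.label    = htwt.head[, "Weight"]   # by column name with [ , ]
by.position = htwt.head[, 2]          # by column position

# All three are plain numeric vectors with the same contents
identical(by.name, by.label)
identical(by.name, by.position)
```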
To find the minimum and maximum by hand, we first sort the data; the minimum and the maximum are then the first and last observations, respectively. The built-in functions min and max return the same values directly. Finally, the function range returns both the minimum and the maximum.
n = length(htwt$Weight)
n
## [1] 20
wt.sorted = sort(htwt$Weight)
wt.sorted
## [1] 82 87 87 101 103 110 112 119 119 122 125 151 155 157 159 190 191 195 199
## [20] 228
wt.min = wt.sorted[1]
wt.min
## [1] 82
wt.max = wt.sorted[n]
wt.max
## [1] 228
min(htwt$Weight)
## [1] 82
max(htwt$Weight)
## [1] 228
range(htwt$Weight)
## [1] 82 228
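The three approaches can be checked against each other in one step. This is only a sketch: the vector wt is typed in from the Weight output above so that the block runs without downloading the file, and from.sort and from.minmax are names made up for the comparison.

```r
# The 20 weights, copied from the output above
wt = c(159, 155, 157, 125, 103, 122, 101, 82, 228, 199,
       195, 110, 191, 151, 119, 119, 112, 87, 190, 87)

wt.sorted = sort(wt)

# Endpoints of the sorted data
from.sort = c(wt.sorted[1], wt.sorted[length(wt.sorted)])
# Built-in min and max
from.minmax = c(min(wt), max(wt))

# Both agree with range()
identical(from.sort, range(wt))
identical(from.minmax, range(wt))
```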
The median is a value that divides the sorted data in half. If there is an even number of observations, the median is not unique: any value between the two middle observations splits the data in half. By convention, in that case we take the average of the two middle values.
If there is an odd number of observations, we use the ((n+1)/2)th observation in the sorted data. If the number of observations is even, we average the (n/2)th and ((n/2)+1)th observations. Since the Weight data has 20 observations, we will use the second method.
R's built-in function median returns the same value.
# Create a function to determine whether an integer i is odd
odd = function(i){
  # is.integer() checks the storage type, not the value: length() returns an
  # integer, but a literal such as 20 is a double, so write 20L for literals
  if (!is.integer(i)){stop("Not an integer.")}
  # Otherwise, if the division by 2 has no remainder, then i is even
  else if (i %% 2 == 0) {return(FALSE)} else {return(TRUE)}
}
n = length(htwt$Weight)
odd(n)
## [1] FALSE
n.low = n/2
n.low
## [1] 10
n.high = (n/2)+1
n.high
## [1] 11
wt.sorted = sort(htwt$Weight)
wt.sorted
## [1] 82 87 87 101 103 110 112 119 119 122 125 151 155 157 159 190 191 195 199
## [20] 228
wt.sorted[n.low]
## [1] 122
wt.sorted[n.high]
## [1] 125
wt.median = (wt.sorted[n.low]+wt.sorted[n.high])/2
wt.median
## [1] 123.5
median(htwt$Weight)
## [1] 123.5
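The odd and even rules described above can be combined into one function. The following is a minimal sketch, not a replacement for median: my.median is a hypothetical helper name, and wt is typed in from the Weight output above so the block is self-contained.

```r
# The 20 weights, copied from the output above
wt = c(159, 155, 157, 125, 103, 122, 101, 82, 228, 199,
       195, 110, 191, 151, 119, 119, 112, 87, 190, 87)

# my.median is a hypothetical helper, not part of base R
my.median = function(x){
  x.sorted = sort(x)
  n = length(x.sorted)
  if (n %% 2 == 1) {
    # Odd n: the single middle observation
    x.sorted[(n + 1) / 2]
  } else {
    # Even n: average the two middle observations
    (x.sorted[n / 2] + x.sorted[n / 2 + 1]) / 2
  }
}

my.median(wt)         # even case, n = 20
my.median(wt[1:19])   # odd case, n = 19 (last observation dropped)
```

Comparing both calls with median(wt) and median(wt[1:19]) is a quick way to confirm that the helper follows the same convention.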
The lower and upper quartiles, Q1 and Q3, are the 25th and 75th percentiles respectively. These values are not necessarily unique. One way to find them is to take the median of the sorted observations below the overall median (for Q1) and the median of the sorted observations above it (for Q3).
R's quantile function returns the requested quantiles, which are specified as proportions (percentile/100). By default it interpolates between neighboring observations, so its results need not match the median-of-halves values computed by hand.
To compute the lower quartile we first note that n=20 is even, so the median splits the data into two groups of 10 observations. Since 10 is even, we must average the 5th and 6th observations. Similarly, the upper quartile is the average of the 15th and 16th observations.
n
## [1] 20
wt.q1 = (wt.sorted[5]+wt.sorted[6])/2
wt.q1
## [1] 106.5
wt.q3 = (wt.sorted[15]+wt.sorted[16])/2
wt.q3
## [1] 174.5
quantile(htwt$Weight, c(.25, .75))
## 25% 75%
## 108.25 166.75
# Note that q1 and the 25th percentile are both between the 5th and 6th obs
wt.sorted[c(5,6)]
## [1] 103 110
# Note that q3 and the 75th percentile are both between the 15th and 16th obs
wt.sorted[c(15,16)]
## [1] 159 190
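The hand-computed quartiles (106.5 and 174.5) and the quantile output (108.25 and 166.75) differ because of the rule used to pick a value between two observations. According to ?quantile, the default is type = 7, which places the p-th quantile at position h = (n - 1) * p + 1 in the sorted data and interpolates linearly when h is not a whole number. The sketch below reproduces that rule by hand; my.q7 is a hypothetical helper written for this illustration, and wt is typed in from the Weight output above.

```r
# The 20 weights, copied from the output above
wt = c(159, 155, 157, 125, 103, 122, 101, 82, 228, 199,
       195, 110, 191, 151, 119, 119, 112, 87, 190, 87)
wt.sorted = sort(wt)

# my.q7 is a hypothetical helper that mimics quantile(..., type = 7)
my.q7 = function(x.sorted, p){
  h  = (length(x.sorted) - 1) * p + 1   # fractional position in the sorted data
  lo = floor(h)
  hi = ceiling(h)
  # Linear interpolation between the two neighboring observations
  x.sorted[lo] + (h - lo) * (x.sorted[hi] - x.sorted[lo])
}

my.q7(wt.sorted, .25)   # position 5.75: three quarters of the way from 103 to 110
my.q7(wt.sorted, .75)   # position 15.25: one quarter of the way from 159 to 190
quantile(wt, c(.25, .75), type = 7)
```

The hand method instead sits exactly halfway between the 5th and 6th (and the 15th and 16th) observations, which explains the two different but equally valid answers.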
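The five number summary is composed of the minimum, Q1, median, Q3, and the maximum. We can find these numbers as above or use the quantile function. The function summary can also be used. Note that summary also reports the mean and rounds its output for display (108.25 appears as 108.2), and that both quantile and summary report interpolated quartiles rather than the hand-computed ones.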
c(wt.min, wt.q1, wt.median, wt.q3, wt.max)
## [1] 82.0 106.5 123.5 174.5 228.0
quantile(htwt$Weight)
## 0% 25% 50% 75% 100%
## 82.00 108.25 123.50 166.75 228.00
summary(htwt$Weight)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 82.0 108.2 123.5 139.6 166.8 228.0
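Base R also provides fivenum, which returns Tukey's five number summary using hinges instead of interpolated quartiles. For these data the hinges coincide with the quartiles computed by hand above, although that agreement should not be assumed for every sample size. In the sketch below, wt is typed in from the Weight output above so that the block runs on its own.

```r
# The 20 weights, copied from the output above
wt = c(159, 155, 157, 125, 103, 122, 101, 82, 228, 199,
       195, 110, 191, 151, 119, 119, 112, 87, 190, 87)

# Minimum, lower hinge, median, upper hinge, maximum
wt.five = fivenum(wt)
wt.five
```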