R calls categorical (qualitative) variables factor variables. In versions of R before 4.0.0, read.csv converted character variables into factor variables by default; since R 4.0.0 the default is stringsAsFactors = FALSE, so character columns stay character unless you ask otherwise. In either case, when a categorical variable has been coded with numbers, R assumes that the variable is quantitative. The HTWT data shows how this can be a problem.
htwt = read.csv("http://facweb1.redlands.edu/fac/jim_bentley/Data/FYS04/HtWt.csv")
summary(htwt)
## Height Weight Group
## Min. :51.0 Min. : 82.0 Min. :1.00
## 1st Qu.:56.0 1st Qu.:108.2 1st Qu.:1.00
## Median :59.5 Median :123.5 Median :2.00
## Mean :62.1 Mean :139.6 Mean :1.55
## 3rd Qu.:68.0 3rd Qu.:166.8 3rd Qu.:2.00
## Max. :79.0 Max. :228.0 Max. :2.00
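Because the default changed in R 4.0.0, it can help to see both behaviours side by side. The sketch below reads a tiny made-up CSV through read.csv's text argument. The column names and values are hypothetical, not taken from HtWt.csv. Character columns become factors only when stringsAsFactors = TRUE, while numeric codes are read as integers either way.

```r
# Hypothetical inline CSV standing in for a data file
csv_text <- "Name,Group
Ann,2
Bob,1
Cat,2"

# Default in R >= 4.0.0: character columns stay character
d1 <- read.csv(text = csv_text)
# Explicitly request the pre-4.0.0 behaviour
d2 <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(d1$Name)   # "character" in R >= 4.0.0
class(d2$Name)   # "factor"
class(d2$Group)  # "integer": numeric codes are never turned into factors
```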
Note that the variable Group has been treated as numeric; summary() even reports a mean of 1.55, which makes no sense for a category. It turns out that this variable records the sex of each individual, with males coded as 1 and females as 2. We convert the numeric variable to a factor variable.
is.numeric(htwt$Group)
## [1] TRUE
is.factor(htwt$Group)
## [1] FALSE
table(htwt$Group)
##
## 1 2
## 9 11
htwt$Group = factor(htwt$Group, labels=c("Male","Female"))
is.numeric(htwt$Group)
## [1] FALSE
is.factor(htwt$Group)
## [1] TRUE
summary(htwt$Group)
## Male Female
## 9 11
table(htwt$Group)
##
## Male Female
## 9 11
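Supplying the levels explicitly, rather than relying on the sorted order of the codes, makes the code-to-label mapping visible in the code itself. The sketch below uses a short hypothetical vector of codes in place of htwt$Group. It also shows a common trap when converting a factor back to numbers.

```r
# Hypothetical codes standing in for htwt$Group (1 = male, 2 = female)
codes <- c(2, 1, 2, 2, 1)

# Listing the levels pins each code to its label:
# code 1 -> "Male", code 2 -> "Female"
sex <- factor(codes, levels = c(1, 2), labels = c("Male", "Female"))
table(sex)

# Going back to numbers: as.numeric() on a factor returns the internal
# level positions, not the original values, so convert via as.character()
f <- factor(c(20, 10, 20))
as.numeric(f)                 # 2 1 2 (level positions)
as.numeric(as.character(f))   # 20 10 20 (original values)
```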
R also uses factor variables to keep track of ordinal data. To create an ordered factor, set the ordered argument of factor() to TRUE and list the levels in their natural order. We will use data on phone service satisfaction to show how this works.
phone = c(rep("Poor",840),rep("Fair",1649),rep("Good",4787),rep("Excellent",3208))
# At this point phone is a character vector, not a factor
is.factor(phone)
## [1] FALSE
# Use the function factor to convert the variable
phone.u = factor(phone)
is.factor(phone.u)
## [1] TRUE
table(phone.u)
## phone.u
## Excellent Fair Good Poor
## 3208 1649 4787 840
# Note that the levels appear in alphabetical order, not their natural order
# Recreate phone as an ordered factor variable
phone.o = factor(phone, levels = c("Poor","Fair","Good","Excellent"), ordered=TRUE)
table(phone.o)
## phone.o
## Poor Fair Good Excellent
## 840 1649 4787 3208
# The values in the table are now ordered
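Ordered factors do more than fix the display order: comparisons and functions such as max() respect the level order. A minimal sketch, using a few made-up ratings on the same four-point scale as phone.o; ordered() is base R shorthand for factor(..., ordered = TRUE).

```r
# Hypothetical ratings on the phone-satisfaction scale
ratings <- ordered(c("Good", "Poor", "Excellent", "Fair", "Good"),
                   levels = c("Poor", "Fair", "Good", "Excellent"))

ratings >= "Good"     # TRUE FALSE TRUE FALSE TRUE
max(ratings)          # Excellent
as.integer(ratings)   # rank codes: 3 1 4 2 3
is.ordered(ratings)   # TRUE
```

With an unordered factor such as phone.u, the same `>=` comparison is not meaningful and R returns NA with a warning.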
We now create plots to go with the tables.
# Use base graphics
barplot(table(htwt$Group))
barplot(table(phone.u), main="Unordered Factor", col="red")
barplot(table(phone.o), main="Ordered Factor", col="lightblue")
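Counts are not the only option for bar heights. As a sketch, the block below rebuilds phone.o from the counts used above and plots proportions instead, via prop.table(). The title and axis label are illustrative choices, not part of the original example.

```r
# Rebuild the ordered phone factor so this block runs on its own
phone.o <- factor(rep(c("Poor", "Fair", "Good", "Excellent"),
                      times = c(840, 1649, 4787, 3208)),
                  levels = c("Poor", "Fair", "Good", "Excellent"),
                  ordered = TRUE)

# prop.table() turns the counts into proportions that sum to 1
props <- prop.table(table(phone.o))
round(props, 3)

# barplot() draws the bars and invisibly returns their midpoints
mids <- barplot(props, main = "Ordered Factor (proportions)",
                col = "lightblue", ylab = "Proportion")
```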
# Use lattice plots
library(lattice)  # p_load(lattice) also works once pacman is loaded
histogram(~phone.u)
histogram(~phone.o)
barchart(phone.o)
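One practical difference from base graphics: lattice functions return a trellis object rather than drawing immediately. At the console the object is printed automatically, but inside a script, loop, or function nothing appears unless you call print(). A minimal sketch, again rebuilding phone.o so the block stands alone; passing a table to barchart() and setting horizontal = FALSE are choices made here, not the only way to draw this chart.

```r
library(lattice)

# Rebuild the ordered phone factor so this block runs on its own
phone.o <- factor(rep(c("Poor", "Fair", "Good", "Excellent"),
                      times = c(840, 1649, 4787, 3208)),
                  levels = c("Poor", "Fair", "Good", "Excellent"),
                  ordered = TRUE)

# Build the plot object; nothing is drawn yet inside a script
p <- barchart(table(phone.o), horizontal = FALSE,
              main = "Ordered Factor", col = "lightblue")
class(p)   # "trellis"

# Explicit print() is what actually draws it outside the console
print(p)
```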
For those who just will not get rid of those stupid pie charts, R will make them. Why anyone would want to is a mystery.
# Use base graphics
pie(table(htwt$Group))
pie(table(phone.u))
pie(table(phone.o))
# Can't use lattice since it won't make pie charts
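If a pie chart is unavoidable, labelling each slice with its percentage at least spares the reader from judging angles. A sketch using base R's pie() with labels built from prop.table(); the label format and title are arbitrary choices.

```r
# Rebuild the ordered phone factor so this block runs on its own
phone.o <- factor(rep(c("Poor", "Fair", "Good", "Excellent"),
                      times = c(840, 1649, 4787, 3208)),
                  levels = c("Poor", "Fair", "Good", "Excellent"),
                  ordered = TRUE)

tab <- table(phone.o)
# Whole-number percentages for each level
pct <- round(100 * prop.table(tab))
# Labels such as "Poor (8%)"
lbls <- paste0(names(tab), " (", pct, "%)")
pie(tab, labels = lbls, main = "Phone service satisfaction")
```

Rounded percentages need not sum to exactly 100; here they sum to 101.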