Load all packages for Chapter 0.

library(Stat2Data)
library(mosaic)
library(ggplot2)
library(dplyr)

Create a data frame for ThreeCars2017 and WeightlossIncentive4.

data(ThreeCars2017)
data(WeightLossIncentive4)

TABLE 0.1 Viewing the data in the table and some other variables in this data set.

head(ThreeCars2017)

EXAMPLE 0.2 Fininacial Incentives for weight loss

str(WeightLossIncentive4)
## 'data.frame':    36 obs. of  2 variables:
##  $ WeightLoss: num  12.5 12 1 -5 3 -5 7.5 -2.5 20 -1 ...
##  $ Group     : Factor w/ 2 levels "Control","Incentive": 1 1 1 1 1 1 1 1 1 1 ...

Viewing the data frame: The head command will show you the first 6 rows. Similarly, the tail command will show you the last 6 rows. Notice that the data frame is set up differently than Table 0.2. Statistical software packages often prefer this long form where each column corresponds to a variable.

head(WeightLossIncentive4)

EXAMPLE 0.2 CHOOSE and FIT

Using the mosiac package to get summary statistics by Group. Some users will prefer to use the mosaic package because they may have used it in an introductory course. The mosaic package is very useful, and we encourage you to explore with this package and use the commands that you like best.

favstats(WeightLoss~Group, data=WeightLossIncentive4)

FIGURE 0.1 Side-by-side dotplots and boxplots

dotplot(WeightLoss~Group, data=WeightLossIncentive4)

boxplot(WeightLoss~Group, data=WeightLossIncentive4)

The dotplot using mosaic does not show all of the points, so let’s try a more sophisticated approach with the ggplot2 package.

p <- ggplot(WeightLossIncentive4, aes(x=Group, y=WeightLoss))
p1 <- p + geom_dotplot(binaxis='y', stackdir='up', stackratio = 1.5, method="histodot", binwidth=0.8) 
p1

EXAMPLE 0.2 ASSESS

Now for residual plots. First, we’ll subtract group means.

WeightLossIncentive4 <- WeightLossIncentive4 %>% group_by(Group) %>% mutate(Resids = WeightLoss - mean(WeightLoss))
favstats(Resids~Group, data=WeightLossIncentive4)

Dotplots of Residuals

p <- ggplot(WeightLossIncentive4, aes(x=Group, y=Resids))
p1 <- p + geom_dotplot(binaxis='y', stackdir='up', stackratio = 1.5, method="histodot", binwidth=0.8) 
p1

Histograms of the residuals

histogram(~Resids|Group, data=WeightLossIncentive4)

Normal quantile plots

qqnorm(WeightLossIncentive4$WeightLoss)
qqline(WeightLossIncentive4$WeightLoss)

Now we need separate QQ plots for each group.

qplot(sample=Resids, data=WeightLossIncentive4, color=Group)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

qplot(sample=Resids, data=WeightLossIncentive4, facets=~Group)

EXAMPLE 0.2 USE

Two-sample t-test

t.test(WeightLoss~Group, data=WeightLossIncentive4)
## 
##  Welch Two Sample t-test
## 
## data:  WeightLoss by Group
## t = -3.7982, df = 33.276, p-value = 0.0005889
## alternative hypothesis: true difference in means between group Control and group Incentive is not equal to 0
## 95 percent confidence interval:
##  -18.05026  -5.46058
## sample estimates:
##   mean in group Control mean in group Incentive 
##                3.921053               15.676471
t.test(WeightLoss~Group, data=WeightLossIncentive4, var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  WeightLoss by Group
## t = -3.8054, df = 34, p-value = 0.0005635
## alternative hypothesis: true difference in means between group Control and group Incentive is not equal to 0
## 95 percent confidence interval:
##  -18.033330  -5.477506
## sample estimates:
##   mean in group Control mean in group Incentive 
##                3.921053               15.676471

Effect size, using the descriptive statistics from favstats in mosiac package

result<-favstats(WeightLoss~Group, data=WeightLossIncentive4)
#extract the sample sizes, means, and standard deviations from the object that was created with favstats
nc = result$n[1]
ni = result$n[1]
ybarc = result$mean[1]
ybari = result$mean[2]
StDevc = result$sd[1]
StDevi = result$sd[2]
#compute the pooled standard deviation sp
sp=sqrt( ((nc-1)*StDevc**2 +(ni-1)*StDevi**2)/ (nc+ni-2) )
#compute the effect size
effectsize=abs(ybarc-ybari)/sp
effectsize
## [1] 1.269189

Alternative solutions to the recommended approaches above.

Load packages for alternative solutions.

#Some users may be more familiar with using the psych package to get descriptive statistics by a grouping variable
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:mosaic':
## 
##     logit, rescale
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
#The effsize package can be used to get the effect size without extracting all of the statistics from favstats object
library(effsize)
## 
## Attaching package: 'effsize'
## The following object is masked from 'package:psych':
## 
##     cohen.d

Summary Statistics by Group

describe(WeightLossIncentive4$WeightLoss)
describeBy(WeightLossIncentive4$WeightLoss, WeightLossIncentive4$Group)
## 
##  Descriptive statistics by group 
## group: Control
##    vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 19 3.92 9.11      3    4.21 8.15 -17  20    37 -0.13    -0.37 2.09
## ------------------------------------------------------------ 
## group: Incentive
##    vars  n  mean   sd median trimmed   mad  min max range  skew kurtosis   se
## X1    1 17 15.68 9.41     18    15.8 11.86 -0.5  30  30.5 -0.12    -1.51 2.28

Computing the effect size with effsize package: Note the effect size is also known as Cohen’s d.

cohen.d(WeightLossIncentive4$WeightLoss, WeightLossIncentive4$Group)
## 
## Cohen's d
## 
## d estimate: -1.270424 (large)
## 95 percent confidence interval:
##      lower      upper 
## -2.0139911 -0.5268576