In this take-home exercise, I will explore the demographics of the city of Engagement, Ohio, USA.
Appropriate static statistical graphics methods are used to reveal the demographics of the city of Engagement, Ohio, USA.
The data are processed with appropriate packages from the tidyverse family, and the statistical graphics are prepared using ggplot2 and its extensions.
The picture below shows a sketch of the initial design proposed.
Before we get started, we need to ensure that the required R packages are installed. Packages that are already installed are simply loaded. Packages that have yet to be installed are installed first and then loaded into the R environment.
The code chunk below will do the trick.
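# Packages required for this exercise
packages <- c('tidyverse', 'ggdist', 'ggridges', 'patchwork', 'ggthemes')
for (p in packages) {
  # require() returns FALSE when a package cannot be loaded, so install it first
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}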
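Before running the loop, it can be useful to see which packages are missing. The sketch below is one possible way to do that with base R's requireNamespace(), which checks availability without attaching a package. The names is_installed and missing_packages are our own and are not part of the original exercise.

```r
# Packages used in this exercise (same list as in the loop above)
packages <- c("tidyverse", "ggdist", "ggridges", "patchwork", "ggthemes")

# requireNamespace() reports availability without attaching the package
is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that would still need install.packages()
missing_packages <- packages[!is_installed]
print(missing_packages)
```

An empty result means every package is already available and the loop will only load them.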
The code chunk below imports Participants.csv from the data
folder into R by using read_csv() of the readr package and
saves it as a tibble data frame called participants.
participants <- read_csv("data/Participants.csv")
glimpse(participants)
Rows: 1,011
Columns: 7
$ participantId <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,~
$ householdSize <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ~
$ haveKids <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU~
$ age <dbl> 36, 25, 35, 21, 43, 32, 26, 27, 20, 35, 48, 2~
$ educationLevel <chr> "HighSchoolOrCollege", "HighSchoolOrCollege",~
$ interestGroup <chr> "H", "B", "A", "I", "H", "D", "I", "A", "G", ~
$ joviality <dbl> 0.001626703, 0.328086500, 0.393469590, 0.1380~
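We will examine the household size as well as the education level of
the participants in this data set. The two code chunks below each plot
a bar chart by using geom_bar() of ggplot2. The bars are ordered by
descending count and labelled with the count and its percentage share.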
ggplot(data = participants,
       aes(x = reorder(householdSize, householdSize, function(x) -length(x)))) +
  geom_bar() +
  ylim(0, 550) +
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_text(stat = "count",
            aes(label = paste0(after_stat(count), " (",
                               round(after_stat(count) / sum(after_stat(count)) * 100,
                                     1), "%)")),
            vjust = -1) +
  xlab("Household Size") +
  ylab("No. of\nParticipants") +
  ggtitle("Household Size of Participants")
ggplot(data = participants,
       aes(x = reorder(educationLevel, educationLevel, function(x) -length(x)))) +
  geom_bar() +
  ylim(0, 550) +
  # after_stat(count) replaces the deprecated ..count.. notation
  geom_text(stat = "count",
            aes(label = paste0(after_stat(count), " (",
                               round(after_stat(count) / sum(after_stat(count)) * 100,
                                     1), "%)")),
            vjust = -1) +
  xlab("Education Level") +
  ylab("No. of\nParticipants") +
  ggtitle("Education Level of Participants")
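To see what the reorder() call and the label expression in the two bar charts compute, the minimal sketch below reproduces both steps in base R. The household_size vector holds made-up toy values and is not taken from Participants.csv.

```r
# Toy household sizes (hypothetical values, not from Participants.csv)
household_size <- c(1, 1, 2, 3, 3, 3)

# reorder() scores each category with -length(x), so the largest group sorts first
ordered_size <- reorder(household_size, household_size, function(x) -length(x))
print(levels(ordered_size))

# Counts per category, in the reordered level order
counts <- as.vector(table(ordered_size))

# Same label format as in the geom_text() calls: "count (percent%)"
labels <- paste0(counts, " (", round(counts / sum(counts) * 100, 1), "%)")
print(labels)
```

Negating the group length is what turns the default ascending sort into a descending-frequency order on the x-axis.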
We will examine the joviality distribution of the participants across
interest groups by kids status in this data set. The code chunk below
plots box plots by using geom_boxplot() of ggplot2 and marks the mean
of each group with a red point.
ggplot(data = participants,
       aes(y = joviality, x = interestGroup)) +
  geom_boxplot() +
  # fun replaces the deprecated fun.y argument of stat_summary()
  stat_summary(geom = "point",
               fun = "mean",
               colour = "red",
               size = 3) +
  facet_grid(haveKids ~ .) +
  ggtitle("Joviality across Interest Groups by Kids Status")
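The red points drawn by stat_summary() are group means, one per interest group within each kids-status panel. As a rough cross-check, the same numbers can be computed directly with base R's aggregate(). The toy data frame below uses made-up values and only mirrors the column names of the participants data.

```r
# Toy data (hypothetical values, not from Participants.csv)
toy <- data.frame(
  interestGroup = c("A", "A", "B", "B", "B"),
  haveKids      = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  joviality     = c(0.2, 0.6, 0.1, 0.5, 0.9)
)

# Mean joviality per interest group within each haveKids panel
group_means <- aggregate(joviality ~ interestGroup + haveKids,
                         data = toy, FUN = mean)
print(group_means)
```

Comparing such a table with the plot is a quick way to confirm that the mean and the median (the box's centre line) differ where the distribution is skewed.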
Joviality is defined as the characteristic of being cheerful and festive. A higher joviality index would represent a greater level of cheerfulness.
We will examine the joviality spread of the participants across the
education levels in this data set. The code chunk below creates
raincloud plots by using stat_halfeye() of
ggdist and geom_boxplot() of
ggplot2.
ggplot(participants, aes(x = educationLevel, y = joviality)) +
  scale_y_continuous(breaks = seq(0, 1, 0.2),
                     limits = c(0, 1)) +
  stat_halfeye(adjust = .33,
               width = .67,
               color = NA,
               justification = -.01,
               position = position_nudge(x = .15)) +
  geom_boxplot(width = .25,
               outlier.shape = NA) +
  coord_flip() +
  ggtitle("Joviality Spread by Education Level")
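The adjust argument in stat_halfeye() controls how smooth the density slab looks. As we understand it, it acts as a multiplier on the kernel bandwidth. The sketch below illustrates the same idea with base R's density(), which also has an adjust argument; it uses simulated values and does not reproduce ggdist's internal computation.

```r
# Simulated joviality-like values in [0, 1] (not the real data)
set.seed(42)
x <- runif(100)

d_default <- density(x)                 # default bandwidth
d_narrow  <- density(x, adjust = 0.33)  # bandwidth multiplied by 0.33

# The narrower bandwidth gives a bumpier, more detailed curve
print(c(default = d_default$bw, narrow = d_narrow$bw))
```

A value below 1, such as the .33 used above, keeps more local detail at the cost of a less smooth shape.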
We will examine the joviality spread of the participants across the
interest groups in this data set. The code chunk below creates a ridge
plot by using geom_density_ridges() of
ggridges, a ggplot2 extension designed specifically for
creating ridge plots.
ggplot(participants,
       aes(x = joviality, y = interestGroup)) +
  geom_density_ridges(rel_min_height = 0.01,
                      scale = 1) +
  ggtitle("Joviality Spread by Interest Group")
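The rel_min_height = 0.01 setting trims the long, nearly flat tails of each ridge by dropping the parts of the curve that fall below 1% of the maximum height. The sketch below mimics that idea for a single density in base R. It is a simplification under our own assumptions: ggridges measures the cutoff against the highest point among all ridgelines, and the values here are simulated.

```r
# Simulated joviality-like values in [0, 1] (not the real data)
set.seed(1)
x <- runif(200)

# Kernel density estimate for one hypothetical ridge
d <- density(x)

# Keep only the points at or above 1% of the peak height
cutoff  <- 0.01 * max(d$y)
keep    <- d$y >= cutoff
trimmed <- data.frame(x = d$x[keep], y = d$y[keep])

print(range(trimmed$x))
```

With scale = 1, each ridge just touches the baseline of the one above it, so trimming the tails keeps neighbouring ridges from visually running into each other.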
We will combine the plots into a single dashboard view. The code chunk below creates a composite plot by using the patchwork package.
Note: p1 to p5 are the five plots created earlier, each saved to an object of that name (for example, p1 for the household size bar chart).
patchwork <- ((p1 / p2) | p3) / (p4 | p5) +
  plot_annotation(tag_levels = 'I',
                  title = 'Demographic of the city of Engagement, Ohio USA')
patchwork & theme_economist()
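For readers who want to try the composition operators in isolation, here is a minimal sketch with two toy plots. The data frame and the names toy, p_bar, p_box and combined are hypothetical, and theme_minimal() stands in for theme_economist() so that only ggplot2 and patchwork are needed.

```r
library(ggplot2)
library(patchwork)

# Toy data (hypothetical values, not from Participants.csv)
toy <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(0.2, 0.4, 0.5, 0.7, 0.9)
)

# Save each plot to an object so that patchwork can combine them
p_bar <- ggplot(toy, aes(x = group)) + geom_bar()
p_box <- ggplot(toy, aes(x = group, y = value)) + geom_boxplot()

# | places plots side by side; / would stack them vertically
combined <- (p_bar | p_box) +
  plot_annotation(tag_levels = "I", title = "Toy composite plot")

# & applies the theme to every plot in the composition
combined & theme_minimal()
```

Using & rather than + for the theme matters here: + would only modify the last plot added, whereas & reaches every panel.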