Take-home Exercise 1

In this Take-home Exercise, I will explore the demographic of the city of Engagement, Ohio USA.

Che Xuan https://www.linkedin.com/in/jacob-che-xuan-b646a9123/
2022-04-24

Overview

In this take-home exercise, appropriate static statistical graphics methods are used to reveal the demographic of the city of Engagement, Ohio USA.

The data are processed by using appropriate tidyverse family of packages and the statistical graphics are prepared using ggplot2 and its extensions.

Sketch of Proposed Design

The picture below shows a sketch of the initial design proposed.

Installing & Lauching R Packages

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c('tidyverse', 'ggdist', 'ggridges', 'patchwork', 'ggthemes')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Importing Data

The code chunk below imports Participants.csv from the data folder into R by using read_csv() of readr package and save it as an tibble data frame called participants.

participants <- read_csv("data/Participants.csv")

glimpse(participants)
Rows: 1,011
Columns: 7
$ participantId  <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,~
$ householdSize  <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ~
$ haveKids       <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU~
$ age            <dbl> 36, 25, 35, 21, 43, 32, 26, 27, 20, 35, 48, 2~
$ educationLevel <chr> "HighSchoolOrCollege", "HighSchoolOrCollege",~
$ interestGroup  <chr> "H", "B", "A", "I", "H", "D", "I", "A", "G", ~
$ joviality      <dbl> 0.001626703, 0.328086500, 0.393469590, 0.1380~

Static Bar Chart

We will examine the household size as well as the education level of the participants in this data set. The code chunk below plots a bar chart by using geom_bar() of ggplot2.

ggplot(data = participants,
       aes(x=reorder(householdSize, householdSize, function(x)-length(x)))) +
  geom_bar() +
  ylim(0,550) +
  geom_text(stat="count", 
      aes(label=paste0(..count.., " (", 
      round(..count../sum(..count..)*100,
            1), "%)")),
      vjust=-1) +
  xlab("Household Size") +
  ylab("No. of\nParticipants") +
  ggtitle("Household Size of Participants")

ggplot(data = participants,
       aes(x=reorder(educationLevel, educationLevel, function(x)-length(x)))) +
  geom_bar() +
  ylim(0,550) +
  geom_text(stat="count", 
      aes(label=paste0(..count.., " (", 
      round(..count../sum(..count..)*100,
            1), "%)")),
      vjust=-1) +
  xlab("Education Level") +
  ylab("No. of\nParticipants") +
  ggtitle("Education Level of Participants")

Static Trellis Boxplot

We will examine the joviality distribution of the participants across interest groups by kids status in this data set. The code chunk below plots a bar chart by using geom_boxplot() of ggplot2.

ggplot(data=participants, 
       aes(y = joviality, x= interestGroup)) +
  geom_boxplot() +
  stat_summary(geom = "point",
               fun.y="mean",
               colour ="red",
               size=3) +
  facet_grid(haveKids ~.) +
  ggtitle("Joviality across Interest Groups by Kids Status")

Static Raincoud Plots

Joviality is defined as the characteristic of being cheerful and festive. A higher joviality index would represent a greater level of cheerfulness.

We will examine the joviality spread of the participants across the education levels in this data set. The code chunk below creates raincloud plots by using stat_halfeye() of ggdist and geom_boxplot() of ggplot2.

ggplot(participants, aes(x = educationLevel, y = joviality)) +
  scale_y_continuous(breaks = seq(0, 1, 0.2), 
                     limits = c(0, 1)) + 
  stat_halfeye(adjust = .33, 
               width = .67, 
               color = NA,
               justification = -.01,
               position = position_nudge(
                 x = .15)
  ) +
  geom_boxplot(
    width = .25,
    outlier.shape = NA
  ) +
  coord_flip() +
  ggtitle("Joviality Spread by Education Level")

Static Ridge Plot

We will examine the joviality spread of the participants across the interest groups in this data set. The code chunk below creates a ridge plot by using geom_density_ridges() of ggridges, an ggplot2 extension specially designed to create ridge plot.

ggplot(participants, 
       aes(x = joviality, y = interestGroup)) + 
  geom_density_ridges(rel_min_height = 0.01,
                      scale = 1) +
  ggtitle("Joviality Spread by Interest Group")

Composite Plot

We will combine multiple plots together to have a dashboard view. The code chunk below creates a composite plot by using the patchwork package.

Note: p1 - p5 are assigned to plots as shown earlier.

patchwork <- ((p1 / p2)| p3)/(p4 | p5) + 
              plot_annotation(tag_levels = 'I', title = 'Demographic of the city of Engagement, Ohio USA')
patchwork & theme_economist()