Take-home Exercise 2

In this take-home exercise, I will critique a classmate's submission in terms of clarity and aesthetics, and then remake the visualisation design.

Author: Che Xuan

Published: April 30, 2022

Overview

In this take-home exercise, I will critique Ding Yanmu’s Take-home Exercise 1 submission in terms of clarity and aesthetics. I will also remake the original design by applying the data visualisation principles and best practices we learnt in Lessons 1 and 2.

Getting Started

We will first install and launch the required R packages using the code chunk below.

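# Packages used in this exercise
packages <- c('tidyverse', 'ggdist', 'ggridges', 'patchwork', 'ggthemes', 'ggrepel')

# Install any package that is not yet available, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}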

We will also import Participants.csv from the data folder into R and inspect its structure with glimpse(), using the code chunk below.

participants <- read_csv("data/Participants.csv")

glimpse(participants)
Rows: 1,011
Columns: 7
$ participantId  <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,~
$ householdSize  <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ~
$ haveKids       <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRU~
$ age            <dbl> 36, 25, 35, 21, 43, 32, 26, 27, 20, 35, 48, 2~
$ educationLevel <chr> "HighSchoolOrCollege", "HighSchoolOrCollege",~
$ interestGroup  <chr> "H", "B", "A", "I", "H", "D", "I", "A", "G", ~
$ joviality      <dbl> 0.001626703, 0.328086500, 0.393469590, 0.1380~

Critique 1

There is a typo in the section title shown in the picture above, which detracts from the overall aesthetics of the article.

It should be corrected as follows:

2. Data Description

Critique 2

The age_group labels on the x-axis are oriented in a way that makes them hard to read. Moreover, the y-axis title is displayed vertically rather than horizontally. This goes against the graph-labelling principle that the orientation of labels should be reader friendly.

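We will first group the participants into age groups using the code chunk below.

# right = FALSE makes each bin closed on the left, e.g. [21, 26) is labelled "21-25",
# so the first bin holds every age up to and including 20
participants$age_group <- cut(participants$age,
                              breaks = c(-Inf, 21, 26, 31, 36, 41, 46, 51, 56, Inf),
                              labels = c("<=20", "21-25", "26-30", "31-35", "36-40",
                                         "41-45", "46-50", "51-55", "56-60"),
                              right = FALSE)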
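
Because right = FALSE decides which bin a boundary age such as 21 falls into, it can help to check the bin edges on a handful of values before plotting. The following is a minimal sketch in base R; the ages vector and the boundary_check data frame are hypothetical and are not part of the Participants data.

```r
# Hypothetical ages chosen to sit on or next to the bin boundaries
ages <- c(18, 20, 21, 25, 26, 55, 56, 60)
breaks <- c(-Inf, 21, 26, 31, 36, 41, 46, 51, 56, Inf)

# right = FALSE: bins are closed on the left, e.g. [21, 26)
left_closed <- cut(ages, breaks = breaks, right = FALSE)

# right = TRUE (the default): bins are closed on the right, e.g. (21, 26]
right_closed <- cut(ages, breaks = breaks)

# Compare the bin index (1 = youngest bin) assigned under each setting
boundary_check <- data.frame(age = ages,
                             bin_left_closed = as.integer(left_closed),
                             bin_right_closed = as.integer(right_closed))
print(boundary_check)
```

In this sketch, ages 21, 26 and 56 move down one bin when the default right = TRUE is used, which is why labels such as "21-25" only match the bins when right = FALSE.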

The code chunk below plots a bar chart using geom_bar() of ggplot2. coord_flip() places the age groups on the vertical axis so that their labels read horizontally, and axis.title.y = element_text(angle = 0) in theme() rotates the vertical axis title to a horizontal position.

ggplot(data = participants,
       aes(x = age_group)) +
  geom_bar() +
  coord_flip() +
  # Label each bar with its count, placed just beyond the end of the bar;
  # after_stat() replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
  geom_text(stat = "count", aes(label = after_stat(count)), check_overlap = TRUE,
            fontface = "bold", position = position_stack(vjust = 1.04)) +
  xlab("Age Group") +
  ylab("No. of Participants") +
  ggtitle("Age Distribution") +
  theme(axis.title.y = element_text(angle = 0), axis.ticks.y = element_blank(),
        panel.background = element_blank(), axis.line = element_line(color = 'grey'))
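
As an alternative to coord_flip(), ggplot2 3.3.0 and later can draw horizontal bars directly when the categorical variable is mapped to y. The following is a minimal sketch under that assumption; the toy data frame and the name p_horizontal are made up for illustration and do not come from the Participants data.

```r
library(ggplot2)

# Hypothetical toy data standing in for participants$age_group
toy <- data.frame(
  age_group = factor(c("21-25", "21-25", "26-30", "31-35", "31-35", "31-35"))
)

# Mapping the category to y gives horizontal bars without coord_flip()
p_horizontal <- ggplot(toy, aes(y = age_group)) +
  geom_bar() +
  # A negative hjust nudges each count label just past the end of its bar
  geom_text(stat = "count", aes(label = after_stat(count)), hjust = -0.5) +
  labs(x = "No. of Participants", y = "Age Group", title = "Age Distribution") +
  theme(axis.title.y = element_text(angle = 0, vjust = 0.5))

# Print p_horizontal to render the chart
```

With this approach the axis names in labs() and theme() refer to the axes as they are displayed, which may be easier to follow than reasoning about flipped coordinates.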

Critique 3

Besides the label orientation issue mentioned earlier, the graph is also missing a legend. In addition, using a line graph to display only the mean statistic may not be very informative.

The code chunk below plots a trellis boxplot using geom_boxplot() and facet_grid() of ggplot2. The mean of each group is marked with a red point and labelled using stat_summary(), with the label_repel geom of ggrepel keeping the labels from overlapping. The boxplot allows the mean and median to be compared across age groups for participants with and without kids.

ggplot(data = participants,
       aes(y = joviality, x = age_group)) +
  geom_boxplot() +
  # Mark the mean of each age group with a red point
  stat_summary(geom = "point",
               fun = mean,
               colour = "red",
               size = 2) +
  # Label each mean, rounded to 2 decimal places;
  # after_stat() replaces the deprecated ..y.. notation (ggplot2 >= 3.3.0)
  stat_summary(aes(label = round(after_stat(y), 2)), fun = mean, geom = "label_repel", size = 3, angle = 150) +
  # One panel per kids status (TRUE / FALSE)
  facet_grid(haveKids ~ .) +
  labs(y = 'Joviality', x = 'Age Group',
       title = "Distribution of Joviality across Age Groups by Kids Status") +
  theme(axis.title.y = element_text(angle = 0), axis.ticks.x = element_blank(),
        axis.line = element_line(color = 'grey'))
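
The mean labels drawn by stat_summary() can be cross-checked against a plain summary table. The following is a minimal sketch in base R, assuming a small made-up data frame (toy) that reuses the column names of the Participants data; the values are hypothetical.

```r
# Hypothetical toy data with the same column names as the Participants data
toy <- data.frame(
  age_group = c("21-25", "21-25", "26-30", "26-30", "21-25", "26-30"),
  haveKids  = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  joviality = c(0.2, 0.4, 0.6, 0.8, 0.5, 0.9)
)

# Mean joviality for every age group / kids status combination,
# rounded to 2 decimal places like the labels on the plot
group_means <- aggregate(joviality ~ age_group + haveKids, data = toy, FUN = mean)
group_means$joviality <- round(group_means$joviality, 2)

print(group_means)
```

Each row of group_means corresponds to one labelled red point in the trellis boxplot, one per age group within each facet.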

Critique 4

Besides the label orientation issue mentioned earlier, the legend of the graph is not properly placed. In addition, the chosen line and dot sizes result in a lot of overlap.

The code chunk below plots a line chart using stat_summary() of ggplot2. The legend is placed on the right-hand side of the graph, and the line and dot sizes have been reduced to avoid unnecessary overlap. This is in accordance with the principle of showing the data clearly.

ggplot(data = participants,
       aes(x = age_group, y = joviality, color = educationLevel, group = educationLevel)) +
  # fun replaces the deprecated fun.y argument (ggplot2 >= 3.3.0)
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) +
  xlab("Age Group") +
  ylab("Joviality") +
  ggtitle("Joviality across Age Groups by Education Level") +
  theme(axis.title.y = element_text(angle = 0), axis.ticks.x = element_blank(),
        panel.background = element_blank(), axis.line = element_line(color = 'grey'))
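
The legend position and dot size can also be stated explicitly rather than left to the ggplot2 defaults. The following is a minimal sketch, assuming a made-up data frame (toy) with hypothetical values; the point size of 1.5 and the name p_lines are illustrative choices, not settings taken from the original submission.

```r
library(ggplot2)

# Hypothetical toy data: two age groups, two education levels, two readings each
toy <- data.frame(
  age_group      = rep(c("21-25", "26-30"), each = 4),
  educationLevel = rep(c("Bachelors", "Graduate"), times = 4),
  joviality      = c(0.2, 0.6, 0.4, 0.8, 0.3, 0.5, 0.5, 0.7)
)

p_lines <- ggplot(toy, aes(x = age_group, y = joviality,
                           color = educationLevel, group = educationLevel)) +
  # Smaller dots reduce overlap between neighbouring series
  stat_summary(fun = mean, geom = "point", size = 1.5) +
  stat_summary(fun = mean, geom = "line") +
  labs(x = "Age Group", y = "Joviality", color = "Education Level") +
  # "right" is the ggplot2 default; "bottom", "top" and "left" are also accepted
  theme(legend.position = "right", axis.title.y = element_text(angle = 0))

# Print p_lines to render the chart
```

Setting legend.position explicitly documents the intent even when it matches the default, and makes it easy to try another placement later.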

Critique 5

The graph title is missing and the bars are not sorted by their respective frequencies.

The code chunk below plots a bar chart using geom_bar() of ggplot2. ggtitle() adds a graph title, and reorder() sorts the bars in descending order of frequency.

ggplot(data = participants,
       # Order the interest groups from most to least frequent
       aes(x = reorder(interestGroup, interestGroup, function(x) -length(x)))) +
  geom_bar(fill = "skyblue") +
  ylim(0, 120) +
  # Label each bar with its count and percentage share
  geom_text(stat = "count",
            aes(label = paste0(after_stat(count), " (",
                               round(after_stat(count) / sum(after_stat(count)) * 100, 1),
                               "%)")),
            vjust = -1, size = 2.7) +
  xlab("Interest Group") +
  ylab("No. of\nParticipants") +
  ggtitle("Interest Groups of Participants") +
  theme(axis.title.y = element_text(angle = 0), axis.ticks.x = element_blank(),
        panel.background = element_blank(), axis.line = element_line(color = 'grey'))
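
To see how the reorder() call sorts the bars, it may help to run it on a small vector first. The following is a minimal sketch in base R; the interest vector is hypothetical and deliberately has no tied counts.

```r
# Hypothetical interest groups; "C" is the most frequent and "A" the least
interest <- c("A", "B", "B", "C", "C", "C")

# reorder() scores each level with the supplied function and sorts the levels
# by ascending score, so negating the count puts the most frequent level first
sorted <- reorder(interest, interest, function(x) -length(x))

levels(sorted)
table(sorted)
```

Since ggplot2 draws a discrete axis in factor-level order, a factor built this way yields bars running from the most to the least frequent category.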

Critique 6

The composite graph has no overall title, and the three charts have been combined in a way that leaves many parts overlapping, making it hard to read the information from the dashboard clearly.

The code chunk below creates a composite plot using the patchwork package, with plot_annotation() adding an overall title and tags for the subplots.

Note: p1 to p3 are the objects to which the plots presented earlier have been assigned.

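# p1 and p2 sit side by side in the top row; p3 spans the bottom row
patchwork <- ((p1 | p2) / p3) +
  plot_annotation(tag_levels = 'I', title = 'Demographics of the City of Engagement, Ohio, USA')
# `&` applies the theme to every subplot
patchwork & theme_economist()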
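
For readers who want to try the layout on its own, the following is a minimal sketch of how plots might be assigned to p1, p2 and p3 and then combined. The toy data, the three stand-in plots and the title are hypothetical, and theme_minimal() from ggplot2 is swapped in for theme_economist() so that the sketch does not depend on ggthemes.

```r
library(ggplot2)
library(patchwork)

# Hypothetical stand-in plots; in the exercise p1 to p3 would hold the remade charts
toy <- data.frame(x = c("a", "b", "b"), y = c(1, 2, 3))
p1 <- ggplot(toy, aes(x = x)) + geom_bar()
p2 <- ggplot(toy, aes(x = x, y = y)) + geom_boxplot()
p3 <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# `|` places plots side by side and `/` stacks rows;
# tag_levels = 'I' tags the subplots with Roman numerals
combined <- ((p1 | p2) / p3) +
  plot_annotation(tag_levels = 'I', title = 'Hypothetical composite title')

# `&` applies the theme to every subplot in the composition
combined <- combined & theme_minimal()

# Print combined to render the composite plot
```

Giving the top row two plots and the bottom row one leaves each chart enough room that labels and legends are less likely to collide.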