Take Home Exercise 2

Creating data visualisation beyond default

JUNQIU NI (MITB)
2022-05-01

1. Overview

In this take home exercise, I will review take home exercise 1 of one of my classmates about the demographic of the city of Engagement, Ohio USA in terms of clarity and aesthetics and remake the original design by using the data visualisation principles and best practic.

2. Getting Started

2.1 Loading R packages

Before we get started, it is important to ensure that the R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The chunk code below will do the trick.

packages = c('tidyverse','knitr','ggdist', 'ggridges',
             'ggthemes', 'hrbrthemes',
             'ggrepel', 'ggforce','viridis')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2. Importing Data

The code chunk below import “Participants.csv” from the data folder into R by using read_csv() of readr and save it as an tibble data frame called “participant_data”.

participant_data <- read_csv("data/Participants.csv")

3. Review and Remake of Peer Work

I will review my peer’s work. This exercise uses tidyverse and ggplot2 to reveal the demographic of the city of Engagement, Ohio USA.

3.1 Review of Peer’s Work

The exercise explores the distribution of joviality using histogram chart. It also group the data using age so it can explore the distribution for participants with different age. The Joviality Measure figure in terms of education level and interest group is not clear to show the distribution of joviality and suggest no obvious pattern in terms of age. So we can see the distribution of joviality in terms of education level and interest group without grouping the data into different age levels.

3.2 A Boxplot of joviality

I first see the distribution of joviality using a box plot.

ggplot(participant_data,aes(y=joviality)) +
    geom_boxplot()

3.3 Boxplot of Joviality With Different Education Level

The pattern of joviality is still unclear with the box plot above. The box plot can summarize the distribution of joviality but the problem is that summarizing also means losing information. As a result, I would like to show the data points on the box plots to make the graph more insightful by adding jitter using geom_point(position="jitter"). We can explore the joviality distribution by education level.

ggplot(participant_data,aes(x=educationLevel,y=joviality)) +
    geom_boxplot()+
    geom_point(position="jitter",size = 0.5)

I will also use theme_ipsum() and add the title to make the graph easier to read.

ggplot(participant_data,aes(x=educationLevel,y=joviality,fill=educationLevel)) +
    geom_boxplot()+
    geom_point(position="jitter",size = 0.5)+
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("A boxplot of Joviality With Different Education Level") +
    xlab("")

3.4 Violin Chart of Joviality With Different Education Level

I will also use violin chart to interpret the distribution of joviality using geom_violin().

ggplot(participant_data,aes(x=educationLevel,y=joviality,fill=educationLevel)) +
    geom_violin()+
    scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Violin chart of Joviality With Different Education Level") +
    xlab("")

3.5 Box Plot of Joviality With Different Interest Groups

We can do the similar analysis to different interest groups.

ggplot(participant_data,aes(x=interestGroup,y=joviality,fill=interestGroup)) +
    geom_boxplot()+
    scale_fill_viridis(discrete = TRUE, alpha=0.6) +
    geom_jitter(color="black", size=0.4, alpha=0.9)+
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("A boxplot of Joviality With Different Education Level") +
    xlab("")

3.5 Violin Plot of Joviality With Different Interest Groups

ggplot(participant_data,aes(x=interestGroup,y=joviality,fill=interestGroup)) +
    geom_violin()+
    scale_fill_viridis(discrete = TRUE, alpha=0.6, option="A") +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=11)
    ) +
    ggtitle("Violin chart of Joviality With Different Interest Groups") +
    xlab("")

4. Analysis and Insight

According to the result from part 3, we can conclude that participants with bachelor or low education level are tend to be in the mode of either happy or unhappy and there is a possibility that more of them may feel unhappy. Participants whose interest group is C or E are more likely to feel happy that those in other interest group.