Creating data visualisation beyond default
In this take-home exercise, I review one classmate's submission for Take-home Exercise 1, which examines the demographics of the city of Engagement, Ohio, USA. I critique it in terms of clarity and aesthetics, then remake the original design using data visualisation principles and best practices.
Before we get started, we need to make sure the required R packages are installed. Packages that are already installed are simply loaded; any that are missing are installed first and then loaded into the R environment.
The code chunk below does this.
packages = c('tidyverse', 'knitr', 'ggdist', 'ggridges',
             'ggthemes', 'hrbrthemes',
             'ggrepel', 'ggforce', 'viridis')
for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The code chunk below imports “Participants.csv” from the data folder into R using read_csv() from the readr package and saves it as a tibble data frame called “participant_data”.
participant_data <- read_csv("data/Participants.csv")
I will now review my peer’s work. The exercise uses tidyverse and ggplot2 to reveal the demographics of the city of Engagement, Ohio, USA.
The exercise explores the distribution of joviality with histograms. It also groups the data by age so that the distribution can be compared across age groups. However, the Joviality Measure figures by education level and by interest group do not show the distribution of joviality clearly, and they suggest no obvious pattern across age groups. We can therefore examine the distribution of joviality by education level and by interest group without splitting the data into age groups.
I first look at the overall distribution of joviality using a box plot.
ggplot(participant_data, aes(y = joviality)) +
  geom_boxplot()
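To make concrete what the box plot summarises, the same statistics can be computed directly. The sketch below is a minimal illustration in base R. `joviality_demo` is a simulated stand-in (an assumption, not the real joviality column), and the whisker rule shown is the 1.5 × IQR convention that geom_boxplot() documents as its default.

```r
# Simulated stand-in for participant_data$joviality (hypothetical values,
# assumed to lie between 0 and 1); swap in the real column when available.
set.seed(123)
joviality_demo <- runif(200)

# Quartiles: the lower hinge, median and upper hinge of the box.
q <- quantile(joviality_demo, probs = c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Whiskers reach the most extreme observations within 1.5 * IQR of the hinges.
lower_whisker <- min(joviality_demo[joviality_demo >= q[1] - 1.5 * iqr])
upper_whisker <- max(joviality_demo[joviality_demo <= q[3] + 1.5 * iqr])

box_stats <- c(lower_whisker = lower_whisker, q1 = q[1], median = q[2],
               q3 = q[3], upper_whisker = upper_whisker)
print(round(box_stats, 3))
```

Five numbers are all the box plot retains, which is why the individual observations are worth showing alongside it.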
The pattern of joviality is still unclear from the box plot above.
A box plot summarises the distribution of joviality, but summarising
also means losing information. To make the graph more insightful, I
overlay the individual data points on the box plot, adding jitter with
geom_point(position="jitter") so that the points do not sit on top of
one another. We can then explore the joviality distribution by
education level.
ggplot(participant_data, aes(x = educationLevel, y = joviality)) +
  geom_boxplot() +
  geom_point(position = "jitter", size = 0.5)
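One detail worth knowing: position = "jitter" uses position_jitter(), which by default adds a small amount of random noise in both directions, so the plotted joviality values can shift slightly. Below is a minimal sketch of how to jitter horizontally only, assuming ggplot2 is installed. `demo` is simulated data with hypothetical education labels, standing in for participant_data.

```r
library(ggplot2)

# Simulated stand-in for participant_data (hypothetical labels and values).
set.seed(1)
demo <- data.frame(
  educationLevel = rep(c("Low", "Bachelors", "Graduate"), each = 40),
  joviality = runif(120)
)

# height = 0 keeps every point at its true joviality value;
# width spreads points sideways to reduce overlap; seed makes it repeatable.
p <- ggplot(demo, aes(x = educationLevel, y = joviality)) +
  geom_boxplot(outlier.shape = NA) +  # hide outlier dots: the points layer already draws them
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1),
             size = 0.5)
p
```

Hiding the box plot's own outlier markers avoids drawing the same observation twice once the raw points are overlaid.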
I will also apply theme_ipsum(), fill the boxes with a viridis colour
scale, and add a title to make the graph easier to read.
ggplot(participant_data, aes(x = educationLevel, y = joviality, fill = educationLevel)) +
  geom_boxplot() +
  geom_point(position = "jitter", size = 0.5) +
  scale_fill_viridis(discrete = TRUE, alpha = 0.6) +
  theme_ipsum() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 11)
  ) +
  ggtitle("A boxplot of Joviality With Different Education Level") +
  xlab("")
I will also use a violin chart, drawn with geom_violin(), to show the
shape of the joviality distribution.
ggplot(participant_data, aes(x = educationLevel, y = joviality, fill = educationLevel)) +
  geom_violin() +
  scale_fill_viridis(discrete = TRUE, alpha = 0.6, option = "A") +
  theme_ipsum() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 11)
  ) +
  ggtitle("Violin chart of Joviality With Different Education Level") +
  xlab("")
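A violin is essentially a kernel density estimate mirrored around the group axis. The sketch below illustrates that idea with base R's density() on a simulated two-peaked sample (`joviality_demo` is hypothetical, not the real data) and locates the peaks that would show up as bulges in the violin.

```r
set.seed(42)
# Hypothetical sample with two clusters, one low and one high, on [0, 1].
joviality_demo <- c(rbeta(150, 2, 8), rbeta(150, 8, 2))

# Kernel density estimate, restricted to the observed range
# (comparable to a violin trimmed at the data extremes).
dens <- density(joviality_demo,
                from = min(joviality_demo), to = max(joviality_demo))

# Interior local maxima of the estimated density: candidate modes.
peak_idx <- which(diff(sign(diff(dens$y))) == -2) + 1
peaks <- data.frame(joviality = dens$x[peak_idx], density = dens$y[peak_idx])
print(peaks)
```

Two well-separated peaks in such an estimate are the kind of shape a box plot cannot reveal, which is the reason for adding the violin chart.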
We can apply the same analysis to the different interest groups.
ggplot(participant_data, aes(x = interestGroup, y = joviality, fill = interestGroup)) +
  geom_boxplot() +
  scale_fill_viridis(discrete = TRUE, alpha = 0.6) +
  geom_jitter(color = "black", size = 0.4, alpha = 0.9) +
  theme_ipsum() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 11)
  ) +
  ggtitle("A boxplot of Joviality With Different Interest Groups") +
  xlab("")
ggplot(participant_data, aes(x = interestGroup, y = joviality, fill = interestGroup)) +
  geom_violin() +
  scale_fill_viridis(discrete = TRUE, alpha = 0.6, option = "A") +
  theme_ipsum() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 11)
  ) +
  ggtitle("Violin chart of Joviality With Different Interest Groups") +
  xlab("")
According to the results from part 3, we can conclude that participants with a bachelor's degree or a low education level tend to fall into one of two modes, either happy or unhappy, and there is a possibility that more of them feel unhappy. Participants whose interest group is C or E are more likely to feel happy than those in other interest groups.
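These visual impressions can be cross-checked with simple group summaries. Below is a minimal base R sketch, using simulated data (`demo`, with hypothetical education labels) and an assumed cut-off of 0.5 for "unhappy". Neither the data nor the cut-off comes from the original exercise.

```r
# Simulated stand-in for participant_data (hypothetical labels and values).
set.seed(2022)
demo <- data.frame(
  educationLevel = rep(c("Low", "Bachelors", "Graduate"), each = 50),
  joviality = runif(150)
)

# Median joviality per education level.
median_joviality <- tapply(demo$joviality, demo$educationLevel, median)

# Share of participants below the assumed 0.5 cut-off in each level.
share_below_half <- tapply(demo$joviality < 0.5, demo$educationLevel, mean)

group_summary <- cbind(median = median_joviality,
                       share_below_0.5 = share_below_half)
print(round(group_summary, 2))
```

With the real data, replacing `demo` with `participant_data` (and `educationLevel` with `interestGroup` for the second claim) would give the corresponding figures.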