In this Take Home Exercise, I will explore the demographic of the city of Engagement, Ohio USA.
In this take home exercise, I will reveal the demographic of the city of Engagement, Ohio USA by using a patchwork of pie charts to visualize the distribution of participants and a Diverging stacked bar chart to visualize the joviality across different groups of participants.
Before we get started, it is important to ensure that the R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.
The chunk code below will do the trick.
packages = c('tidyverse',"HH","vcd",'knitr','ggdist', 'ggridges',
'patchwork', 'ggthemes', 'hrbrthemes',
'ggrepel', 'ggforce')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
The code chunk below import “Participants.csv” from the data folder
into R by using read_csv()
of readr
and save it as an tibble data frame called “participant_data”.
participant_data <- read_csv("data/Participants.csv")
We group the age data into 4 groups including “18-28”, “29-38”, “39-49” and “50-60”. Also, we group joviality data into 4 groups “0.2-0.4”, “0.4-0.6”, “0.6-0.8” and “0.8-1.0”.
participant_data["age_group"] = cut(participant_data$age, c(18,29,39,50,60),
c("18-28","29-38","39-49","50-60"), include.lowest=TRUE)
participant_data["joviality_group"] = cut(participant_data$joviality, c(0,0.2,0.4,0.6,0.8,1),
c("0-0.2", "0.2-0.4","0.4-0.6","0.6-0.8","0.8-1.0"), include.lowest=TRUE)
participant_data
# A tibble: 1,011 × 9
participantId householdSize haveKids age educationLevel
<dbl> <dbl> <lgl> <dbl> <chr>
1 0 3 TRUE 36 HighSchoolOrCollege
2 1 3 TRUE 25 HighSchoolOrCollege
3 2 3 TRUE 35 HighSchoolOrCollege
4 3 3 TRUE 21 HighSchoolOrCollege
5 4 3 TRUE 43 Bachelors
6 5 3 TRUE 32 HighSchoolOrCollege
7 6 3 TRUE 26 HighSchoolOrCollege
8 7 3 TRUE 27 Bachelors
9 8 3 TRUE 20 Bachelors
10 9 3 TRUE 35 Bachelors
# … with 1,001 more rows, and 4 more variables: interestGroup <chr>,
# joviality <dbl>, age_group <fct>, joviality_group <fct>
We calculate the number of population in different degree of joviality by different groups including householdSize, haveKids, age_group, educationLevel, and interestGroup.
t1=table(participant_data$householdSize,participant_data$joviality_group)
t2=table(participant_data$haveKids,participant_data$joviality_group)
t3=table(participant_data$age_group,participant_data$joviality_group)
t4=table(participant_data$educationLevel,participant_data$joviality_group)
t5=table(participant_data$interestGroup,participant_data$joviality_group)
participant=as.data.frame(rbind(t1,t2,t3,t4,t5))
participant <- cbind(Type = rownames(participant), participant)
rownames(participant) <- 1:nrow(participant)
participant["Subtable"] <- c("householdSize","householdSize","householdSize",
"haveKids","haveKids",
"age_group","age_group","age_group","age_group",
"educationLevel","educationLevel","educationLevel","educationLevel",
"interestGroup","interestGroup","interestGroup","interestGroup","interestGroup",
"interestGroup","interestGroup","interestGroup","interestGroup","interestGroup")
participant=participant %>%
bind_rows(summarise_all(., ~if(is.numeric(.)) sum(.) else "Total"))
The below participant table shows the distribution of participants of different groups.
kable(participant)
| Type | 0-0.2 | 0.2-0.4 | 0.4-0.6 | 0.6-0.8 | 0.8-1.0 | Subtable |
|---|---|---|---|---|---|---|
| 1 | 78 | 76 | 58 | 51 | 74 | householdSize |
| 2 | 70 | 83 | 70 | 81 | 69 | householdSize |
| 3 | 59 | 57 | 67 | 55 | 63 | householdSize |
| FALSE | 148 | 159 | 128 | 132 | 143 | haveKids |
| TRUE | 59 | 57 | 67 | 55 | 63 | haveKids |
| 18-28 | 52 | 47 | 54 | 61 | 59 | age_group |
| 29-38 | 48 | 55 | 40 | 39 | 62 | age_group |
| 39-49 | 58 | 53 | 58 | 47 | 46 | age_group |
| 50-60 | 49 | 61 | 43 | 40 | 39 | age_group |
| Bachelors | 46 | 57 | 33 | 44 | 52 | educationLevel |
| Graduate | 30 | 35 | 34 | 31 | 40 | educationLevel |
| HighSchoolOrCollege | 116 | 100 | 116 | 97 | 96 | educationLevel |
| Low | 15 | 24 | 12 | 15 | 18 | educationLevel |
| A | 21 | 26 | 17 | 21 | 17 | interestGroup |
| B | 18 | 22 | 14 | 22 | 15 | interestGroup |
| C | 22 | 19 | 14 | 18 | 29 | interestGroup |
| D | 21 | 22 | 15 | 22 | 16 | interestGroup |
| E | 18 | 15 | 13 | 13 | 24 | interestGroup |
| F | 17 | 26 | 25 | 17 | 21 | interestGroup |
| G | 19 | 22 | 23 | 18 | 26 | interestGroup |
| H | 27 | 25 | 22 | 22 | 15 | interestGroup |
| I | 20 | 17 | 25 | 12 | 22 | interestGroup |
| J | 24 | 22 | 27 | 22 | 21 | interestGroup |
| Total | 1035 | 1080 | 975 | 935 | 1030 | Total |
The below codes prepare the number of particpants of different groups for further visualization
t6=table(participant_data$householdSize)
t7=table(participant_data$haveKids)
t9=table(participant_data$educationLevel)
t10=table(participant_data$interestGroup)
t6=as.data.frame(t6)
t7=as.data.frame(t7)
t9=as.data.frame(t9)
We would like to see the education level distribution, household size distribution and havekids distribution of the participants.
We would like to see the education level distribution, household size distribution and havekids distribution of the participants using pie charts of ggplot(), gem_bar() and coord_polar().
The household size distribution is plotted as below.
ggplot(t6,aes(x="", y=Freq, fill=Var1))+
geom_bar(width = 1, stat = "identity")+
coord_polar("y", start=0)
We use scale_fill_brewer() to change the color, ggtitle() to add the title, theme_ipsum() to change the features of the plot and amend the labs.
pie1=ggplot(t6,aes(x="", y=Freq, fill=Var1))+
geom_bar(width = 1, stat = "identity")+
coord_polar("y", start=0)+
scale_fill_brewer(palette="Spectral")+
ggtitle("Household Size Distribution")+
labs(fill='Household Size')+
xlab("") +
ylab("Participants")+
theme_ipsum()
pie1
Similarly, we have the distribution of havekids and education level.
pie2=ggplot(t7,aes(x="", y=Freq, fill=Var1))+
geom_bar(width = 1, stat = "identity")+
coord_polar("y", start=0)+
scale_fill_brewer(palette="Spectral")+
ggtitle("HaveKids Distribution")+
labs(fill='HaveKids')+
xlab("") +
ylab("Participants")+
theme_ipsum()
pie2
pie3=ggplot(t9,aes(x="", y=Freq, fill=Var1))+
geom_bar(width = 1, stat = "identity")+
coord_polar("y", start=0)+
scale_fill_brewer(palette="Spectral")+
scale_fill_brewer(palette="Spectral")+
ggtitle("Education Level Distribution")+
labs(fill='Education Level')+
xlab("") +
ylab("Participants")+
theme_ipsum()
pie3
We use patchwork to combine the three pie charts. The patch work finally gives a overview of distribution of paricipants in terms of education level, household size and have kids or not.
patchwork <- (pie1 /pie2 /pie3)
patchwork
We will use diverging
stacked bar chart to show the joviality level of different groups of
participants.
The code chunk below plot the diverging stacked bar chart by using likert to show the joviality level of different groups of participants.
likert(Type ~ . | Subtable, data = participant, as.percent = TRUE, layout = c(1, 6))
The plot above is unreadable, we want to scale the plot to visualize better by add scale feature to likert.
likert(Type ~ . | Subtable, data = participant, as.percent = TRUE,
scales = list(y = list(relation = "free")), layout = c(1, 6))
We want to sort the bars to make it easier to interpret so we add the order.
dsbc=likert(Type ~ . | Subtable, data = participant, as.percent = TRUE, positive.order = TRUE,
scales = list(y = list(relation = "free")), layout = c(1, 6))
dsbc
patchwork
From the pie charts, we can find that around only 30% of participants have kids. The education level of more than half of the participants is highschool or college or below.
dsbc
The diverging stacked bar chart shows that participants with only one household size may have lower joviality than those with 2 or 3. Participants with kids also have a higher level of joviality. Young people are more likely to be jovial than the older. Also, higher education level may come with higher level of joviality. Participants with Interest Group of C,E and G have higher level of joviality than those with Interest Group H.