Putting Visual Analytics into Practical Use
In this take home exercise, I will focus on second question of Challenge 3 from VAST Challenge 2022 about the financial health of the residents change over the period of the city of Engagement, Ohio USA. I’ll try to visualize the wages compare to the overall cost of living in Engagement and groups that appear to exhibit similar patterns.
Before we get started, it is important to ensure that the R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.
The chunk code below will do the trick.
packages = c('tidyverse','lubridate','ggplot2','ggthemes','ggdist', 'ggridges',
'patchwork', 'hrbrthemes','plotly','ggiraph',
'ggrepel', 'ggforce','zoo','trelliscopejs')
for(p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p, character.only = T)
}
The code chunk below import “Participants.csv” and
“FinancialJournal.csv” from the data folder into R by using read_csv()
of readr
and save them as an tibble data frame called “participant_data” and
.financial_data.
Also, for the timestamp variable in “FinancialJournal.csv”, we use
the [zoo] package to change it into the format of year and
month.
participant_data <- read_csv("data/Participants.csv")
financial_data <- read_csv("data/FinancialJournal.csv") %>%
mutate(yearmonth = as.yearmon(timestamp))
The code chunk below shows how to subset the financial data with only wage category.
wage=financial_data%>% filter(category == "Wage")
The code chunk below shows another way to create the variable of year and month using [‘lubridate’] package.
ggplot(data=wage,
aes(x = amount, colour = year_month)) +
geom_density()+
theme_ipsum()
ggplot(bymonth_id, aes(x = amount, y = year_month)) +
geom_density_ridges(scale = 1) +
theme_ridges()
ggplot(by_id, aes(x = amount, y = year_month, fill=category)) +
geom_density_ridges(scale = 1) +
theme_ridges()
To compare the wage and cost, We can also add an interactive label to
the graph showing the percentage of each category of cost over the wage
of a participant. However, girafe package does not work well with
geom_density_ridges function. Further work needs to be done
to improve this problem.
The below code shows how to visualize the density graph of wage over
the period by particiapntId using trellis.
ggplot(data=wage, aes(x = amount, colour = year_month)) + geom_density()+ facet_trelliscope(~ participantId, nrow = 2, ncol = 5, width = 450, path=“trellis/”, self_contained = TRUE) + theme_bw()
There is still a problem to show it in R markdown because the trellis file is too big, the output file can be seen in the attached folder of assignment submission.
The below code shows how to visualize the overall wage and cost over
the period with a stacked bar chart using
geom_bar_interactive and girafe. Howerver, the
code has a problem to show on R markdown. The problem also needs to be
solved later.
p=financial_data %>% ggplot(aes(x=year_month, y=amount,fill=category)) + geom_bar_interactive(position=“stack”, stat=“identity”) + ggtitle(“Wage and Cost”) + theme_ipsum()
girafe(
ggobj = p,
)