Take Home Exercise 3

Putting Visual Analytics into Practical Use

JUNQIU NI
2022-05-15

1. Overview

In this take-home exercise, I focus on the second question of Challenge 3 of the VAST Challenge 2022: how the financial health of the residents of the city of Engagement, Ohio, USA changes over the period covered by the dataset. I try to visualize how wages compare with the overall cost of living in Engagement, and to identify groups of residents that appear to exhibit similar patterns.

2. Getting Started

2.1 Loading R packages

Before we get started, it is important to ensure that the required R packages have been installed. If they are installed, we load them. If they have yet to be installed, we install them and then load them into the R environment.

The code chunk below will do the trick.

packages <- c('tidyverse', 'lubridate', 'ggplot2', 'ggthemes', 'ggdist', 'ggridges',
              'patchwork', 'hrbrthemes', 'plotly', 'ggiraph',
              'ggrepel', 'ggforce', 'zoo', 'trelliscopejs')

# Install each package if it is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

2.2 Importing Data and Data Preprocessing

The code chunk below imports “Participants.csv” and “FinancialJournal.csv” from the data folder into R using read_csv() from readr and saves them as tibble data frames called “participant_data” and “financial_data”.

In addition, for the timestamp variable in “FinancialJournal.csv”, we use as.yearmon() from the zoo package to convert it to a year-month format.

participant_data <- read_csv("data/Participants.csv")
financial_data <- read_csv("data/FinancialJournal.csv") %>%
  mutate(yearmonth = as.yearmon(timestamp))

The code chunk below subsets the financial data to keep only the records in the Wage category.

wage <- financial_data %>% filter(category == "Wage")

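The code chunk below shows another way to create the year-month variable. The commented-out line uses floor_date() from the lubridate package; the active lines use format() from base R instead. Wage amounts are then summed by month and participant, and all amounts are summed by month, participant and category.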

# Create a year-month column ("YYYY-MM") on the full journal and on the wage subset
# Alternative with lubridate: floor_date(financial_data$timestamp, "month")
financial_data$year_month <- format(as.Date(financial_data$timestamp), "%Y-%m")
wage$year_month <- format(as.Date(wage$timestamp), "%Y-%m")

# Total wage per participant per month
bymonth_id <- aggregate(amount ~ year_month + participantId,
                        data = wage, FUN = sum)

# Total amount per participant, month and category
by_id <- aggregate(amount ~ year_month + participantId + category,
                   data = financial_data, FUN = sum)
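To make the shape of the aggregated tables concrete, here is a minimal sketch on a made-up journal. The data frame `toy` and all of its values are hypothetical and are not taken from the VAST dataset; only the year-month derivation and the `aggregate()` call mirror the steps above.

```r
# Hypothetical toy journal; the values are made up for illustration only
toy <- data.frame(
  participantId = c(0, 0, 0, 1, 1),
  timestamp = as.POSIXct(c("2022-03-01 08:00:00", "2022-03-15 08:00:00",
                           "2022-04-01 08:00:00", "2022-03-01 08:00:00",
                           "2022-04-02 08:00:00"), tz = "UTC"),
  category = "Wage",
  amount = c(100, 50, 120, 80, 90)
)

# Same year-month derivation as in the chunk above
toy$year_month <- format(as.Date(toy$timestamp), "%Y-%m")

# One row per (year_month, participantId) holding the summed amount
toy_bymonth <- aggregate(amount ~ year_month + participantId,
                         data = toy, FUN = sum)
print(toy_bymonth)
```

The two March wage records of participant 0 collapse into a single row, which is the per-participant monthly total that the ridge plots below work with.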

A Density Plot showing the wage distribution

ggplot(data=wage, 
       aes(x = amount, colour = year_month)) +
  geom_density()+ 
  theme_ipsum()

A Ridge Plot showing the wage distribution over the period

ggplot(bymonth_id, aes(x = amount, y = year_month)) +
  geom_density_ridges(scale = 1) + 
  theme_ridges()

A Ridge Plot comparing the wage and cost distribution over the period

ggplot(by_id, aes(x = amount, y = year_month, fill=category)) +
  geom_density_ridges(scale = 1) + 
  theme_ridges()

To compare wages and costs, we could also add an interactive label to the graph showing each cost category as a percentage of a participant's wage. However, girafe() from the ggiraph package does not work well with geom_density_ridges(). Further work is needed to solve this problem.
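The percentage label described above could be computed before any plotting. The sketch below works on a made-up table shaped like by_id; the data frame `toy_by_id`, its values, and the column name `pct_of_wage` are hypothetical. It also assumes that costs may be stored as negative amounts, which is why abs() is applied.

```r
# Hypothetical monthly totals for one participant (made-up numbers)
toy_by_id <- data.frame(
  year_month    = "2022-03",
  participantId = 0,
  category      = c("Wage", "Food", "Shelter", "Recreation"),
  amount        = c(4000, -400, -1000, -200)
)

# Split the wage rows from the cost rows
toy_wage <- toy_by_id[toy_by_id$category == "Wage",
                      c("year_month", "participantId", "amount")]
names(toy_wage)[3] <- "wage"
toy_cost <- toy_by_id[toy_by_id$category != "Wage", ]

# Attach each participant-month wage to its cost rows
toy_share <- merge(toy_cost, toy_wage, by = c("year_month", "participantId"))

# abs() is used on the assumption that costs may be stored as negative amounts
toy_share$pct_of_wage <- round(100 * abs(toy_share$amount) / toy_share$wage, 1)
print(toy_share[, c("category", "pct_of_wage")])
```

A column like pct_of_wage could then serve as tooltip text in whichever interactive layer ends up being used.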

Interactive density graphs using trellis

The code below shows how to visualize the density of wages over the period by participantId using facet_trelliscope() from the trelliscopejs package.

ggplot(data = wage, aes(x = amount, colour = year_month)) +
  geom_density() +
  facet_trelliscope(~ participantId, nrow = 2, ncol = 5, width = 450,
                    path = "trellis/", self_contained = TRUE) +
  theme_bw()

The trelliscope display still cannot be shown in the R Markdown output because the generated file is too large; the output file can be found in the folder attached to the assignment submission.
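One possible way to shrink the trelliscope output, offered here only as an untested suggestion, is to plot a sample of participants instead of all of them. The sketch below uses a made-up stand-in table (`toy_wage`) and an arbitrary sample size of 10; both are assumptions for illustration.

```r
# Hypothetical stand-in for the wage table (made-up values)
set.seed(2022)
toy_wage <- data.frame(
  participantId = rep(0:99, each = 3),
  amount        = runif(300, min = 10, max = 100)
)

# Draw a random sample of participant IDs (the size 10 is an arbitrary choice)
ids <- sample(unique(toy_wage$participantId), size = 10)

# Keep only the sampled participants; a table like this could be passed
# to ggplot() + facet_trelliscope() in place of the full wage data
wage_sample <- toy_wage[toy_wage$participantId %in% ids, ]
print(length(unique(wage_sample$participantId)))
```

Whether this reduces the file enough to embed it in the R Markdown document would need to be checked on the real data.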

Interactive bar chart comparing the wage and cost over the period

The code below shows how to visualize the overall wage and cost over the period as a stacked bar chart using geom_bar_interactive() and girafe(). However, the chart does not yet display correctly in R Markdown; this problem also remains to be solved.

p <- financial_data %>%
  ggplot(aes(x = year_month, y = amount, fill = category)) +
  geom_bar_interactive(position = "stack", stat = "identity") +
  ggtitle("Wage and Cost") +
  theme_ipsum()

girafe(ggobj = p)
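One possible contributor to the display problem (an assumption, not verified here) is that the chart is built from the raw journal, so every journal row becomes its own interactive bar segment. A minimal sketch of pre-aggregating to one row per month and category follows; `toy_fin`, its values, and the `tooltip` column are hypothetical.

```r
# Hypothetical journal rows (made-up values), several per month and category
toy_fin <- data.frame(
  year_month = c("2022-03", "2022-03", "2022-03", "2022-04", "2022-04"),
  category   = c("Wage", "Wage", "Food", "Wage", "Food"),
  amount     = c(100, 150, -40, 260, -55)
)

# Collapse to one row per (year_month, category), so that each stacked
# segment of the bar chart corresponds to a single row
toy_monthly <- aggregate(amount ~ year_month + category,
                         data = toy_fin, FUN = sum)

# Text that could be mapped to the tooltip aesthetic of geom_bar_interactive()
toy_monthly$tooltip <- paste0(toy_monthly$category, ": ", toy_monthly$amount)
print(toy_monthly)
```

A table like toy_monthly could replace financial_data in the plotting call, with tooltip = tooltip added inside aes().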