#TidyTuesday Behind The Scenes: Matrix Plot

About

For this TidyTuesday, I created a simple matrix plot with ggplot2 and a few extension libraries. The plot is quick and easy to create, and in this tutorial I’ll walk through how it comes together behind the scenes.

R Libraries

Let’s go ahead and import our libraries. For this graphic, I used the following packages. When it comes to data wrangling/reshaping, I mostly default to dplyr (part of tidyverse).

The graphic we’ll create uses ggplot2, also part of the tidyverse, along with ggplot2 extension libraries such as ggimage and ggtext.

Lastly, we’ll use sysfonts and showtext to add some custom fonts. There’s no need to download font files by hand: sysfonts lets us pull from Google’s font library.

#for data wrangling (dplyr) & graphing (ggplot2)
library(tidyverse)
#for plotting
library(ggimage)
library(ggtext)
#to bring in the data
library(tidytuesdayR)
#for fonts
library(sysfonts)
library(showtext)
#to preview data as tables
library(kableExtra)

Importing Data

There are a couple of ways we can import our TidyTuesday data. This week’s release includes three data sets. For this tutorial, we’ll use only two of them, characters and psych_stats, to produce our visual.

The R for Data Science team provides a link to the data set, or we can use the tidytuesdayR package with tt_load to download the data. Since the files are large and I don’t want to abuse the API rate limit, I’ll opt to read them directly with read.csv.

#alternative option to import with tidytuesdayR
#data <- tidytuesdayR::tt_load(2022, week = 33)
#characters<-data$characters
#ps<-data$psych_stats

#import data with read.csv
characters<-read.csv("https://raw.githubusercontent.com/tashapiro/open-psychometrics/main/data/characters.csv")
ps<-read.csv('https://raw.githubusercontent.com/tashapiro/open-psychometrics/main/data/psych_stats.csv')

#preview characters dataframe, first 5 records
kable(head(characters,5))%>%kable_styling(latex_options = "HOLD_position")

| id | name | uni_id | uni_name | notability | link | image_link |
|---|---|---|---|---|---|---|
| F2 | Monica Geller | F | Friends | 79.7 | https://openpsychometrics.org/tests/characters/stats/F/2 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/2.jpg |
| F1 | Rachel Green | F | Friends | 76.7 | https://openpsychometrics.org/tests/characters/stats/F/1 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/1.jpg |
| F5 | Chandler Bing | F | Friends | 74.4 | https://openpsychometrics.org/tests/characters/stats/F/5 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/5.jpg |
| F4 | Joey Tribbiani | F | Friends | 74.3 | https://openpsychometrics.org/tests/characters/stats/F/4 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/4.jpg |
| F3 | Phoebe Buffay | F | Friends | 72.6 | https://openpsychometrics.org/tests/characters/stats/F/3 | https://openpsychometrics.org/tests/characters/test-resources/pics/F/3.jpg |

#preview psych_stats dataframe, first 5 records
kable(head(ps,5))%>%kable_styling(latex_options = "HOLD_position")

| char_id | char_name | uni_id | uni_name | question | personality | avg_rating | rank | rating_sd | number_ratings |
|---|---|---|---|---|---|---|---|---|---|
| F2 | Monica Geller | F | Friends | messy/neat | neat | 95.7 | 9 | 11.7 | 1079 |
| F2 | Monica Geller | F | Friends | disorganized/self-disciplined | self-disciplined | 95.2 | 27 | 11.2 | 1185 |
| F2 | Monica Geller | F | Friends | diligent/lazy | diligent | 93.9 | 87 | 10.4 | 1166 |
| F2 | Monica Geller | F | Friends | on-time/tardy | on-time | 93.8 | 34 | 14.3 | 236 |
| F2 | Monica Geller | F | Friends | competitive/cooperative | competitive | 93.6 | 56 | 13.4 | 1168 |

Reshaping & Cleaning Data

One of the most important steps in creating a data visualization is UNDERSTANDING the data you’re working with. I easily spend 10-15 minutes (if not more) combing through the data. Are there missing values? Are values standardized? Is there a data dictionary I can reference to make sense of different fields?
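
To make those questions concrete, here is a minimal sketch of the kind of quick checks I mean. The `toy` data frame and its values are made up for illustration (they are not the real data), so treat this as a starting point rather than a full audit.

```r
# Toy stand-in for psych_stats; the values are invented for illustration
toy <- data.frame(
  char_id    = c("A1", "A1", "A2", "A2"),
  question   = c("messy/neat", "genuine/sarcastic", "messy/neat", "genuine/sarcastic"),
  avg_rating = c(95.7, 61.2, NA, 88.0)
)

# How many missing values does each column have?
na_counts <- colSums(is.na(toy))
na_counts

# What range do the ratings cover (ignoring missing values)?
rating_range <- range(toy$avg_rating, na.rm = TRUE)
rating_range

# Which distinct questions appear?
unique(toy$question)
```

The same three calls can be pointed at the real data frames once they are loaded.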

The characters data set contains one row per character, with the character name, universe name, and related links. The psych_stats data set has a many-to-one relationship with characters: each record represents a personality item for a character (and there are roughly 400 items per character).

Digging into the personality evaluations, avg_rating in psych_stats never falls below 50 and refers to whichever personality extreme the character leans toward (e.g., for neat/messy, a character is labeled either neat or messy, and avg_rating is >= 50 on that side). This makes it tricky to compare characters against each other, so let’s clean this up with dplyr.
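
One way to confirm a many-to-one relationship like this is to check that ids are unique on the "one" side and count rows per id on the "many" side. The sketch below uses two small made-up tables (`toy_characters` and `toy_stats` are assumptions for illustration, not the real data).

```r
library(dplyr)

# Toy stand-ins; the ids, names, and questions are invented for illustration
toy_characters <- data.frame(id = c("A1", "A2"), name = c("Ann", "Bob"))
toy_stats <- data.frame(
  char_id  = c("A1", "A1", "A1", "A2", "A2", "A2"),
  question = rep(c("messy/neat", "genuine/sarcastic", "cynical/gullible"), 2)
)

# One row per character on the "one" side: ids should be unique
ids_unique <- !any(duplicated(toy_characters$id))

# Many rows per character on the "many" side
items_per_char <- count(toy_stats, char_id, name = "n_items")

# Every char_id in the stats table should exist in the characters table
orphans <- anti_join(toy_stats, toy_characters, by = c("char_id" = "id"))
```

If `orphans` comes back with rows, some personality items point at a character that is missing from the lookup table.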

sc<-ps%>%
  #filter to just see characters from Schitt's Creek, using two personality items
  filter(uni_name=="Schitt's Creek" 
         & question %in% c("genuine/sarcastic","cynical/gullible"))%>%
  #grab the second half of the question (the text after the slash) with sub & regex 
  mutate(anchor = sub("^(.+?)\\/","",question))%>%
  #select lets us subset our columns - let's grab the ones we need for plotting
  select(char_id, char_name, question, personality, anchor, avg_rating)

kable(head(sc,5))%>%kable_styling(latex_options = "HOLD_position")

| char_id | char_name | question | personality | anchor | avg_rating |
|---|---|---|---|---|---|
| SsC3 | David Rose | genuine/sarcastic | sarcastic | sarcastic | 86.2 |
| SsC3 | David Rose | cynical/gullible | cynical | gullible | 68.6 |
| SsC4 | Alexis Rose | cynical/gullible | gullible | gullible | 72.4 |
| SsC4 | Alexis Rose | genuine/sarcastic | sarcastic | sarcastic | 57.8 |
| SsC1 | Johnny Rose | genuine/sarcastic | genuine | sarcastic | 57.4 |

We used dplyr::mutate (with the help of some regex) to create our new anchor field. It represents one of the two personality extremes. We’ll use this anchor field to rescale avg_rating.
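
If the regex feels opaque, it can help to run it on a couple of strings outside the pipeline. This small sketch applies the same pattern to two question labels. The `anchor_alt` and `first_half` patterns are my own additions for comparison and assume each question contains a single slash.

```r
questions <- c("genuine/sarcastic", "cynical/gullible")

# Pattern from the tutorial: drop everything up to and including the slash
anchor <- sub("^(.+?)\\/", "", questions)

# An equivalent pattern for single-slash items:
# "any run of non-slash characters, then a slash"
anchor_alt <- sub("^[^/]*/", "", questions)

# The first half can be pulled out by dropping the slash and what follows
first_half <- sub("/.*$", "", questions)
```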

If the character’s personality doesn’t match the anchor, we’ll change our new rating, rescaled, to 100 - avg_rating (e.g. if someone is 60 genuine, they’re now 40 sarcastic). We can use case_when to set up our new if/then rules.

sc<-sc%>%
  mutate(rescaled = case_when(anchor!=personality~ 100-avg_rating, 
                              TRUE ~ avg_rating))

kable(head(sc,5))%>%kable_styling(latex_options = "HOLD_position")

| char_id | char_name | question | personality | anchor | avg_rating | rescaled |
|---|---|---|---|---|---|---|
| SsC3 | David Rose | genuine/sarcastic | sarcastic | sarcastic | 86.2 | 86.2 |
| SsC3 | David Rose | cynical/gullible | cynical | gullible | 68.6 | 31.4 |
| SsC4 | Alexis Rose | cynical/gullible | gullible | gullible | 72.4 | 72.4 |
| SsC4 | Alexis Rose | genuine/sarcastic | sarcastic | sarcastic | 57.8 | 57.8 |
| SsC1 | Johnny Rose | genuine/sarcastic | genuine | sarcastic | 57.4 | 42.6 |
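
To see the rescaling arithmetic on its own, here is the "60 genuine becomes 40 sarcastic" case as a tiny base R sketch. The vectors are toy values chosen for illustration, and `ifelse` stands in for `case_when` only to keep the example dependency-free.

```r
# Toy values mirroring the example in the text
personality <- c("genuine", "sarcastic")
anchor      <- c("sarcastic", "sarcastic")
avg_rating  <- c(60, 86.2)

# Flip the score onto the anchor's scale when the labels disagree
rescaled <- ifelse(anchor != personality, 100 - avg_rating, avg_rating)
rescaled
```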

The penultimate step in our data reshaping process is converting the data from a long format to a wide format. Currently, each record represents one personality item per character. The end goal is one record per character, with a column holding the score for each personality item. This is a perfect use case for tidyr::pivot_wider (tidyr is also part of the tidyverse)! It’s almost like the tidyverse has something for every scenario…

sc<-sc%>%
  #subset data again, we can get rid of avg_rating and question
  select(char_id, char_name, anchor, rescaled)%>%
  #reshape data - we want to use this for a matrix plot with x & y points for each character
  pivot_wider(names_from=anchor, values_from=rescaled)

kable(head(sc,5))%>%kable_styling(latex_options = "HOLD_position")

| char_id | char_name | sarcastic | gullible |
|---|---|---|---|
| SsC3 | David Rose | 86.2 | 31.4 |
| SsC4 | Alexis Rose | 57.8 | 72.4 |
| SsC1 | Johnny Rose | 42.6 | 45.6 |
| SsC2 | Moira Rose | 70.2 | 30.0 |
| SsC5 | Stevie Budd | 90.1 | 14.7 |
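
For anyone new to long-to-wide reshaping, a stripped-down sketch may help. The `long` data frame below is made up for illustration: it only shows how names_from and values_from turn rows into columns.

```r
library(tidyr)

# Toy long-format data; names and scores are invented for illustration
long <- data.frame(
  char_name = c("Ann", "Ann", "Bob", "Bob"),
  anchor    = c("sarcastic", "gullible", "sarcastic", "gullible"),
  rescaled  = c(80, 20, 35, 70)
)

# One row per character, one column per anchor
wide <- pivot_wider(long, names_from = anchor, values_from = rescaled)
wide
```

This works cleanly because each character has exactly one row per anchor. If a character had two rows for the same anchor, pivot_wider would warn and return list-columns instead of plain numbers.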

And as a quick finisher, we’ll also include the image link for each character. The characters data set has an image_link field. We can use dplyr’s left_join to combine the two data sets.

sc<-sc%>%left_join(characters%>%select(id, image_link), by=c("char_id"="id"))
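
After a join, I like to confirm that no rows were added and that nothing came back unmatched, since a missing image link would leave a hole in the plot. Here is a minimal sketch with made-up tables (`toy_sc`, `toy_characters`, and the file names are assumptions for illustration).

```r
library(dplyr)

# Toy stand-ins; ids and file names are invented for illustration
toy_sc <- data.frame(char_id = c("A1", "A2", "A3"), sarcastic = c(80, 35, 50))
toy_characters <- data.frame(
  id         = c("A1", "A2"),
  image_link = c("a1.jpg", "a2.jpg")
)

joined <- left_join(toy_sc, toy_characters, by = c("char_id" = "id"))

# A left join keeps every row of the left table...
same_rows <- nrow(joined) == nrow(toy_sc)

# ...and fills unmatched rows with NA, which is worth checking before plotting
missing_images <- joined$char_id[is.na(joined$image_link)]
```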

The Fun Part, Plotting!

Base Plot

Let’s see what our initial plot looks like with our freshly reshaped data.

ggplot(data=sc, mapping=aes(x=sarcastic, y=gullible))+
  geom_text(aes(label=char_name))

Font Set Up

Before we start de novo, let’s take a minute to reset our fonts. Using different fonts is such an easy way to elevate the aesthetic of your plot. I love using sysfonts because I can call in different Google Fonts without downloading them. To make sure they render properly in our plot, we’ll also use showtext_auto().

#import fonts from sysfonts package
sysfonts::font_add_google("roboto")
sysfonts::font_add_google("DM Serif Display","dm")
showtext_auto()

Starting from Scratch

The base plot isn’t much to look at, but we can start seeing our matrix forming. We’ll start from scratch and rebuild with some new elements.

Let’s start by drawing our matrix lines with geom_segment. Instead of character names, we’ll introduce our friend ggimage::geom_image to plot their pictures using the image_link field.

We’ll also add the character names back in with geom_label. Since we don’t want to plot the name over the picture, we’ll shift the y coordinate so the label sits slightly beneath the image.

Finally, we’re going to ditch ggplot’s default theme. To completely clear it out, I like using theme_void. Let’s add in our own theme in this step too to give it our dark mode vibe! We can do this by specifying the fill color for plot.background within element_rect.

plot<-ggplot(data=sc, mapping=aes(x=sarcastic, y=gullible))+
  #lines for matrix, use the arrow argument to draw arrowheads at both ends
  geom_segment(mapping=aes(x=0, xend=100, y=50, yend=50), 
               arrow=arrow(length=unit(0.1,"inches"), ends="both"), 
               color="#FFED47")+
  geom_segment(mapping=aes(y=0, yend=100, x=50, xend=50), 
               arrow=arrow(length=unit(0.1,"inches"),ends="both"), 
               color="#FFED47")+
  #use geom_image instead of geom_text to plot character pictures
  geom_image(aes(image=image_link), size=0.07)+
  #add character label beneath image, adjust by subtracting a little from y value
  geom_label(aes(label=char_name, y=gullible-7.5), 
             fill="black", color="white", size=3.5)+
  #clear out theme
  theme_void()+
  theme(plot.background = element_rect(fill="black", color=NA))

plot
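
One optional refinement: when constants are mapped inside aes(), ggplot2 draws the segment once per row of the data, so the lines above are stacked copies. A possible alternative is annotate(), which draws each line a single time. The sketch below is not what I used for the final graphic. It runs on a made-up `toy` data frame and skips the images so that it works offline.

```r
library(ggplot2)

# Toy scores; the names and values are invented for illustration
toy <- data.frame(
  char_name = c("Ann", "Bob", "Cy"),
  sarcastic = c(80, 35, 55),
  gullible  = c(20, 70, 45)
)

# Reusable arrow spec with heads at both ends
axis_arrow <- arrow(length = unit(0.1, "inches"), ends = "both")

p <- ggplot(toy, aes(x = sarcastic, y = gullible)) +
  # annotate() draws each axis line once, independent of the data rows
  annotate("segment", x = 0, xend = 100, y = 50, yend = 50,
           arrow = axis_arrow, colour = "#FFED47") +
  annotate("segment", x = 50, xend = 50, y = 0, yend = 100,
           arrow = axis_arrow, colour = "#FFED47") +
  geom_label(aes(label = char_name), fill = "black", colour = "white") +
  theme_void() +
  theme(plot.background = element_rect(fill = "black", colour = NA))
```

The visual result should look the same. The difference mostly matters for semi-transparent lines or larger data sets, where stacked copies start to show.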

Custom Axis Labels

Since we ditched our axis text with theme_void, we need to add some text back in to guide our audience. Let’s add in new labels (e.g. SARCASTIC, GENUINE) on the respective ends of the arrows with geom_text. We can use the angle argument to rotate the labels.

plot<-plot+
    geom_text(mapping=aes(label="GENUINE",x=-10, y=50), 
              angle=90, size=5, color="white")+
    geom_text(mapping=aes(label="SARCASTIC",x=110, y=50),
              angle=-90, size=5, color="white")+
    geom_text(mapping=aes(label="GULLIBLE",x=50, y=110),  
              size=5, color="white")+
    #set color to white here too, otherwise the label is black on a black background
    geom_text(mapping=aes(label="CYNICAL",x=50, y=-10), 
              size=5, color="white")

plot
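
If the four geom_text calls feel repetitive, one possible alternative is to keep the labels in their own small data frame and draw them in a single layer. This is a sketch of that idea, not the code behind the final graphic. The `axis_labels` name is my own, and the positions simply mirror the ones used above.

```r
library(ggplot2)

# One row per axis label: text, position, and rotation
axis_labels <- data.frame(
  label = c("GENUINE", "SARCASTIC", "GULLIBLE", "CYNICAL"),
  x     = c(-10, 110, 50, 50),
  y     = c(50, 50, 110, -10),
  angle = c(90, -90, 0, 0)
)

p <- ggplot() +
  # angle is a regular aesthetic, so each label gets its own rotation
  geom_text(data = axis_labels,
            aes(x = x, y = y, label = label, angle = angle),
            size = 5, colour = "white") +
  theme_void() +
  theme(plot.background = element_rect(fill = "black", colour = NA))
```

Because the layer brings its own data, each label is drawn once and all four share the same size and color settings.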

Adding in Title

I have a new growing obsession with ggtext. It gives ggplotters a whole new level of flexibility when it comes to adding labels to our plot. To get really fancy, it helps to know some basic HTML and CSS. Here’s how I created the title text:

title = "<span style='font-size:24pt;color:white;font-family:dm;'>**Schitt's**</span><span style='font-size:24pt;color:#FFED47;font-family:dm'> **Creek**</span><br>
<span style='font-size:11pt;color:white;font-family:roboto;'>Character Personality Matrix. Data from the Open-Source Psychometrics Project.</span>"

caption= "<span style='color:white;'>Graphic by </span><span style='color:#FFED47;'>@tanya_shapiro</span>"

I agree, the code is not pretty to look at for this part. But take a look at what happens when we add in our new labels with ggplot2::labs and tweak our plot title with ggtext::element_textbox_simple!

plot+
  labs(title=title, caption=caption)+
  #adjust theme
  theme(plot.title=element_textbox_simple(halign=0.5),
        plot.caption=element_textbox_simple(halign=0.95),
        #margin() takes top, right, bottom, left as separate arguments
        plot.margin=margin(20, 20, 20, 20))
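
To experiment with the HTML and CSS bits in isolation, a bare-bones plot is easier to iterate on than the full graphic. This sketch uses a made-up title string and default fonts, so it doesn’t depend on the Google Fonts set up earlier.

```r
library(ggplot2)
library(ggtext)

# Markdown/HTML title: bold via **...**, color and size via inline CSS
toy_title <- "<span style='color:#FFED47;'>**Toy**</span> title<br><span style='font-size:9pt;'>A smaller subtitle line</span>"

p <- ggplot(data.frame(x = 1:3, y = c(2, 5, 3)), aes(x, y)) +
  geom_point() +
  labs(title = toy_title) +
  # element_textbox_simple renders the tags; halign = 0.5 centers the text
  theme(plot.title = element_textbox_simple(halign = 0.5))
```

Swapping colors, sizes, or tags in `toy_title` and re-running is a quick way to learn which styles ggtext supports.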

That’s a Wrap!

That concludes the behind the scenes for my TidyTuesday plot this week. If you have any questions, please feel free to shoot me a Tweet @tanya_shapiro. Thank you!
