#TidyTuesday Behind The Scenes: Matrix Plot

by Tanya Shapiro

August 16th, 2022

About

For this TidyTuesday I created a simple matrix plot with ggplot2 and a few extension libraries. This plot is quick and easy to create, and in this tutorial I’ll walk through what happens behind the scenes.

R Libraries

Let’s go ahead and import our libraries. For this graphic, I used the following packages. When it comes to data wrangling/reshaping, I mostly default to dplyr (part of tidyverse).

The graphic we’ll create uses ggplot2, also a part of the tidyverse, along with additional ggplot extension libraries such as ggimage and ggtext.

Lastly, we’ll use sysfonts and showtext to add in some custom fonts - no need to download font files by hand, since sysfonts lets us pull from Google’s font library.

#for data wrangling (dplyr) & graphing (ggplot2)
library(tidyverse)
#for plotting
library(ggimage)
library(ggtext)
#to bring in the data
library(tidytuesdayR)
#for fonts
library(sysfonts)
library(showtext)
#to preview data as tables
library(kableExtra)

Importing Data

There are a couple of ways we can import our TidyTuesday data. This week, the data includes three different data sets. For our tutorial, we’ll only use two of the three - characters and psych_stats - to produce our visual.

The R for Data Science team provides a link to the data set, or we can use the tidytuesdayR package with tt_load to download the data. Since the files are large and I don’t want to abuse the API rate limit, I’ll opt to read them manually with read.csv.

#alternative option to import with tidytuesdayR
#data <- tidytuesdayR::tt_load(2022, week = 33)
#characters<-data$characters
#ps<-data$psych_stats

#import data with read.csv
characters<-read.csv("https://raw.githubusercontent.com/tashapiro/open-psychometrics/main/data/characters.csv")
ps<-read.csv('https://raw.githubusercontent.com/tashapiro/open-psychometrics/main/data/psych_stats.csv')

#preview characters dataframe, first 5 records
kable(head(characters,5))%>%kable_styling(latex_options = "HOLD_position")

id	name	uni_id	uni_name	notability	link	image_link
F2	Monica Geller	F	Friends	79.7	https://openpsychometrics.org/tests/characters/stats/F/2	https://openpsychometrics.org/tests/characters/test-resources/pics/F/2.jpg
F1	Rachel Green	F	Friends	76.7	https://openpsychometrics.org/tests/characters/stats/F/1	https://openpsychometrics.org/tests/characters/test-resources/pics/F/1.jpg
F5	Chandler Bing	F	Friends	74.4	https://openpsychometrics.org/tests/characters/stats/F/5	https://openpsychometrics.org/tests/characters/test-resources/pics/F/5.jpg
F4	Joey Tribbiani	F	Friends	74.3	https://openpsychometrics.org/tests/characters/stats/F/4	https://openpsychometrics.org/tests/characters/test-resources/pics/F/4.jpg
F3	Phoebe Buffay	F	Friends	72.6	https://openpsychometrics.org/tests/characters/stats/F/3	https://openpsychometrics.org/tests/characters/test-resources/pics/F/3.jpg

#preview psych_stats dataframe, first 5 records
kable(head(ps,5))%>%kable_styling(latex_options = "HOLD_position")

char_id	char_name	uni_id	uni_name	question	personality	avg_rating	rank	rating_sd	number_ratings
F2	Monica Geller	F	Friends	messy/neat	neat	95.7	9	11.7	1079
F2	Monica Geller	F	Friends	disorganized/self-disciplined	self-disciplined	95.2	27	11.2	1185
F2	Monica Geller	F	Friends	diligent/lazy	diligent	93.9	87	10.4	1166
F2	Monica Geller	F	Friends	on-time/tardy	on-time	93.8	34	14.3	236
F2	Monica Geller	F	Friends	competitive/cooperative	competitive	93.6	56	13.4	1168

Reshaping & Cleaning Data

One of the most important steps in creating a data visualization is UNDERSTANDING the data you’re working with. I easily spend 10-15 minutes (if not more) combing through the data. Are there missing values? Are values standardized? Is there a data dictionary I can reference to make sense of different fields?

The characters data set contains one row per character with the character name, universe name, and related links. The psych_stats data set has a many-to-one relationship with characters: each record represents a personality item for a character (and there are roughly 400 items per character).

Digging into the personality evaluations, avg_rating in psych_stats never falls below 50 because it is always measured toward one personality extreme (e.g. for neat/messy, a character is rated either neat or messy, and the avg_rating is >= 50 for whichever side they land on). This makes it tricky to compare characters against each other, so let’s clean this up with dplyr.

sc<-ps%>%
  #filter to just see characters from Schitt's Creek, using two personality items
  filter(uni_name=="Schitt's Creek" 
         & question %in% c("genuine/sarcastic","cynical/gullible"))%>%
  #grab the last half of the question with sub & regex 
  mutate(anchor = sub("^(.+?)\\/","",question))%>%
  #select lets us subset our columns - let's grab the ones we need for plotting
  select(char_id, char_name, question, personality, anchor, avg_rating)

kable(head(sc,5))%>%kable_styling(latex_options = "HOLD_position")

char_id	char_name	question	personality	anchor	avg_rating
SsC3	David Rose	genuine/sarcastic	sarcastic	sarcastic	86.2
SsC3	David Rose	cynical/gullible	cynical	gullible	68.6
SsC4	Alexis Rose	cynical/gullible	gullible	gullible	72.4
SsC4	Alexis Rose	genuine/sarcastic	sarcastic	sarcastic	57.8
SsC1	Johnny Rose	genuine/sarcastic	genuine	sarcastic	57.4
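To see what the regex is doing in isolation, here is a minimal sketch that runs the same sub call on the two question strings we filtered for. The pattern lazily matches everything up to and including the first slash and swaps it for an empty string, which leaves the second trait behind. The questions and anchors vectors are just stand-ins for illustration.

```r
# the two question strings used in the filter above
questions <- c("genuine/sarcastic", "cynical/gullible")

# drop everything up to and including the first "/"
anchors <- sub("^(.+?)\\/", "", questions)

anchors
# "sarcastic" "gullible"
```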

We used dplyr::mutate (with the help of some regex) to create our new anchor field. This represents one of the two personality extremes. We’ll use this anchor field to rescale our avg_rating.

If the character’s personality doesn’t match the anchor, we’ll change our new rating, rescaled, to 100 - avg_rating (e.g. if someone is 60 genuine, they’re now 40 sarcastic). We can use case_when to set up our new if/then rules.

sc<-sc%>%
  mutate(rescaled = case_when(anchor!=personality~ 100-avg_rating, 
                              TRUE ~ avg_rating))

kable(head(sc,5))%>%kable_styling(latex_options = "HOLD_position")

char_id	char_name	question	personality	anchor	avg_rating	rescaled
SsC3	David Rose	genuine/sarcastic	sarcastic	sarcastic	86.2	86.2
SsC3	David Rose	cynical/gullible	cynical	gullible	68.6	31.4
SsC4	Alexis Rose	cynical/gullible	gullible	gullible	72.4	72.4
SsC4	Alexis Rose	genuine/sarcastic	sarcastic	sarcastic	57.8	57.8
SsC1	Johnny Rose	genuine/sarcastic	genuine	sarcastic	57.4	42.6
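As a quick sanity check on the rule, here is a small sketch on made-up rows (the names and numbers are invented, not from the data set). Since there are only two branches, I’m assuming dplyr::if_else is an acceptable stand-in for case_when here; it should give the same result.

```r
library(dplyr)

# toy rows (made up for illustration): both anchored on "sarcastic"
toy <- tibble(
  char_name   = c("Character A", "Character B"),
  personality = c("genuine", "sarcastic"),
  anchor      = c("sarcastic", "sarcastic"),
  avg_rating  = c(60, 75)
)

# flip the rating when the personality is the opposite of the anchor
toy <- toy %>%
  mutate(rescaled = if_else(anchor != personality,
                            100 - avg_rating,
                            avg_rating))

toy$rescaled
# 40 75
```

Character A was rated 60 genuine, so they land at 40 on the sarcastic scale, while Character B keeps their 75.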

The penultimate step in our data reshaping process: we need to convert the data from a long format to a wide format. Currently, each record represents one personality item for one character. The end goal: we want one record per character with their respective scores per personality item. This is a perfect use case for tidyr::pivot_wider! It’s almost like the tidyverse has something for every scenario…

sc<-sc%>%
  #subset data again, we can get rid of avg_rating and question
   select(char_id, char_name, anchor, rescaled)%>%
  #reshape data - we want one column per anchor to use as x & y in the matrix plot
  pivot_wider(names_from=anchor, values_from=rescaled)

kable(head(sc,5))%>%kable_styling(latex_options = "HOLD_position")

char_id	char_name	sarcastic	gullible
SsC3	David Rose	86.2	31.4
SsC4	Alexis Rose	57.8	72.4
SsC1	Johnny Rose	42.6	45.6
SsC2	Moira Rose	70.2	30.0
SsC5	Stevie Budd	90.1	14.7
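If the long-to-wide step feels abstract, this tiny sketch shows the same move on invented data. The long and wide names and all the values are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# toy long data: one row per character per anchor (values are made up)
long <- tibble(
  char_name = c("A", "A", "B", "B"),
  anchor    = c("sarcastic", "gullible", "sarcastic", "gullible"),
  rescaled  = c(80, 20, 35, 65)
)

# each distinct anchor becomes its own column, one row per character
wide <- long %>%
  pivot_wider(names_from = anchor, values_from = rescaled)

wide
```

The four long rows collapse into two rows, one per character, with a sarcastic column and a gullible column ready to map to x and y.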

And as a quick finisher, we’ll also include the image links for each character. The characters data set has an image_link field. We can use left_join to combine these data sets together.

sc<-sc%>%left_join(characters%>%select(id, image_link), by=c("char_id"="id"))
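Because left_join keeps every row on the left side, a character without a match would simply get NA for image_link. Here is a hedged sketch of how one might check for that before plotting; the scores and pics tables, their ids, and the example.com links are all made up.

```r
library(dplyr)

# toy stand-ins for sc and characters (ids and links are invented)
scores <- tibble(char_id = c("X1", "X2", "X3"),
                 sarcastic = c(80, 35, 50))
pics <- tibble(id = c("X1", "X2"),
               image_link = c("https://example.com/1.jpg",
                              "https://example.com/2.jpg"))

joined <- scores %>%
  left_join(pics, by = c("char_id" = "id"))

# rows that found no match keep NA in image_link
missing_pics <- joined %>% filter(is.na(image_link))

missing_pics$char_id
# "X3"
```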

The Fun Part, Plotting!

Base Plot

Let’s see what our initial plot looks like with our freshly reshaped data.

ggplot(data=sc, mapping=aes(x=sarcastic, y=gullible))+
  geom_text(aes(label=char_name))

Font Set Up

Before we start de novo, let’s take a minute to set up our fonts. Using different fonts is such an easy way to elevate the aesthetic of your plot. I love using sysfonts because I can call in different Google Fonts without downloading them. To make sure they render properly in our plot, we’ll also use showtext_auto().

#import fonts from sysfonts package
sysfonts::font_add_google("roboto")
sysfonts::font_add_google("DM Serif Display","dm")
showtext_auto()

Starting from Scratch

Not much to look at, but we can start seeing our matrix forming. We’ll start from scratch and rebuild with some new elements.

Let’s start by drawing our matrix lines first with geom_segment. Instead of character names, we’ll introduce our friend ggimage::geom_image to plot their pictures using the image_link field.

We will also add the character names back in with geom_label. Since we don’t want to plot the name over the picture, we’ll modify the y coordinate so it fits slightly beneath the image.

Finally, we’re going to ditch ggplot’s default theme. To completely clear it out, I like using theme_void. Let’s add in our own theme in this step too to give it our dark mode vibe! We can do this by specifying the fill color for plot.background within element_rect.

plot<-ggplot(data=sc, mapping=aes(x=sarcastic, y=gullible))+
  #lines for matrix, use arrow argument to draw arrows at both ends
  geom_segment(mapping=aes(x=0, xend=100, y=50, yend=50), 
               arrow=arrow(length=unit(0.1,"inches"), ends="both"), 
               color="#FFED47")+
  geom_segment(mapping=aes(y=0, yend=100, x=50, xend=50), 
               arrow=arrow(length=unit(0.1,"inches"),ends="both"), 
               color="#FFED47")+
  #use geom_image instead of geom_text to plot character pictures
  geom_image(aes(image=image_link), size=0.07)+
  #add character label beneath image, adjust by subtracting a little from y value
  geom_label(aes(label=char_name, y=gullible-7.5), 
             fill="black", color="white", size=3.5)+
  #clear out theme
  theme_void()+
  theme(plot.background = element_rect(fill="black", color=NA))

plot
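One optional tweak: when constants go inside aes(), my understanding is that ggplot2 recycles them across every row of the data, so the same line may get drawn once per character. A possible alternative is annotate("segment"), which draws each line a single time. The sketch below uses made-up points and geom_point instead of images, so treat it as an illustration rather than a drop-in replacement.

```r
library(ggplot2)

# made-up points standing in for the character scores
toy <- data.frame(sarcastic = c(86, 58, 43), gullible = c(31, 72, 46))

p <- ggplot(toy, aes(x = sarcastic, y = gullible)) +
  # each annotate() call draws exactly one segment
  annotate("segment", x = 0, xend = 100, y = 50, yend = 50,
           arrow = arrow(length = unit(0.1, "inches"), ends = "both"),
           colour = "#FFED47") +
  annotate("segment", x = 50, xend = 50, y = 0, yend = 100,
           arrow = arrow(length = unit(0.1, "inches"), ends = "both"),
           colour = "#FFED47") +
  geom_point(colour = "white") +
  theme_void() +
  theme(plot.background = element_rect(fill = "black", colour = NA))

# number of rows behind the first layer (the horizontal line)
nrow(layer_data(p, 1))
```

layer_data() pulls the data ggplot2 actually used for a layer, which makes it handy for checking how many copies of a shape were drawn.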

Custom Axis Labels

Since we ditched our axis text with theme_void, we need to add some text back in to guide our audience. Let’s add in new labels (e.g. SARCASTIC, GENUINE) on the respective ends of the arrows with geom_text. We can use the angle argument to rotate the labels.

plot<-plot+
    geom_text(mapping=aes(label="GENUINE",x=-10, y=50), 
              angle=90, size=5, color="white")+
    geom_text(mapping=aes(label="SARCASTIC",x=110, y=50),
              angle=-90, size=5, color="white")+
    geom_text(mapping=aes(label="GULLIBLE",x=50, y=110),  
              size=5, color="white")+
    geom_text(mapping=aes(label="CYNICAL",x=50, y=-10), 
              size=5, color="white")

plot
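Four nearly identical geom_text calls work fine, but another way to do it is to keep the labels in a small data frame and map angle as an aesthetic. This is only a sketch: axis_labels and label_layer are names I made up, and inherit.aes = FALSE is there so the layer ignores the x and y mappings of the main plot.

```r
library(ggplot2)

# one row per axis label, with its position and rotation
axis_labels <- data.frame(
  label = c("GENUINE", "SARCASTIC", "GULLIBLE", "CYNICAL"),
  x     = c(-10, 110, 50, 50),
  y     = c(50, 50, 110, -10),
  angle = c(90, -90, 0, 0)
)

# a single text layer that draws all four labels in white
label_layer <- geom_text(
  data = axis_labels,
  mapping = aes(x = x, y = y, label = label, angle = angle),
  inherit.aes = FALSE, size = 5, colour = "white"
)

# preview on an empty dark canvas
p <- ggplot() +
  label_layer +
  theme_void() +
  theme(plot.background = element_rect(fill = "black", colour = NA))

nrow(layer_data(p, 1))
```

In the tutorial’s plot, this layer could be added with plot + label_layer in place of the four separate calls.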

Adding in Title

I have a new growing obsession with ggtext. It gives ggplotters a whole new level of flexibility when it comes to adding labels to our plot. To get really fancy, it helps to know some basic HTML and CSS. Here’s how I created the title text:

title = "<span style='font-size:24pt;color:white;font-family:dm;'>**Schitt's**</span><span style='font-size:24pt;color:#FFED47;font-family:dm'> **Creek**</span><br>
<span style='font-size:11pt;color:white;font-family:roboto;'>Character Personality Matrix. Data from the Open-Source Psychometrics Project.</span>"

caption= "<span style='color:white;'>Graphic by </span><span style='color:#FFED47;'>@tanya_shapiro</span>"
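If typing the spans by hand gets tedious, a small helper can assemble them. To be clear, span below is a hypothetical function I’m sketching here, not something ggtext provides. It turns named arguments into inline CSS and rebuilds the same caption string as above.

```r
# hypothetical helper (not part of ggtext): wrap text in a styled <span>
span <- function(text, ...) {
  styles <- c(...)
  # turn named values into "property:value;" pairs
  css <- paste0(names(styles), ":", styles, ";", collapse = "")
  sprintf("<span style='%s'>%s</span>", css, text)
}

caption_alt <- paste0(
  span("Graphic by ", color = "white"),
  span("@tanya_shapiro", color = "#FFED47")
)

caption_alt
```

CSS properties with a hyphen need backticks when passed as argument names, e.g. span("Creek", `font-size` = "24pt").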

I agree, the code is not pretty to look at for this part. But take a look at what happens when we add in our new labels with ggplot2::labs and style the title and caption with ggtext::element_textbox_simple!

  plot+
  labs(title=title, caption=caption)+
  #adjust theme
  theme(plot.title=element_textbox_simple(halign =0.5),
        plot.caption=element_textbox_simple(halign=0.95),
        plot.margin = margin(20, 20, 20, 20))

That’s a Wrap!

That concludes the behind the scenes for my TidyTuesday plot this week. If you have any questions, please feel free to shoot me a Tweet @tanya_shapiro. Thank you!