The extent of each line represents the degree to which that character is more likely to use that word compared to other characters.
Notice how Cap talks about people (especially Tony). T’Challa (Black Panther)’s speech is marked by noble topics, the opposite of Spiderman, who bumbles around like the teenager that he is. Hulk (Bruce Banner) and Clint (Hawkeye) are both notable for referring to Nat (Black Widow), although for different reasons. Vision and Scarlet Witch talk about very similar themes, which might explain why they seem to gravitate toward each other. Thor’s got his mind set on the bigger picture, leading directly into the events to come in Infinity War. Loki, unsurprisingly, is the character most likely to talk about power. Ultron wants power in an entirely different way, and is more poetic.
All of these patterns were identified by Elle O’Brien, who uses neural networks to generate predictive text for Botnik Studios. The visualization project was initiated during a meetup of Data Viz Jam Sessions, hosted by Nancy Organ.
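The plot's axis is labeled "Log Odds of Word", so one plausible reading of "more likely to use that word compared to other characters" is a log odds ratio. The exact computation Elle used is not shown here, so treat the following as a minimal sketch under that assumption. The counts, speaker labels, and the `log_odds_ratio` helper are all made up for illustration.

```r
# Made-up word counts (NOT from the real movie scripts)
toy_counts <- data.frame(
  speaker = c("A", "A", "B", "B"),
  word    = c("power", "friend", "power", "friend"),
  n       = c(30, 10, 5, 45)
)

# Hypothetical helper: log of (odds that target_speaker says target_word)
# divided by (odds that everyone else says target_word)
log_odds_ratio <- function(counts, target_speaker, target_word){
  is_speaker <- counts$speaker == target_speaker
  is_word    <- counts$word == target_word
  speaker_word  <- sum(counts$n[is_speaker & is_word])
  speaker_other <- sum(counts$n[is_speaker & !is_word])
  others_word   <- sum(counts$n[!is_speaker & is_word])
  others_other  <- sum(counts$n[!is_speaker & !is_word])
  log((speaker_word / speaker_other) / (others_word / others_other))
}

toy_result <- log_odds_ratio(toy_counts, "A", "power")
```

A value above zero means the speaker uses the word more than the others do; a value near zero means the word is not distinctive for that speaker.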

Here are the R packages that we will use:
library(dplyr)
library(grid)
library(gridExtra)
library(ggplot2)
library(reshape2)
library(cowplot)
library(jpeg)
library(extrafont)
Some people say that it’s bad form to use this “clear everything” line. I do it routinely at the top of a script to make sure that when I run it, it doesn’t depend on any objects that I accidentally left in the workspace.
rm(list = ls())
dir_images <- "C:\\Users\\Matt\\Documents\\R\\Avengers"
setwd(dir_images)
windowsFonts(Franklin=windowsFont("Franklin Gothic Demi"))
character_names <- c("black_panther","black_widow","bucky","captain_america",
"falcon","hawkeye","hulk","iron_man",
"loki","nick_fury","rhodey","scarlet_witch",
"spiderman","thor","ultron","vision")
image_filenames <- paste0(character_names, ".jpg")
read_image <- function(filename){
  # read a JPEG file into an array of pixel values
  img <- jpeg::readJPEG(filename)
  return(img)
}
all_images <- lapply(image_filenames, read_image)
names(all_images) <- character_names
Here’s an example of how easy it is to display an image using those names:
# clear the plot window
grid.newpage()
# draw to the plot window
grid.draw(rasterGrob(all_images[['vision']]))
The word data was collected by Elle O’Brien, using some fancy text mining analysis on the movie scripts.
I know that you won’t be able to load it on your own computer using this line (because you don’t have the file), but Elle might share it. If she wants to share it here, I’ll update this page.
load("Avengers_word_data.RData")
capitalize <- Vectorize(function(string){
  # replace the first character with its uppercase version
  substr(string,1,1) <- toupper(substr(string,1,1))
  return(string)
})
proper_noun_list <- c("clint","hydra","steve","tony",
"sam","stark","strucker","nat","natasha",
"hulk","tesseract", "vision",
"loki","avengers","rogers", "cap", "hill")
# Run the capitalization function
word_data <- word_data %>%
  mutate(word = ifelse(word %in% proper_noun_list, capitalize(word), word)) %>%
  mutate(word = ifelse(word == "jarvis", "JARVIS", word))
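To see the capitalization step in isolation, here is a small sketch on made-up words. It uses a hypothetical helper, `capitalize_first`, that relies on base R string functions that are already vectorized, as an alternative to wrapping the function in `Vectorize`. The sample vectors are invented for illustration.

```r
# Hypothetical helper (not part of the original script): capitalize the first
# letter of every element using vectorized base R string functions
capitalize_first <- function(strings){
  paste0(toupper(substr(strings, 1, 1)), substring(strings, 2))
}

# Made-up sample words and a made-up proper noun list
sample_words <- c("tony", "power", "nat")
sample_proper_nouns <- c("tony", "nat")

# Capitalize only the words that appear in the proper noun list
capitalized_words <- ifelse(sample_words %in% sample_proper_nouns,
                            capitalize_first(sample_words),
                            sample_words)
capitalized_words
```

Only "tony" and "nat" are capitalized; "power" is left alone because it is not in the list.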
Notice that the simplified character names from before don’t match the nicely formatted character names in the text dataframe:
unique(word_data$Speaker)
## [1] "Black Panther" "Black Widow" "Bucky"
## [4] "Captain America" "Falcon" "Hawkeye"
## [7] "Hulk" "Iron Man" "Loki"
## [10] "Nick Fury" "Rhodey" "Scarlet Witch"
## [13] "Spiderman" "Thor" "Ultron"
## [16] "Vision"
character_labeler <- c(`black_panther` = "Black Panther",
`black_widow` = "Black Widow",
`bucky` = "Bucky",
`captain_america` = "Captain America",
`falcon` = "Falcon", `hawkeye` = "Hawkeye",
`hulk` = "Hulk", `iron_man` = "Iron Man",
`loki` = "Loki", `nick_fury` = "Nick Fury",
`rhodey` = "Rhodey",`scarlet_witch` ="Scarlet Witch",
`spiderman`="Spiderman", `thor`="Thor",
`ultron` ="Ultron", `vision` ="Vision")
We now have two sets of character names: one for display (pretty) and one for simple organization and referring to image file names (simple).
convert_pretty_to_simple <- Vectorize(function(pretty_name){
  # pretty_name = "Vision"
  simple_name <- names(character_labeler)[character_labeler==pretty_name]
  # simple_name <- as.vector(simple_name)
  return(simple_name)
})
# convert_pretty_to_simple(c("Vision","Thor"))
# just for fun, the inverse of that function
convert_simple_to_pretty <- function(simple_name){
  # simple_name = "vision"
  pretty_name <- character_labeler[simple_name] %>% as.vector()
  return(pretty_name)
}
# example
convert_simple_to_pretty(c("vision","black_panther"))
## [1] "Vision" "Black Panther"
word_data$character <- convert_pretty_to_simple(word_data$Speaker)
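The pretty-to-simple conversion can also be done with an inverted lookup vector, which avoids calling the function once per row. This is a minimal sketch, not the approach used in the script above. `labeler_subset` repeats three entries of `character_labeler` so the example runs on its own, and `simple_lookup` is a hypothetical name.

```r
# Three entries copied from character_labeler so the sketch is self-contained
labeler_subset <- c(`vision` = "Vision",
                    `thor` = "Thor",
                    `black_panther` = "Black Panther")

# Invert the vector: pretty names become the names, simple names the values
simple_lookup <- setNames(names(labeler_subset), labeler_subset)

# Look up several pretty names at once; unname() drops the leftover names
pretty_names <- c("Thor", "Vision", "Thor")
simple_names <- unname(simple_lookup[pretty_names])
simple_names
```

Indexing a named vector by name is already vectorized, so the same line works on a whole data frame column.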
character_palette <- c(`black_panther` = "#51473E",
`black_widow` = "#89B9CD",
`bucky` = "#6F7279",
`captain_america` = "#475D6A",
`falcon` = "#863C43", `hawkeye` = "#84707F",
`hulk` = "#5F5F3F", `iron_man` = "#9C2728",
`loki` = "#3D5C25", `nick_fury` = "#838E86",
`rhodey` = "#38454E",`scarlet_witch` ="#620E1B",
`spiderman`="#A23A37", `thor`="#323D41",
`ultron` ="#64727D", `vision` ="#81414F" )
avengers_bar_plot <- word_data %>%
  group_by(Speaker) %>%
  top_n(5, amount) %>%
  ungroup() %>%
  mutate(word = reorder(word, amount)) %>%
  ggplot(aes(x = word, y = amount, fill = character))+
  geom_bar(stat = "identity", show.legend = FALSE)+
  scale_fill_manual(values = character_palette)+
  scale_y_continuous(name ="Log Odds of Word",
                     breaks = c(0,1,2)) +
  theme(text = element_text(family = "Franklin"),
        # axis.title.x = element_text(size = rel(1.5)),
        panel.grid = element_line(colour = NULL),
        panel.grid.major.y = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = "white",
                                        colour = "white"))+
  # theme(strip.text.x = element_text(size = rel(1.5)))+
  xlab("")+
  coord_flip()+
  facet_wrap(~Speaker, scales = "free_y")
avengers_bar_plot
But I want to plot something more ambitious: the character images should show through the bars.
The idea is to display the image only in the area of the bar, cutting it off at the bar endpoint.
To do this, we will display a transparent bar, and then at the bar endpoint, plot a white bar extending to the plot edge, to cover up the rest of the picture.
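Here is a rough sketch of that masking idea on toy data, to show the layering before it is applied to the real plot. Everything in it is an assumption made for illustration: `fake_image` is a generated grayscale gradient standing in for a character photo, the words and amounts are made up, and the bars are drawn with `geom_rect` on numeric positions rather than with `geom_bar` and `coord_flip`.

```r
library(ggplot2)
library(grid)

# Stand-in for a character photo: a 20 x 60 grayscale gradient stored as a
# 3-channel array with values between 0 and 1 (the layout readJPEG returns)
fake_image <- array(rep(seq(0, 1, length.out = 60), each = 20),
                    dim = c(20, 60, 3))

# Made-up words and amounts
toy_bars <- data.frame(word = c("alpha", "beta", "gamma"),
                       amount = c(0.5, 1.2, 1.8))
toy_bars$pos <- seq_len(nrow(toy_bars))
max_amount <- 2

sketch_plot <- ggplot(toy_bars) +
  # layer 1: the picture stretched across the whole panel
  annotation_custom(rasterGrob(fake_image,
                               width = unit(1, "npc"),
                               height = unit(1, "npc")),
                    xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf) +
  # layer 2: transparent bar from zero to the bar endpoint
  geom_rect(aes(xmin = 0, xmax = amount,
                ymin = pos - 0.5, ymax = pos + 0.5),
            fill = NA) +
  # layer 3: white bar from the bar endpoint to the plot edge
  geom_rect(aes(xmin = amount, xmax = max_amount,
                ymin = pos - 0.5, ymax = pos + 0.5),
            fill = "white") +
  scale_x_continuous(name = "Log Odds of Word",
                     limits = c(0, max_amount), expand = c(0, 0)) +
  scale_y_continuous(name = "", breaks = toy_bars$pos,
                     labels = toy_bars$word, expand = c(0, 0))

sketch_plot
```

In this sketch the bars touch each other, so there are no gaps where the picture could show through between them. Bars with gaps would need extra white strips between rows.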