Project Materials
Summary
Sixty years of Marvel comics, 6,439 heroes, and 96,104 co-appearances later, the network structure tells a surprisingly coherent story. Ranked by who co-appears with whom, Captain America and Spider-Man come out on top as the most connected heroes, which is exactly what you'd expect. The more interesting findings came from digging deeper.
An algorithm with zero prior Marvel knowledge sorted heroes into communities that almost perfectly match the actual teams: X-Men together, Avengers together, Fantastic Four together, and Defenders together. Hawkeye turned out to be one of the biggest surprises. He is not the most connected hero by any means, but he is consistently one of the top bridge characters in the network, linking parts of the Marvel Universe that wouldn't otherwise be connected.
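To illustrate how a modestly connected hero can still rank as a top bridge, here is a minimal sketch on a toy graph. The node names are hypothetical and this is not the Marvel data. Two tight groups are joined through a single low-degree node, and betweenness is computed with a plain-Python version of Brandes' algorithm. The notebook's actual method and libraries may differ.

```python
from collections import deque
from itertools import combinations


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths to each node.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


# Toy graph (hypothetical): two 4-node cliques joined only through "bridge".
adj = {}


def add_edge(u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


team_a = ["a1", "a2", "a3", "a4"]
team_b = ["b1", "b2", "b3", "b4"]
for team in (team_a, team_b):
    for u, v in combinations(team, 2):
        add_edge(u, v)
add_edge("bridge", "a1")
add_edge("bridge", "b1")

degree = {v: len(nbrs) for v, nbrs in adj.items()}
bw = betweenness(adj)
top_bridge = max(bw, key=bw.get)
print(top_bridge, degree[top_bridge], bw[top_bridge])
```

In this toy setup the bridging node has the lowest degree in the graph yet the highest betweenness, which is the same pattern the analysis reports for Hawkeye.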
Spider-Man also beat out Captain America for the most comic issues overall, while Captain America co-appeared with the broadest range of unique heroes: a subtle but real distinction between Marvel's most prolific solo character and its most collaborative one.
The one metric that broke down in an interesting way was eigenvector centrality, which was dominated by obscure 1940s heroes rather than modern icons. Being densely connected in the early days of Marvel can inflate a hero's "prestige score" in ways that have nothing to do with actual cultural prominence. That finding alone is a good reminder that network metrics are only as meaningful as the data behind them, and that 60 years of publishing history collapsed into a single static graph will always carry the fingerprints of the era it started in. The full discussion, limitations, and ideas for further study are in the notebook.
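The difference between total issue appearances and the number of unique co-appearing heroes can be sketched from a hero-to-issue edge list. The records below are made up for illustration, with hypothetical hero and issue names, and the layout of the real dataset is an assumption.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (hero, issue) appearance records, not the real dataset.
appearances = [
    ("hero_a", "issue_1"), ("hero_a", "issue_2"),
    ("hero_a", "issue_3"), ("hero_a", "issue_4"),
    ("hero_b", "issue_1"),
    ("hero_c", "issue_5"), ("hero_c", "issue_6"),
    ("hero_d", "issue_5"), ("hero_e", "issue_5"),
    ("hero_f", "issue_6"),
]

issues_by_hero = defaultdict(set)
heroes_by_issue = defaultdict(set)
for hero, issue in appearances:
    issues_by_hero[hero].add(issue)
    heroes_by_issue[issue].add(hero)

# Two heroes are partners if they share at least one issue.
partners = defaultdict(set)
for cast in heroes_by_issue.values():
    for h1, h2 in combinations(sorted(cast), 2):
        partners[h1].add(h2)
        partners[h2].add(h1)

issue_count = {h: len(s) for h, s in issues_by_hero.items()}
partner_count = {h: len(partners[h]) for h in issues_by_hero}

most_prolific = max(issue_count, key=issue_count.get)
most_collaborative = max(partner_count, key=partner_count.get)
print(most_prolific, issue_count[most_prolific])
print(most_collaborative, partner_count[most_collaborative])
```

Here hero_a appears in the most issues but mostly alone, while hero_c appears in fewer issues alongside more distinct heroes, so the two rankings pick different winners.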
Key Findings
Community detection worked almost too well. An algorithm with no Marvel knowledge sorted heroes into groups that almost perfectly match the actual team rosters: X-Men, Avengers, Fantastic Four, and Defenders all emerged as distinct communities.
Hawkeye is a bigger bridge than his fame suggests. Despite not being one of the most connected heroes, he consistently ranked near the top for betweenness centrality, linking otherwise separate corners of the network.
Spider-Man vs. Captain America tells two different stories. Spider-Man appeared in more total issues; Captain America co-appeared with the broadest range of unique heroes. Most prolific solo character vs. most collaborative one.
Eigenvector centrality broke down in a revealing way. Obscure 1940s Golden Age heroes dominated the weighted version, showing how early-era density can distort prestige metrics in ways that don't reflect actual cultural prominence.
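The eigenvector centrality distortion can be reproduced in miniature. The sketch below assumes edge weights stand for repeated co-appearances and uses plain power iteration. The graph is a toy with hypothetical node names, not the notebook's implementation. A small, heavily weighted clique is compared against a well-connected node whose ties are all light.

```python
from itertools import combinations
from math import sqrt


def weighted_eigenvector_centrality(weights, iterations=200):
    """Power iteration on a symmetric weighted adjacency mapping."""
    score = {v: 1.0 for v in weights}
    for _ in range(iterations):
        new = {
            v: sum(w * score[u] for u, w in nbrs.items())
            for v, nbrs in weights.items()
        }
        norm = sqrt(sum(x * x for x in new.values()))
        score = {v: x / norm for v, x in new.items()}
    return score


weights = {}


def add_edge(u, v, w):
    weights.setdefault(u, {})[v] = w
    weights.setdefault(v, {})[u] = w


# Hypothetical early-era group: few members, many repeated co-appearances.
old_guard = ["old_1", "old_2", "old_3", "old_4"]
for u, v in combinations(old_guard, 2):
    add_edge(u, v, 10)

# Hypothetical modern icon: many distinct partners, one shared issue each.
for i in range(1, 7):
    add_edge("icon", f"guest_{i}", 1)
add_edge("icon", "old_1", 1)

scores = weighted_eigenvector_centrality(weights)
ranking = sorted(scores, key=scores.get, reverse=True)
degree = {v: len(nbrs) for v, nbrs in weights.items()}
print(ranking[:5])
print(degree["icon"], degree["old_1"])
```

Even though the icon node has the most distinct partners, the four clique members take the top four spots, because eigenvector centrality rewards being tied strongly to other strongly tied nodes.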
Interactive Network Visualizations
PyVis — Please allow a moment to load
The most densely connected community in the network, anchored by Wolverine, Cyclops, and Storm
Centered on Captain America and Iron Man, the two highest degree heroes in the full network
Spider-Man's extended cast including Daredevil, Mary Jane, and J. Jonah Jameson
The loosest community in the network, reflecting the Defenders' informal team structure
Cytoscape via NDEx — no account needed to view
20 nodes, 59 edges
20 nodes, 186 edges
20 nodes, 129 edges
20 nodes, 125 edges
20 nodes, 101 edges
20 nodes, 189 edges