UC Berkeley · MA Statistics · May 2026

Statistics
& Data Science

I build rigorous, reproducible analyses across statistical modeling, machine learning, and data communication. My projects span 43,000-patient clinical datasets, satellite imagery, social network graphs, and multimodal NLP pipelines, always with an emphasis on results that are interpretable and well-communicated.

Featured in the UC Berkeley Statistics Student Spotlight

🏆 2025-26 UC Berkeley MA Statistics Community Leadership Award

Madison MacDonald

Background

Rigorous by training.
Curious by nature.

I am a statistician and data scientist finishing my MA in Statistics at UC Berkeley, where I also teach as a Graduate Student Instructor and represent the program as an Outreach Peer Ambassador.

Before Berkeley, I graduated top of my class in Applied Statistics from Purdue University with a 4.0 major GPA. My work spans NLP, network analysis, multimodal learning, Bayesian inference, and brain encoding models, always with an emphasis on interpretability and communication.

I have seven years of private tutoring experience in math and statistics, which has sharpened how I explain complex ideas to any audience.

3.9
GPA at UC Berkeley
4.0
Major GPA at Purdue
7+
Years tutoring math & stats
9
Public project repos

Technical Skills

Languages

  • Python
  • R
  • SQL
  • SAS
  • LaTeX
  • MATLAB
  • Shell (UNIX)

ML & Modeling

  • scikit-learn
  • TensorFlow
  • PyTorch
  • XGBoost
  • Bayesian Inference
  • Time Series
  • fMRI Encoding
  • Ridge & Lasso
  • LoRA Fine-tuning

Data & NLP

  • pandas · NumPy
  • BERTopic · LDA
  • NLTK · librosa
  • NetworkX · pyvis
  • Plotly · Tableau
  • OpenCV · Pillow
  • Word2Vec · GloVe · BERT
  • SHAP · LIME
  • glmnet (R)

Tools

  • Git & GitHub
  • Jupyter
  • Conda
  • Dask
  • HPC (Bridges2)
  • GitHub Pages

Selected Projects

Work that shows the range.

01
Multimodal Meme & Speech Analysis
Analyzed how political and COVID memes assign blame and heroism across visual, textual, and acoustic features, drawing on 5,552 memes, 1,081 campaign speeches, and debate audio clips.
NLP · Computer Vision · librosa · scikit-learn
02
LLM vs. Human Text Classification
Classified 788,922 texts as human- or LLM-generated using BERTopic and TF-IDF features (macro F1 = 0.862). Found that AI-generated writing operates at a level of abstraction that human writing does not.
BERTopic · TF-IDF · Classification · GitHub Pages
03
Marvel Universe Network Analysis
Mapped character co-appearance networks across 6,000+ Marvel comics using Louvain community detection and interactive visualization deployed to GitHub Pages.
NetworkX · pyvis · Plotly · Community Detection
04
Genetic Algorithm Variable Selection
Python package implementing a genetic algorithm for feature selection in regression models, complete with unit tests, integration tests, and a real-data baseball demo.
Python Package · Optimization · Testing
05
Arctic Cloud Detection
Compared five classifiers (logistic regression, random forest, XGBoost, LDA, and QDA) on MISR satellite imagery for cloud vs. non-cloud classification, with a full LaTeX report.
scikit-learn · XGBoost · Satellite Data · LaTeX
06
Pediatric TBI Risk Analysis
Cleaned a 43,000-patient emergency department (ED) dataset and built three classifiers to predict traumatic brain injury. Found that bicycle crashes carry higher rates of clinically important TBI (ciTBI) than motor vehicle crashes.
Clinical Data · Classification · R · LaTeX
07
Structural Predictors of Homelessness
Modeled log homelessness rates across 38 California Continuums of Care (CoCs) using a pre-specified OLS ladder with HC1 robust SEs, Lasso and Ridge, stepwise AIC, WLS, and leverage diagnostics. Found that eviction rate is the most robust predictor across every specification.
R · Panel Data · OLS · glmnet · HUD Data
08
fMRI Encoding Models from Text
Predicted voxelwise BOLD responses as subjects listened to podcast stories. Compared BoW, Word2Vec, GloVe, pretrained BERT, fine-tuned BERT, and LoRA-adapted BERT features, fit ridge regression on Bridges2 HPC, and interpreted well-predicted voxels with SHAP and LIME.
NLP · BERT · LoRA · Ridge Regression · HPC
09
Reddit vs. Moltbook: AI Vernacular Culture
Compared how AI agents and humans develop vocabulary, genre, and identity language across matched community boards. BERTopic topic modeling, sentiment analysis, platform classification, and close reading across 27GB of Reddit and Moltbook corpora.
BERTopic · NLP · Cultural Analytics · Python

Experience

Where I have worked.

Jan 2026 to May 2026
Graduate Student Instructor
UC Berkeley
Teach weekly discussion sections, hold office hours, and develop review materials for graduate-level statistics coursework.
Aug 2025 to May 2026
Outreach Peer Ambassador
UC Berkeley Statistics
Represent the MA Statistics program at national conferences and information sessions; counsel prospective applicants through the graduate admissions process.
Sep 2023 to Dec 2024
Undergraduate Data Science Researcher
The Data Mine, Purdue University
Led a 5-person analytics team in partnership with the American Mathematical Society. Cleaned and analyzed large survey datasets, built an interactive public-facing dashboard replacing static PDFs, and presented weekly to AMS stakeholders.
May 2024 to Jul 2024
Research Assistant
Purdue Puentes Project
Conducted qualitative and quantitative analysis to identify key stressors and resilience strategies in rural Latine communities. Cleaned and processed large survey datasets in Excel and wrote SPSS code for biometric calculations to support research findings and team decisions.
Sep 2017 to May 2025
Private Mathematics Tutor
Purdue University & West Lafayette Schools
Seven years tutoring 10–15 students at a time across undergraduate and graduate statistics coursework. Developed practice materials and led group study sessions.

Contact

Let's connect.

I am actively looking for data science, research, and analytics roles in the Bay Area. Feel free to reach out.

LinkedIn GitHub