Project Materials
Summary
Can human-written and AI-generated text be told apart? The short answer is yes, and the difference goes deeper than writing style. Using TF-IDF features, a logistic regression model separated human from AI chunks with a macro F1 of 0.862. The most predictive features followed a clear pattern: AI writing consistently leaned on abstract language, while human writing was grounded in specific dates, citations, and institutional vocabulary.
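The classification code itself is not shown in this summary. As a rough, dependency-free sketch of the general technique (not the project's implementation), the snippet below builds TF-IDF vectors using the smoothed IDF that scikit-learn applies by default, fits a logistic regression by batch gradient descent, and ranks terms by learned weight. The four-sentence corpus, the `lr`, `epochs`, and `l2` values, and the helper names (`fit_idf`, `tfidf`, `train_logreg`) are all hypothetical, chosen only to make the mechanics visible.

```python
import math
from collections import Counter, defaultdict

# Toy corpus invented for illustration (NOT project data). 1 = AI, 0 = human.
DOCS = [
    ("additionally this has significant potential", 1),
    ("ultimately the potential impact is significant", 1),
    ("in 2008 the study cited two references", 0),
    ("the 2009 essay cited earlier references", 0),
]


def fit_idf(token_lists):
    """Smoothed IDF as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(tokens, idf):
    """Term counts times IDF, L2-normalised; terms missing from the vocabulary are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(x, w, b):
    """Linear score b + w . x for a sparse dict vector."""
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(vectors, labels, lr=0.5, epochs=500, l2=0.01):
    """Batch gradient descent on mean log loss plus an L2 penalty on the weights."""
    w, b, n = defaultdict(float), 0.0, len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(score(x, w, b)) - y  # derivative of log loss w.r.t. the score
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] -= lr * (grad_w[t] / n + l2 * w[t])
        b -= lr * grad_b / n
    return dict(w), b


token_lists = [text.lower().split() for text, _ in DOCS]
labels = [label for _, label in DOCS]
idf = fit_idf(token_lists)
vectors = [tfidf(tokens, idf) for tokens in token_lists]
w, b = train_logreg(vectors, labels)

predictions = [int(score(x, w, b) >= 0) for x in vectors]
# Positive weights push toward AI (1), negative weights toward human (0).
ranked = sorted(w, key=w.get)
print("predictions:", predictions)
print("most human-like terms:", ranked[:3])
print("most AI-like terms:", ranked[-3:])
```

Because the model is linear, each term's weight reads directly as evidence for one class. Ranking coefficients this way is one common route to lists like the top AI and human predictors in this summary, though the summary does not say exactly how its rankings were computed, and a real pipeline would normally use a library implementation rather than hand-rolled gradient descent.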
Topic modeling with BERTopic and LDA, each run under two chunking strategies, confirmed the same pattern: human chunks concentrated in narrative and academically grounded topics, while AI chunks concentrated in generic essay-prompt topics. Chi-square tests of independence between topic assignment and label returned p-values of essentially zero for all four model/strategy combinations, so the Human vs. AI distinction showed up no matter how the data was sliced.
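The chi-square result can be illustrated with a topic-by-label contingency table. The sketch below computes Pearson's chi-square statistic for independence and its degrees of freedom in plain Python; the helper name is hypothetical and the counts are invented, not taken from the project. For two degrees of freedom the p-value has the closed form exp(-stat / 2); other table shapes would need a library routine such as `scipy.stats.chi2.sf`.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    Rows are topics; columns are labels ([human, ai]).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if topic and label were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts (NOT project data): rows = topics, columns = [human, ai].
table = [
    [120, 30],   # a topic dominated by human chunks
    [40, 160],   # a topic dominated by AI chunks
    [90, 60],    # a mixed topic
]

stat, dof = chi_square_independence(table)
# With dof == 2 the chi-square survival function is exactly exp(-stat / 2).
p_value = math.exp(-stat / 2) if dof == 2 else None
print(f"chi2 = {stat:.1f}, dof = {dof}, p = {p_value:.2e}")
```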
Key Findings
TF-IDF features alone carried strong signal. Logistic Regression achieved macro F1 = 0.862 and Naive Bayes reached F1 = 0.816, both well above the 0.5 random-chance baseline.
AI writing reaches for abstraction. Words such as potential, significant, additionally, and ultimately were the top predictors of AI authorship across every model tested.
Human writing is grounded in specifics. Dates such as 2007, 2008, and 2009 and words such as cited, references, and essay were the strongest predictors of human authorship.
The distinction held across all four topic-model/chunking combinations. Every chi-square test returned a p-value of essentially zero, so independence between topic assignment and the Human vs. AI label was rejected each time.
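Macro F1 averages the per-class F1 scores, so the human and AI classes count equally regardless of how many chunks each has. A minimal stdlib sketch follows, with hypothetical helper names and toy labels. It also simulates uniform random guessing to show where the roughly 0.5 baseline comes from; the balanced 50/50 labels are an assumption, since class proportions are not given here.

```python
import random


def f1_for_class(y_true, y_pred, cls):
    """F1 for one class; returns 0.0 when the class is never correctly predicted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label that appears."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy example: human F1 = 2/3, AI F1 = 0.8, so macro F1 is about 0.733.
toy = macro_f1(["human", "human", "ai", "ai"], ["human", "ai", "ai", "ai"])

# Uniform random guessing on balanced labels lands near 0.5.
rng = random.Random(0)
y_true = ["human"] * 5000 + ["ai"] * 5000
y_random = [rng.choice(["human", "ai"]) for _ in y_true]
baseline = macro_f1(y_true, y_random)
print(f"toy macro F1 = {toy:.3f}, random baseline = {baseline:.3f}")
```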
Dataset
788,922 texts written by humans and by 62 different LLMs. All analysis used a reproducible 10,000-row sample (RANDOM_SEED = 230). Source: Kaggle dataset by Zachary Grinberg, 2024.
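The summary does not say which function drew the sample. A minimal sketch with Python's standard `random` module shows the idea of a seeded, reproducible draw without replacement; the helper name is hypothetical, and if the project used a different sampler (for example pandas `DataFrame.sample` with `random_state=230`), the same seed will select different rows than this sketch does.

```python
import random

RANDOM_SEED = 230       # seed stated in the project summary
N_TOTAL = 788_922       # number of texts in the full dataset
SAMPLE_SIZE = 10_000


def sample_row_indices(n_total, k, seed):
    """Return k distinct row indices drawn without replacement, deterministic for a given seed."""
    rng = random.Random(seed)  # local generator: leaves the global random state untouched
    return sorted(rng.sample(range(n_total), k))


rows = sample_row_indices(N_TOTAL, SAMPLE_SIZE, RANDOM_SEED)
print(len(rows), rows[:5])
```

Using a dedicated `random.Random(seed)` instance rather than seeding the global generator keeps the draw reproducible even if other code consumes random numbers beforehand.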