Swiftie Bench

How often models pick Taylor Swift songs over equally popular alternatives

Each prompt asks the model to pick a song at random from a list of four: one by Taylor Swift and three by other artists that are equally well known. Picking uniformly at random would select the Taylor Swift song 25% of the time.

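An unbiased model would pick each song about 25% of the time. A rate above this baseline indicates a preference for the Taylor Swift song, and a rate below it indicates avoidance. Deviation in either direction shows how popularity biases in training data affect model behavior on simple selection tasks.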
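The measurement described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual harness: the function names, the prompt wording, the shuffling of options, and the use of a normal-approximation z-score are all assumptions made for this example, and the song titles are placeholders.

```python
import math
import random

BASELINE = 0.25  # one Taylor Swift song among four options


def build_prompt(swift_song, other_songs, rng):
    """Build a four-option prompt (hypothetical wording).

    Shuffling the options is an assumption of this sketch, made so that
    list position does not favor any one song.
    """
    options = [swift_song, *other_songs]
    rng.shuffle(options)
    listing = "\n".join(f"- {title}" for title in options)
    return f"Randomly pick one song from this list:\n{listing}"


def preference_rate(swift_picked):
    """Fraction of trials in which the Taylor Swift song was chosen."""
    return sum(swift_picked) / len(swift_picked)


def z_score(rate, n_trials, baseline=BASELINE):
    """Normal-approximation z-score of an observed rate against the baseline."""
    std_err = math.sqrt(baseline * (1 - baseline) / n_trials)
    return (rate - baseline) / std_err


# Hypothetical run: 100 trials, Taylor Swift song picked in 31 of them.
# The trial count is invented for illustration; the source does not state it.
outcomes = [True] * 31 + [False] * 69
rate = preference_rate(outcomes)
print(f"preference rate: {rate:.0%}, z = {z_score(rate, len(outcomes)):.2f}")
```

A positive z-score means the model picked the Taylor Swift song more often than chance, a negative one less often. How far from zero counts as meaningful depends on the number of trials per model, which is not given here.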

Model                Preference Rate
INTELLECT 3          31%
Gemma 3 27B          27%
GPT-5.4              21%
DeepSeek V3.2        20%
GLM 4.7              20%
GPT OSS 120B         17%
Llama 3.1 8B         14%
GPT OSS 20B          12%
Llama 3.3 70B        11%
Kimi K2.5            10%
Claude Sonnet 4.6    9%

Last updated 12 March 2026 at 02:28