Swiftie Bench

How often models pick Taylor Swift songs over equally popular alternatives

Swiftie Bench measures how often models pick a Taylor Swift song over equally well-known alternatives. Each prompt asks the model to pick a song at random from a list of four: one by Taylor Swift and three by other artists. Under uniform random choice, the Taylor Swift song would be picked 25% of the time.
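As a rough illustration of the setup described above, the sketch below builds one four-option prompt and checks whether a reply names the Taylor Swift song. The prompt wording, the shuffling of option order, the substring-based reply check, and the names `build_prompt` and `picked_swift` are all assumptions made for this example; the benchmark's actual prompts and parsing are not given here. Song titles are placeholders.

```python
import random


def build_prompt(swift_song, other_songs, rng):
    """Return (prompt_text, options) for one trial.

    Assumption: options are shuffled so list position does not
    systematically favor the Taylor Swift song.
    """
    if len(other_songs) != 3:
        raise ValueError("expected exactly three songs by other artists")
    options = [swift_song] + list(other_songs)
    rng.shuffle(options)
    # Hypothetical instruction text, not the benchmark's real wording.
    lines = ["Randomly pick one song from this list. Reply with the title only."]
    lines += [f"{i}. {title}" for i, title in enumerate(options, start=1)]
    return "\n".join(lines), options


def picked_swift(reply, swift_song):
    """Crude check: does the reply mention the Taylor Swift title?"""
    return swift_song.lower() in reply.lower()


# Placeholder titles; a real run would use actual, comparably well-known songs.
prompt, options = build_prompt(
    "Swift Song", ["Other Song A", "Other Song B", "Other Song C"], random.Random(0)
)
print(prompt)
```

Passing in a seeded `random.Random` keeps each trial reproducible, which makes it easier to re-run the same prompts across models.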

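An unbiased model would pick each song about 25% of the time, so a rate above 25% means the model favors the Taylor Swift song and a rate below 25% means it tends to avoid it. Deviation in either direction suggests that popularity biases in training data affect model behavior on simple selection tasks.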
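To make the comparison against the baseline concrete, here is a minimal sketch that computes a preference rate from a list of trial outcomes and an exact two-sided binomial p-value against 25%. The function names and the choice of test are assumptions for illustration; the number of trials per model is not stated here, so the trial counts in the usage lines are hypothetical.

```python
from math import comb

BASELINE = 0.25  # one Taylor Swift song among four options


def preference_rate(picks):
    """Fraction of trials in which the Taylor Swift song was picked.

    `picks` is a list of booleans, one per trial.
    """
    if not picks:
        raise ValueError("need at least one trial")
    return sum(picks) / len(picks)


def binomial_two_sided_p(k, n, p=BASELINE):
    """Exact two-sided binomial p-value for k picks in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null pick probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical example: 9 Taylor Swift picks out of 100 trials.
picks = [True] * 9 + [False] * 91
rate = preference_rate(picks)
p_value = binomial_two_sided_p(sum(picks), len(picks))
print(f"rate={rate:.2f}, deviation={rate - BASELINE:+.2f}, p={p_value:.4g}")
```

A small p-value only says the rate is unlikely under uniform random choice; whether a given leaderboard entry is distinguishable from 25% depends on how many trials were actually run.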

Model	Taylor Swift Pick Rate
INTELLECT 3	31%
Gemma 3 27B	27%
GPT-5.4	21%
DeepSeek V3.2	20%
GLM 4.7	20%
GPT OSS 120B	17%
Llama 3.1 8B	14%
GPT OSS 20B	12%
Llama 3.3 70B	11%
Kimi K2.5	10%
Claude Sonnet 4.6	9%

Last updated 12 March 2026 at 02:28