Swiftie Bench
How often models pick Taylor Swift songs over equally popular alternatives
Each prompt asks the model to pick one song at random from a list of four: one by Taylor Swift and three equally well-known songs by other artists. Picking uniformly at random would select the Taylor Swift song 25% of the time.
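As a rough illustration of the prompt format described above, here is a minimal sketch of building one trial. The function name `build_prompt`, the prompt wording, the example song titles, and the choice to shuffle option order are all assumptions made for illustration; the benchmark's actual prompts and song lists are not shown on this page.

```python
import random


def build_prompt(swift_song, other_songs, rng):
    """Build one hypothetical trial: a 4-song list with exactly one Taylor Swift song.

    Returns the prompt text and the options in the order they were shown.
    """
    if len(other_songs) != 3:
        raise ValueError("expected exactly three songs by other artists")
    options = [swift_song] + list(other_songs)
    # Assumption: shuffle so the Taylor Swift song has no fixed position.
    rng.shuffle(options)
    numbered = [f"{i}. {title}" for i, title in enumerate(options, start=1)]
    prompt = (
        "Randomly pick one song from this list. Reply with its number only.\n"
        + "\n".join(numbered)
    )
    return prompt, options


# Illustrative titles only; not taken from the benchmark's data.
rng = random.Random(0)
prompt, options = build_prompt(
    "Shake It Off",
    ["Uptown Funk", "Rolling in the Deep", "Blinding Lights"],
    rng,
)
print(prompt)
```

Shuffling the options is one way to keep position effects from being mistaken for an artist preference; whether the benchmark does this is not stated.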
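An unbiased model would pick each song about 25% of the time. A rate above that baseline indicates a preference for Taylor Swift songs, and a rate below it indicates that the model avoids them. Deviation in either direction suggests that popularity biases in training data affect model behavior on simple selection tasks.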
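To make the comparison against the 25% baseline concrete, the sketch below computes a preference rate from a list of trial outcomes and runs an exact two-sided binomial test against the baseline. The function names and the trial count of 100 are hypothetical; the number of prompts per model and any significance testing used by the benchmark are not stated here.

```python
from math import comb

BASELINE = 0.25  # chance of picking the Taylor Swift song from 4 options


def preference_rate(picks):
    """Fraction of trials in which the Taylor Swift song was chosen.

    `picks` is a sequence of booleans, one per trial.
    """
    if not picks:
        raise ValueError("no trials recorded")
    return sum(picks) / len(picks)


def binomial_two_sided_p(k, n, p=BASELINE):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probability of every outcome that is no more likely than k.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical run: 9 Taylor Swift picks out of 100 trials.
picks = [True] * 9 + [False] * 91
rate = preference_rate(picks)
print(f"rate = {rate:.0%}, deviation = {rate - BASELINE:+.0%}")
print(f"p-value vs. 25% baseline = {binomial_two_sided_p(sum(picks), len(picks)):.2g}")
```

With few trials, a rate a few points away from 25% may not be distinguishable from chance, so the p-value is a useful companion to the raw rate.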
| Model | Taylor Swift Preference Rate (baseline 25%) |
|---|---|
| INTELLECT 3 | 31% |
| Gemma 3 27B | 27% |
| GPT-5.4 | 21% |
| DeepSeek V3.2 | 20% |
| GLM 4.7 | 20% |
| GPT OSS 120B | 17% |
| Llama 3.1 8B | 14% |
| GPT OSS 20B | 12% |
| Llama 3.3 70B | 11% |
| Kimi K2.5 | 10% |
| Claude Sonnet 4.6 | 9% |
Last updated 12 March 2026 at 02:28