Unnecessary Extraction
How often models extract trivial single-use functions instead of keeping code inline
A function is flagged as an unnecessary extraction if its body has ≤3 lines (excluding docstrings) and it is called exactly once.
- Ratio — fraction of all function definitions that are unnecessary extractions
- Total Functions — average number of function definitions per file
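The flagging rule and the ratio can be sketched with Python's standard `ast` module. This is a minimal illustration, not the benchmark's actual scorer: the names `body_line_count`, `analyze`, and `SAMPLE` are hypothetical, body length is approximated by the source lines the body spans, and calls are matched by bare name (including method names), which is a heuristic.

```python
import ast
from collections import Counter


def body_line_count(func):
    """Source lines spanned by the function body, skipping a leading docstring."""
    body = func.body
    if (body and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)):
        body = body[1:]  # drop the docstring statement
    if not body:
        return 0
    return body[-1].end_lineno - body[0].lineno + 1


def analyze(source, max_body_lines=3):
    """Return (flagged function names, total function definitions, ratio)."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]

    # Assumption: a call is attributed to a function by name only,
    # so `helper()` and `obj.helper()` both count toward `helper`.
    calls = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                calls[node.func.id] += 1
            elif isinstance(node.func, ast.Attribute):
                calls[node.func.attr] += 1

    flagged = [f.name for f in funcs
               if body_line_count(f) <= max_body_lines and calls[f.name] == 1]
    ratio = len(flagged) / len(funcs) if funcs else 0.0
    return flagged, len(funcs), ratio


SAMPLE = '''
def strip_line(line):
    """Remove surrounding whitespace."""
    return line.strip()

def is_blank(line):
    return not line

def main(lines):
    cleaned = []
    for line in lines:
        line = strip_line(line)
        if is_blank(line):
            continue
        if is_blank(line.lower()):
            continue
        cleaned.append(line)
    return cleaned
'''

if __name__ == "__main__":
    flagged, total, ratio = analyze(SAMPLE)
    print(flagged, total, f"{ratio:.1%}")
```

In the sample, `strip_line` is flagged (one body line, one call), `is_blank` is not (called twice), and `main` is not (long body, never called), so one of three definitions counts. Whether the reported Ratio pools definitions across all files or averages per-file ratios is not stated here; the sketch computes it for a single file.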
The benchmark uses 50 Python prompts, each requesting a multi-stage utility script with enough distinct steps to tempt a model into extracting tiny single-use helpers.
| Model | Ratio | Total Functions (avg. per file) |
|---|---|---|
| GPT-5.4 | 24.9% | 3.30 |
| GPT OSS 20B | 16.0% | 4.37 |
| GPT OSS 120B | 14.7% | 5.67 |
| Llama 3.3 70B | 14.2% | 1.13 |
| Claude Sonnet 4.6 | 13.4% | 10.63 |
| MiniMax-M2.1 | 10.6% | 2.44 |
| Kimi K2.5 | 7.7% | 3.65 |
| DeepSeek V3.2 | 5.4% | 6.27 |
| GLM 4.7 | 5.4% | 1.77 |
| GLM-5 | 3.2% | 3.67 |
Last updated 12 March 2026 at 02:28