Unnecessary Extraction

How often models extract trivial single-use functions instead of keeping code inline

A function is flagged as an unnecessary extraction if its body is at most 3 lines long (excluding docstrings) and it is called exactly once.
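The flagging rule can be sketched with Python's standard `ast` module. This is only an illustration, not the benchmark's actual checker: the function name `find_unnecessary_extractions`, the `SAMPLE` script, and the choice to count body length as source lines spanned by statements and to match calls by simple name are all assumptions.

```python
import ast


def find_unnecessary_extractions(source: str, max_body_lines: int = 3) -> list[str]:
    """Return names of functions with a short body that are called exactly once."""
    tree = ast.parse(source)

    # Count call sites by simple name: foo(...) and obj.foo(...) both count as "foo".
    # (Assumption: name-based matching is good enough for small scripts.)
    call_counts: dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name is not None:
                call_counts[name] = call_counts.get(name, 0) + 1

    flagged = []
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        # Exclude a leading docstring from the body.
        if (body and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)):
            body = body[1:]
        # Assumption: body length = source lines spanned by the remaining statements.
        n_lines = sum(stmt.end_lineno - stmt.lineno + 1 for stmt in body)
        if n_lines <= max_body_lines and call_counts.get(node.name, 0) == 1:
            flagged.append(node.name)
    return flagged


# Hypothetical script: strip_line is tiny and used once; parse is used once but long.
SAMPLE = '''
def strip_line(line):
    """Remove surrounding whitespace."""
    return line.strip()

def parse(lines):
    rows = []
    for line in lines:
        cleaned = strip_line(line)
        if cleaned:
            rows.append(cleaned.split(","))
    return rows

print(parse(["a,b", " "]))
'''

print(find_unnecessary_extractions(SAMPLE))  # -> ['strip_line']
```

Here `strip_line` is flagged because its one-line body could have stayed inline at its single call site, while `parse` is called once but is too long to count as trivial.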

  • Ratio — fraction of all function definitions that are unnecessary extractions
  • Total Functions — average number of function definitions per file
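As a rough illustration of how the two columns could be derived from per-file counts, the sketch below pools all function definitions before dividing. Whether the benchmark pools counts or averages per-file ratios is not stated here, so the pooling, the `summarize` helper, and the input shape are assumptions.

```python
def summarize(per_file: list[tuple[int, int]]) -> tuple[float, float]:
    """Compute (ratio, average functions per file).

    per_file holds one (flagged, total) pair of function counts per generated file.
    """
    flagged = sum(f for f, _ in per_file)
    total = sum(t for _, t in per_file)
    # Ratio: flagged definitions over all definitions (pooled across files; an assumption).
    ratio = flagged / total if total else 0.0
    # Total Functions: mean number of function definitions per file.
    avg_functions = total / len(per_file) if per_file else 0.0
    return ratio, avg_functions


# Hypothetical counts for three generated files.
ratio, avg_functions = summarize([(1, 4), (0, 2), (1, 4)])
print(f"{ratio:.1%} {avg_functions:.2f}")
```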

The benchmark uses 50 Python prompts, each requesting a multi-stage utility script with enough distinct steps to tempt extraction into tiny single-use helpers.

Model               Ratio    Total Functions
GPT-5.4             24.9%    3.30
GPT OSS 20B         16.0%    4.37
GPT OSS 120B        14.7%    5.67
Llama 3.3 70B       14.2%    1.13
Claude Sonnet 4.6   13.4%    10.63
Kimi K2.5           7.7%     3.65
DeepSeek V3.2       5.4%     6.27
GLM 4.7             5.4%     1.77
GLM-5               3.2%     3.67
MiniMax-M2.1        10.6%    2.44

Last updated 12 March 2026 at 02:28