LLVM APR Benchmark Leaderboard

Leaderboard for the LLVM APR (automated program repair) Benchmark.

Total issues: 262

| Method | Base Model | Score | Repaired | Repaired (Fast) | Hint | Number of Attempts | Repaired (Crash) | Repaired (Miscompilation) | Repaired (Hang) | Build Success Rate (%) | MTTR (min) | Average Sample Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | QwQ-Plus-2025-03-05 | 18.3 | 48 | 90 | w/ hint | 197 | 36 | 10 | 2 | 69.6 | 4.9 | 2.7 |
| Baseline | DeepSeek-R1 | 16.4 | 43 | 67 | w/ hint | 194 | 33 | 9 | 1 | 54.3 | 3.8 | 2.8 |
| Baseline | DeepSeek-V3 | 15.6 | 41 | 65 | w/ hint | 195 | 29 | 11 | 1 | 73.2 | 1.5 | 2.4 |
| Baseline | Qwen2.5-Max-2025-01-25 | 15.3 | 40 | 64 | w/ hint | 187 | 33 | 7 | 0 | 84.8 | 0.7 | 2.4 |

Links:
Baseline: https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py
QwQ-Plus-2025-03-05: https://qwenlm.github.io/blog/qwq-32b
DeepSeek-R1: https://api-docs.deepseek.com/news/news250120
DeepSeek-V3: https://api-docs.deepseek.com/news/news1226
Qwen2.5-Max-2025-01-25: https://qwenlm.github.io/blog/qwen2.5-max

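The leaderboard numbers are consistent with two relationships: Score equals the percentage of all 262 issues repaired (for example, 48 / 262 ≈ 18.3%), and Repaired equals the sum of the Crash, Miscompilation, and Hang columns. These relationships are inferred from the data, not taken from a documented definition. The minimal sketch below checks them against the four rows. The helper names `score` and `check_row` are hypothetical.

```python
# Leaderboard rows transcribed from the table (no new data).
# Assumption: Score = 100 * Repaired / total issues, rounded to one decimal.
TOTAL_ISSUES = 262

# (base model, score, repaired, crash, miscompilation, hang)
ROWS = [
    ("QwQ-Plus-2025-03-05", 18.3, 48, 36, 10, 2),
    ("DeepSeek-R1", 16.4, 43, 33, 9, 1),
    ("DeepSeek-V3", 15.6, 41, 29, 11, 1),
    ("Qwen2.5-Max-2025-01-25", 15.3, 40, 33, 7, 0),
]


def score(repaired: int, total: int = TOTAL_ISSUES) -> float:
    """Hypothetical helper: percentage of all issues repaired, one decimal."""
    return round(100 * repaired / total, 1)


def check_row(row) -> bool:
    """True if a row's Score and per-category breakdown agree with Repaired."""
    _, reported_score, repaired, crash, miscomp, hang = row
    return score(repaired) == reported_score and crash + miscomp + hang == repaired


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row[0]}: computed score {score(row[2])}, consistent={check_row(row)}")
```

All four rows pass both checks. This supports, but does not prove, the assumed definition of Score.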
| Category | Total | Repaired | Repair Rate (%) | Repaired (Fast) | Repair Rate (Fast) (%) |
|---|---|---|---|---|---|
| Miscompilation | 262 | 79 | 30.2 | 132 | 50.4 |

| Component | Total | Repaired | Repair Rate (%) | Repaired (Fast) | Repair Rate (Fast) (%) |
|---|---|---|---|---|---|
| InductiveRangeCheckElimination | 66 | 12 | 18.2 | 20 | 30.3 |
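The category and component breakdowns report repair rates next to raw counts. Assuming each rate is 100 × Repaired / Total for that group, rounded to one decimal, the minimal sketch below recomputes both the normal and "Fast" rates from the counts. The helper name `repair_rate` is hypothetical.

```python
# Breakdown rows transcribed from the category and component tables.
# (label, total, repaired, repaired_fast, reported_rate, reported_rate_fast)
BREAKDOWN = [
    ("Miscompilation", 262, 79, 132, 30.2, 50.4),
    ("InductiveRangeCheckElimination", 66, 12, 20, 18.2, 30.3),
]


def repair_rate(repaired: int, total: int) -> float:
    """Hypothetical helper: percentage of a group's issues repaired, one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * repaired / total, 1)


if __name__ == "__main__":
    for label, total, repaired, fast, rate, rate_fast in BREAKDOWN:
        print(
            f"{label}: {repair_rate(repaired, total)} (reported {rate}), "
            f"fast {repair_rate(fast, total)} (reported {rate_fast})"
        )
```

The recomputed values match the reported columns for both rows. Dividing by the group's own Total, rather than by all 262 issues, explains why the component rate (18.2%) can exceed its share of the overall issue count.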