LLVM APR Benchmark Leaderboard

Leaderboard for the LLVM APR Benchmark.

Total issues: 324

| Method | Base Model | Score | Repair Rate (Repaired/Attempts %) | Repaired | Repaired (Fast) | Hint | Number of Attempts | Repaired (Crash) | Repaired (Miscompilation) | Repaired (Hang) | Build Success Rate (%) | MTTR (min) | Average Sample Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [Baseline](https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py) | [Qwen3-235B-A22B](https://qwenlm.github.io/blog/qwen3/) | 25.9 | 34.1 | 84 | 153 | w/ hint | 246 | 60 | 22 | 2 | 64.9 | 3.9 | 2.9 |
| [Baseline](https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py) | [QwQ-Plus-2025-03-05](https://qwenlm.github.io/blog/qwq-32b) | 19.1 | 24.9 | 62 | 110 | w/ hint | 249 | 47 | 13 | 2 | 67.8 | 4.8 | 2.7 |
| [Baseline](https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py) | [DeepSeek-V3-0324](https://api-docs.deepseek.com/news/news250325) | 14.5 | 23.9 | 47 | 74 | w/ hint | 197 | 33 | 14 | 0 | 85.6 | 0.9 | 2.5 |
| [Baseline](https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py) | [DeepSeek-R1](https://api-docs.deepseek.com/news/news250120) | 13.3 | 22.2 | 43 | 67 | w/ hint | 194 | 33 | 9 | 1 | 54.3 | 3.8 | 2.8 |
| [Baseline](https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py) | [DeepSeek-V3](https://api-docs.deepseek.com/news/news1226) | 12.7 | 21.0 | 41 | 65 | w/ hint | 195 | 29 | 11 | 1 | 73.2 | 1.5 | 2.4 |
| [Baseline](https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py) | [Qwen2.5-Max-2025-01-25](https://qwenlm.github.io/blog/qwen2.5-max) | 12.3 | 21.4 | 40 | 64 | w/ hint | 187 | 33 | 7 | 0 | 84.8 | 0.7 | 2.4 |
| [Baseline](https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py) | [Kimi-K2-0711](https://moonshotai.github.io/Kimi-K2/) | 10.2 | 27.0 | 33 | 55 | w/ hint | 122 | 22 | 11 | 0 | 86.9 | 1.2 | 2.4 |
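The Score and Repair Rate columns appear to follow directly from the Repaired counts: Score is Repaired as a percentage of all 324 issues, while Repair Rate is Repaired as a percentage of the attempted issues. A minimal sketch of that inferred relationship (the function names are my own; the benchmark's own scripts are authoritative):

```python
# Metric formulas inferred from the leaderboard rows above; the benchmark's
# scoring scripts may compute them differently.
#   Score       = Repaired / total issues (324) * 100
#   Repair Rate = Repaired / Number of Attempts * 100

TOTAL_ISSUES = 324  # stated at the top of the leaderboard

def score(repaired: int, total: int = TOTAL_ISSUES) -> float:
    """Percentage of all benchmark issues that were repaired."""
    return round(repaired / total * 100, 1)

def repair_rate(repaired: int, attempts: int) -> float:
    """Percentage of attempted issues that were repaired."""
    return round(repaired / attempts * 100, 1)

# Check against the Qwen3-235B-A22B row: 84 repaired, 246 attempts.
print(score(84))             # 25.9
print(repair_rate(84, 246))  # 34.1
```

The same arithmetic reproduces every row, e.g. the QwQ-Plus row's 19.1 and 24.9 from 62 repaired over 249 attempts.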

With the provided evaluation environment, you can get a certificate by calling `env.dump()`.

Please submit your evaluation results (generated by `scripts/submit.py`) to dtcxzyw/llvm-apr-benchmark-submissions.