LLVM APR Benchmark Leaderboard

Total issues: 324

Method	Base Model	Score	Repair Rate (Repaired/Attempts %)	Repaired	Repaired (Fast)	Hint	Number of Attempts	Repaired (Crash)	Repaired (Miscompilation)	Repaired (Hang)	Build Success Rate (%)	MTTR (min)	Average Sample Count
Baseline	Qwen3-235B-A22B	25.9	34.1	84	153	w/ hint	246	60	22	2	64.9	3.9	2.9
Baseline	QwQ-Plus-2025-03-05	19.1	24.9	62	110	w/ hint	249	47	13	2	67.8	4.8	2.7
Baseline	DeepSeek-V3-0324	14.5	23.9	47	74	w/ hint	197	33	14	0	85.6	0.9	2.5
Baseline	DeepSeek-R1	13.3	22.2	43	67	w/ hint	194	33	9	1	54.3	3.8	2.8
Baseline	DeepSeek-V3	12.7	21	41	65	w/ hint	195	29	11	1	73.2	1.5	2.4
Baseline	Qwen2.5-Max-2025-01-25	12.3	21.4	40	64	w/ hint	187	33	7	0	84.8	0.7	2.4
Baseline	Kimi-K2-0711	10.2	27	33	55	w/ hint	122	22	11	0	86.9	1.2	2.4

Links:
Baseline	https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py
Qwen3-235B-A22B	https://qwenlm.github.io/blog/qwen3/
QwQ-Plus-2025-03-05	https://qwenlm.github.io/blog/qwq-32b
DeepSeek-V3-0324	https://api-docs.deepseek.com/news/news250325
DeepSeek-R1	https://api-docs.deepseek.com/news/news250120
DeepSeek-V3	https://api-docs.deepseek.com/news/news1226
Qwen2.5-Max-2025-01-25	https://qwenlm.github.io/blog/qwen2.5-max
Kimi-K2-0711	https://moonshotai.github.io/Kimi-K2/
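
The page does not define how "Score" is computed. The published numbers are, however, consistent with Score being the number of repaired issues divided by the total issue count (324), and with Repair Rate being repaired issues divided by the number of attempts, both as percentages rounded to one decimal. The sketch below reproduces the two columns under that assumption; the function names `score` and `repair_rate` are illustrative and are not part of the benchmark's code.

```python
TOTAL_ISSUES = 324  # "Total issues" as stated at the top of the page


def score(repaired: int, total_issues: int = TOTAL_ISSUES) -> float:
    """Assumed definition: repaired issues as a percentage of all issues."""
    return round(100 * repaired / total_issues, 1)


def repair_rate(repaired: int, attempts: int) -> float:
    """Repaired issues as a percentage of attempted issues."""
    return round(100 * repaired / attempts, 1)


# (Repaired, Number of Attempts) copied from the leaderboard rows.
LEADERBOARD = {
    "Qwen3-235B-A22B": (84, 246),
    "QwQ-Plus-2025-03-05": (62, 249),
    "DeepSeek-V3-0324": (47, 197),
    "DeepSeek-R1": (43, 194),
    "DeepSeek-V3": (41, 195),
    "Qwen2.5-Max-2025-01-25": (40, 187),
    "Kimi-K2-0711": (33, 122),
}

if __name__ == "__main__":
    for model, (repaired, attempts) in LEADERBOARD.items():
        print(f"{model}\t{score(repaired)}\t{repair_rate(repaired, attempts)}")
```

Under this reading, a model that attempts fewer issues can show a high Repair Rate with a low Score: Kimi-K2-0711 has the second-highest Repair Rate but the lowest Score because it attempted only 122 of the 324 issues.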



All	324	116	35.8	195	60.2
Crash	199	76	38.2	121	60.8
Miscompilation	116	36	31	69	59.5
Hang	9	4	44.4	5	55.6
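
The breakdown above carries no column headers. Reading each row as a total followed by two count/percentage pairs, the figures are internally consistent: each percentage equals its count divided by the row total, and the Crash, Miscompilation, and Hang rows sum to the All row. The check below is a minimal sketch of that reading; the field names `total`, `count_a`, `pct_a`, `count_b`, and `pct_b` are placeholders, since the page does not name the columns.

```python
from typing import NamedTuple


class Row(NamedTuple):
    total: int      # placeholder name: first numeric column
    count_a: int    # placeholder name: second numeric column
    pct_a: float    # placeholder name: third numeric column
    count_b: int    # placeholder name: fourth numeric column
    pct_b: float    # placeholder name: fifth numeric column


# Values copied from the bug-type breakdown.
BREAKDOWN = {
    "All": Row(324, 116, 35.8, 195, 60.2),
    "Crash": Row(199, 76, 38.2, 121, 60.8),
    "Miscompilation": Row(116, 36, 31.0, 69, 59.5),
    "Hang": Row(9, 4, 44.4, 5, 55.6),
}


def percentages_match(row: Row) -> bool:
    """True if both percentage columns equal count / total, to one decimal."""
    return (
        round(100 * row.count_a / row.total, 1) == row.pct_a
        and round(100 * row.count_b / row.total, 1) == row.pct_b
    )


def parts_sum_to_all(table: dict) -> bool:
    """True if the per-type counts add up to the 'All' row."""
    parts = [row for name, row in table.items() if name != "All"]
    return all(
        sum(getattr(p, field) for p in parts) == getattr(table["All"], field)
        for field in ("total", "count_a", "count_b")
    )


if __name__ == "__main__":
    print(all(percentages_match(r) for r in BREAKDOWN.values()))
    print(parts_sum_to_all(BREAKDOWN))
```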



SLPVectorizer	76	20	26.3	49	64.5
LoopVectorize	75	20	26.7	32	42.7
InstCombine	58	18	31	41	70.7
ScalarEvolution	15	6	40	9	60
VectorCombine	13	10	76.9	13	100
ValueTracking	10	1	10	3	30
IR	7	2	28.6	2	28.6
ConstraintElimination	6	2	33.3	3	50
InstructionSimplify	5	2	40	2	40
SimplifyIndVar	4	0	0	1	25
LoopPeel	4	3	75	3	75
Local	4	1	25	1	25
MemorySSAUpdater	3	0	0	0	0
LoopAccessAnalysis	3	2	66.7	2	66.7
MemCpyOptimizer	3	2	66.7	2	66.7
FunctionAttrs	3	3	100	3	100
ConstantFold	3	2	66.7	2	66.7
DeadStoreElimination	3	2	66.7	3	100
SimplifyCFG	2	1	50	1	50
LoopStrengthReduce	2	1	50	2	100
LazyValueInfo	2	1	50	1	50
EarlyCSE	2	1	50	1	50
LICM	2	0	0	1	50
Evaluator	1	0	0	0	0
Scalarizer	1	0	0	0	0
SimplifyLibCalls	1	1	100	1	100
DeadArgumentElimination	1	0	0	0	0
CorrelatedValuePropagation	1	0	0	1	100
Coroutines	1	0	0	0	0
GVNSink	1	1	100	1	100
DemoteRegToStack	1	1	100	1	100
LoopDeletion	1	1	100	1	100
IndVarSimplify	1	1	100	1	100
Reassociate	1	0	0	0	0
LoopUnrollAndJamPass	1	0	0	0	0
ValueMapper	1	0	0	0	0
LowerSwitch	1	1	100	1	100
LoopSimplifyCFG	1	1	100	1	100
VectorUtils	1	0	0	0	0
LoopUnrollRuntime	1	1	100	1	100
LoopCacheAnalysis	1	0	0	0	0
Attributor	1	1	100	1	100
GVN	1	0	0	0	0
NewGVN	1	0	0	0	0
JumpThreading	1	0	0	0	0
AliasAnalysis	1	0	0	0	0
AggressiveInstCombine	1	1	100	1	100
SCCPSolver	1	0	0	0	0
BDCE	1	0	0	1	100
GlobalOpt	1	1	100	1	100
DFAJumpThreading	1	0	0	0	0
MoveAutoInit	1	1	100	1	100
InlineCost	1	1	100	1	100
SimpleLoopUnswitch	1	1	100	1	100
InductiveRangeCheckElimination	1	1	100	1	100
Instrumentation	1	1	100	1	100



Baseline(Qwen3-235B-A22B)	14
Baseline(QwQ-Plus-2025-03-05)	7
Baseline(DeepSeek-R1)	3
Baseline(Kimi-K2-0711)	3
Baseline(DeepSeek-V3-0324)	2
Baseline(Qwen2.5-Max-2025-01-25)	2
Baseline(DeepSeek-V3)	0

With the provided evaluation environment, you can obtain a certificate for a run by calling env.dump().

Please submit your evaluation results (generated by scripts/submit.py) to dtcxzyw/llvm-apr-benchmark-submissions.