LLVM APR Benchmark Leaderboard

Total issues: 330

Method	Base Model	Score	Repair Rate (Repaired/Attempts %)	Repaired	Repaired (Fast)	Hint	Number of Attempts	Repaired (Crash)	Repaired (Miscompilation)	Repaired (Hang)	Build Success Rate (%)	MTTR (min)	Average Sample Count
Baseline	Qwen3-235B-A22B	25.5	34.1	84	153	w/ hint	246	60	22	2	64.9	3.9	2.9
Baseline	Kimi-K2-0711	21.8	28.9	72	111	w/ hint	249	51	20	1	87.8	1.4	2.5
Baseline	QwQ-Plus-2025-03-05	18.8	24.9	62	110	w/ hint	249	47	13	2	67.8	4.8	2.7
Baseline	DeepSeek-V3-0324	14.2	23.9	47	74	w/ hint	197	33	14	0	85.6	0.9	2.5
Baseline	DeepSeek-R1	13	22.2	43	67	w/ hint	194	33	9	1	54.3	3.8	2.8
Baseline	DeepSeek-V3	12.4	21	41	65	w/ hint	195	29	11	1	73.2	1.5	2.4
Baseline	Qwen2.5-Max-2025-01-25	12.1	21.4	40	64	w/ hint	187	33	7	0	84.8	0.7	2.4

Links:
Baseline	https://github.com/dtcxzyw/llvm-apr-benchmark/blob/main/examples/baseline.py
Qwen3-235B-A22B	https://qwenlm.github.io/blog/qwen3/
Kimi-K2-0711	https://moonshotai.github.io/Kimi-K2/
QwQ-Plus-2025-03-05	https://qwenlm.github.io/blog/qwq-32b
DeepSeek-V3-0324	https://api-docs.deepseek.com/news/news250325
DeepSeek-R1	https://api-docs.deepseek.com/news/news250120
DeepSeek-V3	https://api-docs.deepseek.com/news/news1226
Qwen2.5-Max-2025-01-25	https://qwenlm.github.io/blog/qwen2.5-max
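
The relationships between the leaderboard columns can be checked with a little arithmetic. The sketch below assumes three things that the page does not state outright but that hold for every row listed: "Score" is Repaired divided by the 330 total issues (as a percentage), "Repair Rate" is Repaired divided by Number of Attempts (as the header suggests), and Repaired is the sum of the Crash, Miscompilation, and Hang counts. The names `ROWS`, `pct`, and `check_row` are illustrative only.

```python
TOTAL_ISSUES = 330  # "Total issues" stated at the top of the page

# (model, score, repair_rate, repaired, attempts, crash, miscompilation, hang)
ROWS = [
    ("Qwen3-235B-A22B",        25.5, 34.1, 84, 246, 60, 22, 2),
    ("Kimi-K2-0711",           21.8, 28.9, 72, 249, 51, 20, 1),
    ("QwQ-Plus-2025-03-05",    18.8, 24.9, 62, 249, 47, 13, 2),
    ("DeepSeek-V3-0324",       14.2, 23.9, 47, 197, 33, 14, 0),
    ("DeepSeek-R1",            13,   22.2, 43, 194, 33, 9,  1),
    ("DeepSeek-V3",            12.4, 21,   41, 195, 29, 11, 1),
    ("Qwen2.5-Max-2025-01-25", 12.1, 21.4, 40, 187, 33, 7,  0),
]


def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as displayed in the table."""
    return round(100 * numerator / denominator, 1)


def check_row(row):
    """Return which of the three assumed relationships hold for one row."""
    _model, score, rate, repaired, attempts, crash, miscomp, hang = row
    return {
        "score_is_repaired_over_total": pct(repaired, TOTAL_ISSUES) == score,
        "rate_is_repaired_over_attempts": pct(repaired, attempts) == rate,
        "repaired_is_sum_of_bug_types": crash + miscomp + hang == repaired,
    }


for row in ROWS:
    print(row[0], check_row(row))
```

For example, Qwen3-235B-A22B repaired 84 issues in 246 attempts, so 84 / 246 gives 34.1% and 84 / 330 gives 25.5%. A high repair rate with few attempts can therefore still yield a lower score.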


All	330	123	37.3	200	60.6
Crash	205	81	39.5	125	61
Miscompilation	116	38	32.8	70	60.3
Hang	9	4	44.4	5	55.6
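
The per-bug-type rows can be cross-checked against the "All" row. The column names are not given on this page, so the sketch below uses neutral, hypothetical names: a total, a first count with its percentage, and a second count with its percentage. It assumes each percentage is the corresponding count divided by the row total, and that every issue belongs to exactly one bug type.

```python
# (bug type, total, count_a, pct_a, count_b, pct_b); column names are placeholders
ALL = ("All", 330, 123, 37.3, 200, 60.6)
BY_TYPE = [
    ("Crash",          205, 81, 39.5, 125, 61),
    ("Miscompilation", 116, 38, 32.8, 70,  60.3),
    ("Hang",           9,   4,  44.4, 5,   55.6),
]


def percentages_match(row):
    """Check that both percentage columns equal count / total, to one decimal."""
    _name, total, count_a, pct_a, count_b, pct_b = row
    return (round(100 * count_a / total, 1) == pct_a
            and round(100 * count_b / total, 1) == pct_b)


def column_sums(rows):
    """Sum the total and the two count columns across the per-type rows."""
    return tuple(sum(r[i] for r in rows) for i in (1, 2, 4))


print(column_sums(BY_TYPE) == (ALL[1], ALL[2], ALL[4]))
print(all(percentages_match(r) for r in BY_TYPE + [ALL]))
```

The same additive check does not carry over to the per-component rows, whose totals add up to more than 330. This suggests that a single issue can be attributed to several components.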


SLPVectorizer	77	22	28.6	50	64.9
LoopVectorize	76	23	30.3	34	44.7
InstCombine	59	20	33.9	42	71.2
ScalarEvolution	15	6	40	9	60
VectorCombine	13	10	76.9	13	100
ValueTracking	10	1	10	3	30
IR	7	2	28.6	2	28.6
ConstraintElimination	6	2	33.3	3	50
InstructionSimplify	5	2	40	2	40
LoopPeel	4	3	75	3	75
Local	4	1	25	1	25
SimplifyIndVar	4	0	0	1	25
LoopAccessAnalysis	3	2	66.7	2	66.7
MemorySSAUpdater	3	0	0	0	0
DeadStoreElimination	3	2	66.7	3	100
MemCpyOptimizer	3	2	66.7	2	66.7
FunctionAttrs	3	3	100	3	100
LoopSimplifyCFG	3	1	33.3	1	33.3
ConstantFold	3	2	66.7	2	66.7
LoopStrengthReduce	2	1	50	2	100
EarlyCSE	2	1	50	1	50
SimplifyCFG	2	1	50	2	100
LICM	2	0	0	1	50
LazyValueInfo	2	1	50	1	50
DemoteRegToStack	1	1	100	1	100
LoopUnrollRuntime	1	1	100	1	100
Scalarizer	1	0	0	0	0
DeadArgumentElimination	1	0	0	0	0
CorrelatedValuePropagation	1	0	0	1	100
GVNSink	1	1	100	1	100
Coroutines	1	0	0	0	0
SimplifyLibCalls	1	1	100	1	100
Evaluator	1	0	0	0	0
ValueMapper	1	0	0	0	0
IndVarSimplify	1	1	100	1	100
Reassociate	1	0	0	0	0
LoopUnrollAndJamPass	1	0	0	0	0
LoopDeletion	1	1	100	1	100
LowerSwitch	1	1	100	1	100
VectorUtils	1	0	0	0	0
LoopCacheAnalysis	1	0	0	0	0
Attributor	1	1	100	1	100
GVN	1	0	0	0	0
NewGVN	1	0	0	0	0
JumpThreading	1	0	0	0	0
AliasAnalysis	1	0	0	0	0
LoopFuse	1	0	0	0	0
AggressiveInstCombine	1	1	100	1	100
SCCPSolver	1	0	0	0	0
BDCE	1	0	0	1	100
GlobalOpt	1	1	100	1	100
DFAJumpThreading	1	0	0	0	0
MoveAutoInit	1	1	100	1	100
InlineCost	1	1	100	1	100
SimpleLoopUnswitch	1	1	100	1	100
InductiveRangeCheckElimination	1	1	100	1	100
Instrumentation	1	1	100	1	100


Baseline(Qwen3-235B-A22B)	10
Baseline(Kimi-K2-0711)	10
Baseline(QwQ-Plus-2025-03-05)	7
Baseline(DeepSeek-R1)	3
Baseline(DeepSeek-V3-0324)	2
Baseline(Qwen2.5-Max-2025-01-25)	2
Baseline(DeepSeek-V3)	0

With the provided evaluation environment, you can get a certificate by calling env.dump().

Please submit your evaluation results (generated by scripts/submit.py) to dtcxzyw/llvm-apr-benchmark-submissions.