Volcano Engine Developer Services
Dec 10, 2024 · Artificial Intelligence

Introducing FullStack Bench: Multi‑Language Code LLM Benchmark & SandboxFusion

The article presents FullStack Bench, a newly open‑sourced, multi‑language evaluation dataset for code LLMs that covers more than 11 real‑world programming scenarios and 16 programming languages, together with the SandboxFusion code execution environment. It also reports benchmark results in which closed‑source models outperform most open‑source alternatives.

AI evaluation · FullStack Bench · SandboxFusion
11 min read