Volcano Engine Developer Services
Dec 10, 2024 · Artificial Intelligence
Introducing FullStack Bench: Multi‑Language Code LLM Benchmark & SandboxFusion
The article presents FullStack Bench, a newly open-sourced, multi-language evaluation dataset for code LLMs that covers more than 11 real-world programming scenarios and 16 languages, together with the SandboxFusion execution environment. It also reports benchmark results showing that closed-source models outperform most open-source alternatives.
AI evaluation · FullStack Bench · SandboxFusion
