ByteDance SE Lab
Apr 7, 2026 · Artificial Intelligence

How Scale‑SWE Enables 100k Real‑World Coding Tasks for AI Agents

The Scale‑SWE project combines a massive 100k‑sample software engineering dataset with a high‑concurrency sandbox infrastructure and a multi‑agent workflow to dramatically improve code‑agent training, evaluation, and real‑world performance, surpassing existing models on SWE‑bench benchmarks.

AI scaling, SWE dataset, code agents
11 min read
JavaEdge
Apr 2, 2026 · Artificial Intelligence

Unlocking Qwen3.6-Plus: Features, Multimodal Performance, and API Guide

This article provides an in‑depth overview of the Qwen3.6‑Plus model, detailing its million‑token context window, enhanced multimodal reasoning, benchmark results across language and vision tasks, and step‑by‑step instructions for using the official API and integrating the model with popular coding assistants.

API integration, Qwen3.6 Plus, Visual Reasoning
12 min read
PaperAgent
Mar 6, 2026 · Artificial Intelligence

BeyondSWE: Rethinking Code Agent Benchmarks with Real‑World Multi‑Repo Challenges

BeyondSWE expands code‑agent evaluation beyond single‑repo bug fixing by introducing four realistic scenarios, scaling to 246 repositories and 500 samples, revealing a sharp performance drop for top models and highlighting the nuanced impact of search‑augmented agents like SearchSWE.

AI evaluation, BeyondSWE, SearchSWE
6 min read
Machine Learning Algorithms & Natural Language Processing
Feb 13, 2026 · Artificial Intelligence

CVE-Factory: Scaling Expert‑Level Security Task Synthesis for Code Agents

The talk introduces CVE-Factory, a framework that automatically converts sparse CVE metadata into high‑quality, executable security tasks for code agents. It reports 95% solution correctness, 96% environment fidelity, and a 66.2% verification rate on real vulnerabilities. The work also releases the LiveCVEBench benchmark and over 1,000 training environments that dramatically boost LLM performance.

AI safety, CVE-Factory, LiveCVEBench
4 min read