Tagged articles

code agents

6 articles

Machine Learning Algorithms & Natural Language Processing

Jun 25, 2026 · Artificial Intelligence

Introducing DeNovoSWE: The First Long‑Horizon Doc2Repo Training Set for Code Agents

DeNovoSWE is a newly released large‑scale dataset of 4,818 high‑quality document‑to‑repository tasks. It uses a Divide‑and‑Conquer and Critic‑Repair pipeline to generate well‑organized, evaluation‑aligned specifications. Experiments show that it boosts LLM code agents’ repository‑level generation performance on benchmarks from single‑digit scores to over 40%.

Dataset · LLM · benchmark

10 min read

Fun with Large Models

Jun 11, 2026 · Artificial Intelligence

Master Claude Code with 6 GitHub Projects: From Multi‑Agent Collaboration to Source‑Code Deep Dive

This guide walks developers through six curated GitHub repositories. Together they enable advanced multi‑agent use of Claude Code, teach the fundamentals of building a custom code agent from scratch, and offer in‑depth source‑code analysis that explains how AI‑powered programming assistants work.

AI programming · Claude Code · DeepAgents

13 min read

ByteDance SE Lab

Apr 7, 2026 · Artificial Intelligence

How Scale‑SWE Enables 100k Real‑World Coding Tasks for AI Agents

The Scale‑SWE project combines a 100k‑sample software engineering dataset with a high‑concurrency sandbox infrastructure and a multi‑agent workflow. This combination substantially improves code‑agent training, evaluation, and real‑world performance, and the results surpass existing models on SWE‑bench benchmarks.

AI scaling · Multi-agent workflow · SWE dataset

11 min read

JavaEdge

Apr 2, 2026 · Artificial Intelligence

Unlocking Qwen3.6-Plus: Features, Multimodal Performance, and API Guide

This article provides an in‑depth overview of the Qwen3.6‑Plus model, detailing its million‑token context window, enhanced multimodal reasoning, benchmark results across language and vision tasks, and step‑by‑step instructions for using the official API and integrating the model with popular coding assistants.

API Integration · Qwen3.6-Plus · Visual Reasoning

12 min read

PaperAgent

Mar 6, 2026 · Artificial Intelligence

BeyondSWE: Rethinking Code Agent Benchmarks with Real‑World Multi‑Repo Challenges

BeyondSWE extends code‑agent evaluation beyond single‑repository bug fixing with four realistic scenarios spanning 246 repositories and 500 samples. Top models show a sharp performance drop on the benchmark, and search‑augmented agents such as SearchSWE show a nuanced, mixed impact.

AI evaluation · BeyondSWE · SearchSWE

6 min read

Machine Learning Algorithms & Natural Language Processing

Feb 13, 2026 · Artificial Intelligence

CVE-Factory: Scaling Expert‑Level Security Task Synthesis for Code Agents

This talk introduces CVE-Factory, a framework that automatically converts sparse CVE metadata into high‑quality, executable security tasks for code agents. It achieves 95% solution correctness, 96% environment fidelity, and a 66.2% verification rate on real vulnerabilities. The work also releases the LiveCVEBench benchmark and over 1,000 training environments, which substantially improve LLM performance.

AI safety · CVE-Factory · LiveCVEBench

4 min read