Tag: benchmark evaluation


DataFunTalk
Jun 17, 2025 · Artificial Intelligence

MiniMax M1: Open‑Source LLM That Rivals Gemini 2.5 Pro in Long‑Context Benchmarks

MiniMax’s newly released open‑source M1 model, built on the Lightning Attention‑enhanced MiniMax‑01 base, supports a context window of up to 1 million tokens, achieves near‑state‑of‑the‑art results on MRCR and other long‑context benchmarks, and demonstrates strong multilingual translation, code completion, and creative‑writing capabilities.

Lightning Attention · MiniMax · benchmark evaluation

Tencent Technical Engineering
Jun 5, 2025 · Artificial Intelligence

How AI Agents Turn 0‑Day Vulnerability Hunting into an Automated Production Line

This article explores how a multi‑agent AI system dramatically improves 0‑day vulnerability detection by automating code audits, reducing false positives, and outperforming traditional static‑analysis tools in large‑scale, real‑world benchmarks.

0day vulnerability · AI Agent · automated security testing

AntTech
Apr 17, 2024 · Artificial Intelligence

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

LLMRG introduces a novel framework that leverages large language models to construct personalized reasoning graphs, combining chained reasoning, self‑verification, divergent extension, and knowledge‑base self‑improvement. Without requiring any additional user or item information, the approach improves recommendation accuracy, interpretability, and overall performance across multiple benchmark datasets.

AI · Interpretability · Large Language Models