LLM Research

Alimama Tech
Dec 25, 2024 · Artificial Intelligence

WiS Platform: Evaluating LLM Multi-Agent Systems via Game-Based Analysis

The WiS Platform provides a game-based environment for benchmarking large language models in multi-agent settings. It measures reasoning, deception, and collaboration through dynamic scenarios, and offers fair experimental design, real-time competition, visualizations, detailed metrics, and open-source tools. In the reported results, GPT-4o outperforms other models such as Qwen2.5-72B-Instruct.

AI evaluation · Defense Strategies · Game-Based Testing
8 min read