Tagged articles

ToolCall

2 articles

May 16, 2026 · Artificial Intelligence

Can Your PC Run Large Language Models? Meet BenchLoop, the Local Benchmarking Tool

BenchLoop is a combined CLI and web application for reproducibly benchmarking locally run LLMs across seven suites, including speed, tool calling, coding, and agent tasks. It records hardware details, scores results with a weighted formula, and can optionally publish them to a public leaderboard.

AI evaluation · BenchLoop · LLM benchmarking

14 min read


Xiaohongshu Tech REDtech

May 12, 2026 · Artificial Intelligence

Treating Automated Testing as AI Coding: Xiaohongshu GUI Agent Real‑World Review

During the 2026 Spring Festival promotion, Xiaohongshu replaced manual UI testing with a three‑layer, AI‑driven GUI Agent that executed over 43,000 runs across 106 devices and 128 scenarios. The review reports 58% automation, 82% adoption of AI‑generated test cases, 68% bug recall, and 98% stability, at roughly $1 per test case and with drastically reduced token costs.

AI coding · Code-as-Action · GUI Agent

23 min read
