SuanNi
Feb 27, 2026 · Artificial Intelligence

Can Deep Thought Ratio Reveal the True Reasoning Power of LLMs?

This article introduces the Deep Thought Ratio (DTR) metric, explains how tracking token modifications across neural network layers quantifies genuine inference effort, and shows through extensive experiments that DTR predicts accuracy far better than token length while enabling a sampling strategy that halves computational cost.

AI metrics · Inference Efficiency · LLM evaluation
0 likes · 9 min read
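The teaser does not reproduce the article's exact DTR formula. As a rough illustration only, the sketch below assumes DTR is the fraction of layer transitions in which the final token's hidden state shifts by more than a cosine-distance threshold; the model name (gpt2) and the threshold are placeholder choices, not the article's method.

```python
# Hypothetical DTR-style measurement: fraction of layer transitions where the
# last token's hidden state changes substantially. Illustrative only; the
# article's actual definition may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def deep_thought_ratio(text: str, threshold: float = 0.05) -> float:
    """Fraction of layer transitions where the final token's hidden state
    moves by more than `threshold` in cosine distance (assumed definition)."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states: tuple of (num_layers + 1) tensors of shape [batch, seq, dim]
    states = [h[0, -1] for h in out.hidden_states]
    changes = [
        1.0 - torch.nn.functional.cosine_similarity(a, b, dim=0).item()
        for a, b in zip(states[:-1], states[1:])
    ]
    deep = sum(c > threshold for c in changes)
    return deep / len(changes)

print(deep_thought_ratio("If 3x + 5 = 20, then x equals"))
```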
Instant Consumer Technology Team
Sep 28, 2025 · Artificial Intelligence

Why Chinese AI Agents Lead at Home but Lag Abroad – Key Findings from the 2025 Enterprise AI Agent Report

The 2025 Enterprise AI Agent Research Report reveals that domestic Chinese agents excel in localized tasks and data precision, while international agents dominate in generalization, speed, and iterative efficiency, highlighting six critical adoption metrics and showcasing diverse industry case studies that illustrate the current AI Agent landscape and future opportunities.

AI adoption · AI agents · AI case studies
0 likes · 20 min read
360 Zhihui Cloud Developer
Jul 23, 2025 · Artificial Intelligence

How to Leverage TLM Platform for Comprehensive Large Model Evaluation

This guide explains how to use the TianJi Large Model (TLM) platform to create evaluation tasks, choose effectiveness or performance modes, work with built‑in datasets, interpret detailed reports, and understand the underlying metrics and judge‑model techniques for large‑model assessment.

AI metrics · Datasets · Performance Testing
0 likes · 9 min read
Baobao Algorithm Notes
Jun 27, 2024 · Industry Insights

How Open LLM Leaderboard v2 Redefines LLM Evaluation with New Benchmarks and Fair Scoring

Open LLM Leaderboard v2 introduces a revamped, reproducible evaluation framework for large language models, replacing saturated benchmarks with six carefully curated, unpolluted datasets, applying standardized scoring, updating the harness, adding voting and maintainer‑recommended models, and providing richer visualizations to guide the AI community.

AI metrics · LLM evaluation · Open LLM Leaderboard
0 likes · 19 min read
dbaplus Community
Jun 18, 2024 · Artificial Intelligence

How to Effectively Evaluate RAG Systems: Metrics, Tools, and Best Practices

Evaluating Retrieval‑Augmented Generation (RAG) systems requires both component‑level and end‑to‑end metrics—such as context relevance, recall, answer relevance, and groundedness—and can be automated with tools like TruLens, RAGAS, LangSmith, and Langfuse, enabling systematic selection and optimization of LLM applications.

AI metrics · LLM · LangSmith
0 likes · 8 min read
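None of the listed tools' APIs are shown here. As a rough, library-agnostic illustration of one component-level metric, the sketch below scores groundedness as the share of answer sentences supported by at least one retrieved chunk above an embedding-similarity threshold; the encoder name and threshold are illustrative assumptions, not TruLens or RAGAS defaults.

```python
# Hypothetical groundedness-style check: fraction of answer sentences whose
# best-matching retrieved chunk exceeds a similarity threshold. Illustrative
# only; not the implementation used by the tools mentioned above.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def groundedness(answer_sentences: list[str], contexts: list[str],
                 threshold: float = 0.6) -> float:
    """Fraction of answer sentences supported by at least one context chunk."""
    ans_emb = encoder.encode(answer_sentences, convert_to_tensor=True)
    ctx_emb = encoder.encode(contexts, convert_to_tensor=True)
    sims = util.cos_sim(ans_emb, ctx_emb)  # shape: [n_sentences, n_contexts]
    supported = (sims.max(dim=1).values >= threshold).sum().item()
    return supported / len(answer_sentences)

print(groundedness(
    ["Paris is the capital of France."],
    ["France's capital city is Paris.", "Berlin is in Germany."],
))
```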