Tagged articles

Evaluation

213 articles · Page 2 of 3
Open Source Tech Hub
Oct 23, 2025 · Backend Development

Boost PHP Performance with CEL-PHP: A Fast, Safe Expression Engine

This guide introduces CEL-PHP, a high‑performance, non‑Turing‑complete expression engine for PHP 8+, showing how to install it, evaluate simple and contextual expressions, handle parsing and optimization, integrate caching, register custom functions, and avoid common pitfalls for robust backend rule evaluation.

CEL · Caching · Evaluation
0 likes · 8 min read
AI2ML AI to Machine Learning
Oct 20, 2025 · Artificial Intelligence

nanochat Source Code Deep Dive: Data Prep, Model Design, Training & Evaluation

This article revisits nanochat's core components, detailing the preparation of diverse training datasets, the scaling calculations for tokens and parameters, the model's MQA and KV‑cache design, the full training pipeline with gradient accumulation and mixed‑precision, cost breakdown, inference optimizations, evaluation tasks, and identified limitations with suggested improvements.

Evaluation · KV cache · LLM
0 likes · 9 min read
Alibaba Cloud Developer
Oct 15, 2025 · Artificial Intelligence

Mastering Structured Output in Large Language Models: Techniques, Challenges, and Future Trends

Large language models are evolving from free‑form text generators to reliable data providers by mastering structured output through prompt engineering, validation frameworks, constrained decoding, supervised fine‑tuning, reinforcement learning, and API‑level capabilities, enabling seamless integration with software systems while addressing hallucinations and format reliability.

API · Constrained Decoding · Evaluation
0 likes · 28 min read
HyperAI Super Neural
Oct 14, 2025 · Artificial Intelligence

NeurIPS 2025: OCRBench v2 Shows Gemini Leads the Chinese OCR Ranking Yet Barely Passes

OCRBench v2, introduced at NeurIPS 2025, evaluates 58 multimodal models on 23 OCR‑related tasks in Chinese and English, revealing that even top models like Gemini‑2.5‑Pro barely exceed the passing threshold and that most models struggle with fine‑grained text localization and multilingual performance.

Evaluation · Gemini · NeurIPS 2025
0 likes · 8 min read
Old Zhao – Management Systems Only
Oct 13, 2025 · Operations

How to Build a Fail‑Proof Procurement Process with Data‑Driven SRM

This article explains why many procurement processes fail despite formal procedures and provides a step‑by‑step, data‑driven approach—clarifying requirements, using SRM templates, screening suppliers with performance data, scoring comprehensively, ensuring traceability, and conducting post‑award reviews—to select the right suppliers and turn procurement into a strategic advantage.

Data-Driven · Evaluation · SRM
0 likes · 8 min read
Data Thinking Notes
Sep 10, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Uncovering the Statistical Roots

OpenAI’s latest research reveals that language model hallucinations stem from training and evaluation incentives that favor confident guesses over acknowledging uncertainty, and proposes revised scoring methods that reward modesty, highlighting statistical mechanisms behind false answers and offering pathways to reduce hallucinations.

AI safety · Evaluation · Hallucination
0 likes · 10 min read
Architect
Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Insights from OpenAI’s New Study

This article explains why large language models often produce confident but incorrect answers, detailing statistical inevitability, data scarcity, and model capacity limits, and proposes concrete solutions such as confidence thresholds and allowing abstention to reduce hallucinations.

AI safety · Evaluation · Hallucination
0 likes · 8 min read
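A minimal sketch of threshold-based abstention, assuming a scoring rule modeled on the kind the study describes: a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs t/(1 − t) points for a chosen threshold t. The function names and the threshold value are illustrative assumptions, not code from the study.

```python
def expected_score(confidence: float, threshold: float) -> float:
    """Expected points for answering under the assumed rule:
    +1 if correct, -threshold/(1 - threshold) if wrong (abstaining scores 0)."""
    penalty = threshold / (1.0 - threshold)
    return confidence * 1.0 - (1.0 - confidence) * penalty


def should_answer(confidence: float, threshold: float) -> bool:
    """Answer only when the expected score beats the 0 points of abstaining."""
    return expected_score(confidence, threshold) > 0.0


if __name__ == "__main__":
    t = 0.75  # hypothetical threshold: each wrong answer costs 3 points
    for p in (0.5, 0.75, 0.9):
        print(p, round(expected_score(p, t), 3), should_answer(p, t))
```

Under this rule answering pays off exactly when confidence exceeds the threshold, because p − (1 − p)·t/(1 − t) > 0 simplifies to p > t; below it, abstaining is the better strategy rather than a penalized one.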
DataFunSummit
Aug 23, 2025 · Artificial Intelligence

Mastering Role‑Playing AI Agents: Challenges, Techniques, and Future Directions

This article surveys the latest research on role‑playing AI agents, covering their definition, core components, application scenarios, three main challenges—role fidelity, long‑term memory, and evaluation—and presents four technical approaches for each challenge along with future research directions and references.

AI Agents · Evaluation · Prompt Engineering
0 likes · 22 min read
Data Party THU
Aug 23, 2025 · Artificial Intelligence

How MiroMind‑M1 Sets New Benchmarks in Open‑Source Math Reasoning

The article presents MiroMind‑M1, an open‑source math‑reasoning language model that combines a 719K high‑quality SFT dataset, a novel CAMPO reinforcement‑learning algorithm, and extensive evaluations on AIME24, AIME25, and MATH‑500, demonstrating state‑of‑the‑art performance while reducing token usage.

CAMPO · Evaluation · math reasoning
0 likes · 11 min read
JD Tech Talk
Jul 27, 2025 · Artificial Intelligence

Evaluating JoyAgent‑JDGenie: A Lightweight Multi‑Agent AI Framework in Action

This article presents a thorough evaluation of the open‑source JoyAgent‑JDGenie multi‑agent AI framework, covering its background, test cases for restaurant recommendation and travel planning, deployment steps, performance metrics, and concluding recommendations, highlighting its efficiency, ease of deployment, and result quality.

AI · Agents · Evaluation
0 likes · 8 min read
Zhihu Tech Column
Jul 25, 2025 · Artificial Intelligence

Boost Creative Writing with Zhi-Create-Qwen3-32B: Training, Eval & Deployment

This article introduces the open‑source Zhi‑Create‑Qwen3‑32B model, detailing how it was fine‑tuned on creative‑writing data, its multi‑domain dataset strategy, curriculum‑learning‑based SFT, evaluation on WritingBench, and practical deployment options across various hardware and inference frameworks.

Evaluation · Large Language Model · creative writing
0 likes · 11 min read
ELab Team
Jul 9, 2025 · Artificial Intelligence

How Fast‑Apply AI Models Revolutionize Code Editing with Speculative Decoding

This article explains the design of the edit_file tool, the fast‑apply model that rewrites whole files instead of diffs, its training and evaluation methodology, speculative decoding speed gains, and future research directions for large‑scale code‑editing AI systems.

AI · Evaluation · Model Training
0 likes · 14 min read
DataFunTalk
Jul 3, 2025 · Artificial Intelligence

How Vivo’s Blue Heart XiaoV Leverages LLMs to Transform Conversational Recommendations

In an interview with Vivo AI engineer Liang Tianan, the article explores the challenges of post‑Q&A recommendation, the integration of large language models into recall, ranking and evaluation pipelines, and the engineering trade‑offs required to deliver high‑quality, diverse suggestions on mobile devices.

Evaluation · LLM · Multimodal
0 likes · 15 min read
DataFunSummit
Jun 19, 2025 · Artificial Intelligence

How Large Models Are Revolutionizing Douyin’s User Experience – Expert Insights

In a detailed interview, ByteDance AI specialist Cai Conghuai explains how large‑model techniques such as SFT, DPO and RAG address Douyin’s multimodal user‑experience challenges and improve signal detection and root‑cause analysis, and outlines future AI‑agent breakthroughs for content platforms.

AI Algorithms · Evaluation · Multimodal Learning
0 likes · 11 min read
Aikesheng Open Source Community
Jun 17, 2025 · Artificial Intelligence

Introducing SCALE: An Open‑Source Benchmark Redefining LLM SQL Capabilities

This article presents SCALE, a community‑driven, open‑source benchmark that expands beyond simple Text‑to‑SQL accuracy to evaluate large language models on performance, dialect conversion, and deep SQL understanding, offering developers, researchers, and CTOs a realistic measure of AI‑assisted database tasks.

AI · Evaluation · LLM
0 likes · 10 min read
Tencent Technical Engineering
Jun 16, 2025 · Artificial Intelligence

Mastering RAG and AI Agents: Practical Tips, Code Samples, and Evaluation Strategies

This comprehensive guide walks you through the fundamentals of Retrieval‑Augmented Generation (RAG) and AI agents, explains their inner workings, shares optimization tricks, provides ready‑to‑run code snippets, and demonstrates how to evaluate performance with metrics such as recall, faithfulness, and answer relevance.

AI Agents · Evaluation · LLM
0 likes · 36 min read
Model Perspective
May 25, 2025 · Fundamentals

Why We Pretend to Win: The Hidden Math Behind Evaluation Bias

The article explores how people manipulate evaluation systems by redefining variables, adjusting weights, and shifting perspectives, turning losses into perceived wins, and reveals the psychological and statistical biases that create this illusion, urging more honest, multi‑dimensional, transparent modeling for genuine assessment.

Bias · Evaluation · decision-making
0 likes · 9 min read
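A toy illustration of the weight-shifting the article describes, with invented scores and weights: the underlying data never changes, yet choosing which criterion to emphasize decides which option "wins". The names `SCORES`, `weighted_total`, and `winner` are hypothetical.

```python
# Invented criterion scores (0-10) for two options; the data never changes below.
SCORES = {
    "A": {"cost": 9, "quality": 5, "speed": 6},
    "B": {"cost": 5, "quality": 9, "speed": 7},
}


def weighted_total(option_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of criterion scores (weights are normalized to sum to 1)."""
    total_weight = sum(weights.values())
    return sum(option_scores[c] * w for c, w in weights.items()) / total_weight


def winner(weights: dict[str, float]) -> str:
    """Option with the highest weighted total under the given weights."""
    return max(SCORES, key=lambda name: weighted_total(SCORES[name], weights))


if __name__ == "__main__":
    cost_first = {"cost": 0.6, "quality": 0.2, "speed": 0.2}
    quality_first = {"cost": 0.2, "quality": 0.6, "speed": 0.2}
    for label, w in (("cost-first", cost_first), ("quality-first", quality_first)):
        totals = {n: round(weighted_total(s, w), 2) for n, s in SCORES.items()}
        print(label, winner(w), totals)
```

Reporting the result under several weightings, rather than only the favorable one, is one simple way to make such a model transparent.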
DataFunSummit
May 9, 2025 · Artificial Intelligence

Practical Experience Building Zhihu Direct Answer: An AI‑Powered Search Product

This article presents a comprehensive overview of Zhihu Direct Answer, describing its AI‑driven search architecture, RAG framework, query understanding, retrieval, chunking, reranking, generation, evaluation mechanisms, engineering optimizations, and the professional edition, while sharing concrete performance‑boosting practices and future development plans.

AI · Evaluation · Product Development
0 likes · 14 min read
Architect
Apr 17, 2025 · Artificial Intelligence

The Second Half of AI: From Model Innovation to Real‑World Utility

The article argues that artificial intelligence has entered a new phase where reinforcement learning finally generalizes, evaluation becomes more important than pure model performance, and researchers must redesign benchmarks and utility‑focused tasks to drive truly transformative progress.

Evaluation · research strategy
0 likes · 16 min read
Nightwalker Tech
Apr 1, 2025 · Artificial Intelligence

Evaluation of AutoGLM: Features, Architecture, and Practical Test Results

This article reviews AutoGLM, the first "think‑while‑doing" AI agent released by Zhipu AI, detailing its core capabilities, full‑stack architecture, user experience, identified limitations, and the outcomes of three hands‑on tests using both the client application and a Chrome extension.

AI Agent · AutoGLM · Evaluation
0 likes · 4 min read
Meituan Technology Team
Mar 27, 2025 · Artificial Intelligence

Q-Eval-100K Dataset and Q-Eval-Score Evaluation Framework for Text-to-Visual Generation

The Q‑Eval‑100K dataset, comprising 100K AIGC images and videos with separate visual‑quality and textual‑consistency annotations, powers the open‑source Q‑Eval‑Score framework, which fine‑tunes multimodal models to deliver state‑of‑the‑art, scalable, and objective evaluation (including a “vague‑to‑specific” strategy for long prompts) that surpasses existing benchmarks.

AIGC · Evaluation · Multimodal
0 likes · 9 min read
Alibaba Cloud Developer
Mar 24, 2025 · Artificial Intelligence

Why LLM Internet Search Fails and How to Fix It: A Deep Dive into Qwen, Doubao, and DeepSeek

This article analyses the shortcomings of large‑model internet search—such as unverifiable sources, fabricated content, and poor instruction compliance—by comparing Qwen‑max, Doubao‑1.5‑pro‑256k, and DeepSeek‑v3, and proposes prompt engineering, post‑processing, and custom tool improvements to boost reliability.

AI · Evaluation · LLM
0 likes · 22 min read
DaTaobao Tech
Mar 19, 2025 · Artificial Intelligence

Retrieval Augmented Generation (RAG): Principles, Challenges, and Implementation Techniques

Retrieval‑augmented generation (RAG) enhances large language models by integrating a preprocessing pipeline—cleaning, chunking, embedding, and vector storage—with a query‑driven retrieval and prompt‑injection workflow, leveraging vector databases, multi‑stage recall, advanced prompting, and comprehensive evaluation metrics to mitigate knowledge cut‑off, hallucinations, and security issues.

Evaluation · LLM · RAG
0 likes · 27 min read
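A deliberately tiny sketch of the two workflows the summary describes (offline chunking, embedding and storage; online retrieval and prompt injection), assuming a bag-of-words term-frequency vector as a stand-in for a real embedding model and a Python list as a stand-in for a vector database. Every function here is hypothetical, and cleaning, multi-stage recall, and evaluation are omitted.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    assert size > overlap >= 0
    words = text.split()
    starts = range(0, max(len(words) - overlap, 1), size - overlap)
    return [" ".join(words[i:i + size]) for i in starts]


def embed(text: str) -> Counter:
    """Toy 'embedding': term frequencies (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Inject the retrieved chunks into the prompt ahead of the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "Vector databases store embeddings and support nearest neighbour search.",
        "Chunking splits long documents into smaller passages before embedding.",
        "The cafeteria opens at nine and serves noodles on Fridays.",
    ]
    # Offline: chunk, embed, and "store" (here an in-memory list).
    index = [(c, embed(c)) for d in docs for c in chunk(d)]
    # Online: retrieve for the query and inject the hits into the prompt.
    question = "How do vector databases search embeddings?"
    print(build_prompt(question, retrieve(question, index, k=1)))
```

In a real system the overlap between chunks helps keep sentences that straddle a boundary retrievable, which is why the sketch keeps it as a parameter.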
Efficient Ops
Mar 12, 2025 · Operations

How BizDevOps Is Accelerating Digital Transformation in Finance

This article explains the government’s push for digital transformation in financial institutions, introduces the BizDevOps integration model and its domestic and international standards, outlines the evaluation framework and process, showcases case studies, and announces that registration is open for the 2025 BizDevOps assessment.

BizDevOps · Evaluation · Financial Industry
0 likes · 9 min read
AI Algorithm Path
Feb 20, 2025 · Artificial Intelligence

What Is Perplexity in Large Language Models?

The article explains perplexity as a metric for evaluating large language models, walks through a step‑by‑step probability calculation for a sample sentence, shows how to normalize by sentence length using the geometric mean, and demonstrates that lower perplexity indicates a more accurate and less uncertain model.

AI · Evaluation · Language Model
0 likes · 6 min read
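A minimal sketch of the calculation, assuming the probability the model assigned to each actual next token has already been obtained: perplexity is the inverse of the geometric mean of those probabilities, equivalently exp of the negative average log-probability, which is the length normalization the summary refers to. The probability values are invented.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Inverse geometric mean of per-token probabilities, computed in log space
    so that long sequences do not underflow to zero."""
    if not token_probs or any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("need one probability in (0, 1] per token")
    avg_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-avg_log_prob)


if __name__ == "__main__":
    # Invented per-token probabilities for the same 4-token sentence under two models.
    confident_model = [0.6, 0.5, 0.8, 0.7]
    uncertain_model = [0.2, 0.1, 0.3, 0.25]
    print(round(perplexity(confident_model), 3))
    print(round(perplexity(uncertain_model), 3))
```

A model that assigned probability 1/k to every token would have perplexity exactly k, which is why the metric is often read as an effective branching factor, and why lower is better.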
JD Retail Technology
Feb 10, 2025 · Artificial Intelligence

JD Merchant Intelligent Assistant: Multi‑Agent Architecture and Technical Exploration

The JD Merchant Intelligent Assistant employs a large‑language‑model‑driven multi‑agent architecture with dynamic ReAct planning, enabling merchants to query and execute store operations in under a second with over 90% decision accuracy, while reducing inference cost, hallucinations, and engineering effort across diverse e‑commerce tasks.

AI · Evaluation · LLM
0 likes · 25 min read
DataFunSummit
Jan 25, 2025 · Artificial Intelligence

AI-Driven Next-Generation Sales: Project Overview, Core Technologies, System Deployment, and Future Outlook

This article explores how AI transforms next‑generation sales, covering the project background and goals, core technologies such as efficient sample generation and model training and evaluation, the impact of system deployment, practical case studies, challenges and solutions, and future directions across multiple industries.

AI · Evaluation · Large Language Model
0 likes · 25 min read
Zhihu Tech Column
Jan 17, 2025 · Artificial Intelligence

Zhihu Direct Answer: Product Overview and Technical Practices

This article summarizes the key technical insights from Zhihu Direct Answer, an AI-powered search product, covering its product overview, RAG framework, query understanding, retrieval strategies, chunking, reranking, generation techniques, evaluation methods, and engineering optimizations for cost and performance.

AI Search · Chunking · Engineering Optimization
0 likes · 13 min read
NewBeeNLP
Jan 17, 2025 · Artificial Intelligence

Unlocking Multimodal Intelligence: A Deep Dive into Next Token Prediction

This comprehensive survey examines the foundations, tokenization techniques, model architectures, training paradigms, evaluation benchmarks, and open challenges of multimodal next‑token prediction (MMNTP), offering researchers a clear roadmap for future advances in multimodal AI.

Evaluation · Multimodal AI · Next Token Prediction
0 likes · 9 min read
Data Thinking Notes
Jan 7, 2025 · Databases

Unlocking LLM-Powered Text-to-SQL: From Basics to Cutting-Edge Techniques

This article provides a comprehensive overview of LLM-based Text-to-SQL technology, covering its background, evolution, challenges, various LLM-driven methods, benchmark datasets, evaluation metrics, and future research directions to guide researchers and practitioners in advancing natural language interfaces for databases.

Evaluation · LLM · Prompt Engineering
0 likes · 18 min read
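The summary does not spell out which metrics the article covers, so as an assumption, here is a sketch of execution accuracy, a metric widely used for Text‑to‑SQL: a prediction counts as correct when it returns the same rows as the reference query on the same database. The table, data, and query pairs are invented, and comparing rows as an unordered multiset is a simplification that ignores ORDER BY.

```python
import sqlite3
from collections import Counter

# Invented example pairs: (model-predicted SQL, reference SQL).
DEMO_PAIRS = [
    # Different SQL text, same result set: counted as correct.
    ("SELECT SUM(amount) FROM orders WHERE city = 'Beijing'",
     "SELECT SUM(amount) FROM orders WHERE city IN ('Beijing')"),
    # Wrong threshold in the filter: counted as incorrect.
    ("SELECT COUNT(*) FROM orders WHERE amount > 100",
     "SELECT COUNT(*) FROM orders WHERE amount > 50"),
]


def make_demo_db() -> sqlite3.Connection:
    """Tiny in-memory database shared by predicted and reference queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, city TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "Beijing", 120.0), (2, "Shanghai", 80.0), (3, "Beijing", 45.5)],
    )
    return conn


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same rows, compared as a multiset.
    A predicted query that fails to execute counts as a miss."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Fraction of (predicted, reference) pairs whose results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    print(execution_accuracy(make_demo_db(), DEMO_PAIRS))
```

Unlike exact string matching, this rewards semantically equivalent SQL (the first pair), at the cost of occasionally crediting a wrong query that happens to return the right rows on a small database.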
DataFunSummit
Jan 1, 2025 · Artificial Intelligence

Challenges and Evaluation Strategies for LLM Agents in 2024

The article outlines the rapid progress of LLM agents in 2024 while highlighting key difficulties in planning capabilities, evaluation methods, dataset generation, and metric design, and suggests practical combinations and product‑level enhancements to improve efficiency, accuracy, and usability.

AI · Agent · Evaluation
0 likes · 3 min read
Alibaba Cloud Big Data AI Platform
Dec 12, 2024 · Artificial Intelligence

How PertEval Reveals the Real Knowledge Limits of Large Language Models

At NeurIPS 2024, Alibaba Cloud's PAI team presented the Spotlight paper PertEval, which introduces knowledge‑invariant perturbations to expose the true knowledge capacity of LLMs, critiques over‑optimistic static benchmarks, and showcases responsible AI solutions and platform demos for enterprise use.

Alibaba Cloud · Evaluation · NeurIPS 2024
0 likes · 6 min read
DevOps
Nov 21, 2024 · Product Management

Comprehensive Product KPI Metrics and Evaluation Guidelines

This article presents a detailed collection of product key performance indicators (KPIs) covering user growth, retention, activity, satisfaction, market share, revenue, development cycles, resource utilization, team satisfaction, brand awareness, and strategic goal achievement, along with formulas, weighting, and scoring methods for systematic performance assessment.

Evaluation · KPIs · performance metrics
0 likes · 13 min read
Fighter's World
Nov 18, 2024 · Product Management

Uncovering AI Product Design Challenges: Insights from OpenAI and Anthropic CPOs

The article distills a fireside chat between OpenAI’s CPO Kevin Weil and Anthropic’s CPO Mike Krieger, highlighting how uncertainty, iterative co‑design, evolving product‑manager skills, human‑AI collaboration, non‑deterministic UI, and emerging trends like proactivity, asynchrony, multimodality and personalization shape modern AI product development.

AI product design · Evaluation · Human-AI Collaboration
0 likes · 13 min read
Baobao Algorithm Notes
Nov 4, 2024 · Artificial Intelligence

Uncovering 16 Limits of AI Search Engines and 16 Design Recommendations

A user study with 21 participants reveals sixteen critical limitations of generative AI search engines, maps them to eight quantitative metrics, proposes sixteen design recommendations, and evaluates You.com, Perplexity and BingChat against this framework to highlight current performance gaps.

AI Search · Evaluation · LLM
0 likes · 12 min read
Baobao Algorithm Notes
Oct 17, 2024 · Artificial Intelligence

How Meta’s Movie Gen Pushes Text‑to‑Video Generation to New Heights

Meta’s newly released 92‑page Movie Gen paper introduces a multimodal LLM that unifies text‑to‑image, text‑to‑video, personalized video, precise video editing, and audio generation, detailing its dual‑model architecture, training pipeline, temporal auto‑encoder design, scaling strategies, evaluation benchmark, and ablation studies.

Evaluation · Model Scaling · deep learning
0 likes · 34 min read
Bilibili Tech
Sep 18, 2024 · Artificial Intelligence

Index-1.9B-32K: A 2% GPT-Size Model with Powerful Long-Context Capabilities

Index-1.9B-32K is a 1.9B-parameter model with a 32K token context window, achieving strong long‑text performance comparable to larger models while using only about 2% of GPT‑4’s compute, trained via long pre‑training and supervised fine‑tuning, with a trade‑off of reduced short‑context ability.

AI · Evaluation · Long Context
0 likes · 12 min read
Architect
Jul 13, 2024 · Artificial Intelligence

Practical Guide to Building LLM Products: Prompt Engineering, RAG, Evaluation, and Operations

This article provides a comprehensive, step‑by‑step guide for developing large‑language‑model (LLM) applications, covering prompt design techniques, n‑shot and chain‑of‑thought strategies, retrieval‑augmented generation, structured I/O, workflow optimization, evaluation pipelines, operational best practices, and team organization to create reliable, scalable AI products.

AI Operations · Evaluation · LLM
0 likes · 54 min read
DataFunTalk
Jul 7, 2024 · Artificial Intelligence

Large Model Application Development: Architecture, Lifecycle, and Prompt Engineering

This article presents a comprehensive knowledge map for developing large‑model applications, covering a four‑layer technical architecture, the full development lifecycle, core elements such as prompt engineering and model fine‑tuning, evaluation methods, and practical case studies, offering guidance for both enterprises and startups.

AI application development · Evaluation · Prompt Engineering
0 likes · 15 min read
AI Large Model Application Practice
Jul 4, 2024 · Artificial Intelligence

Mastering Multimodal RAG: From PDF Parsing to Advanced Query Rewriting

This article explains how to handle complex multimodal PDFs in RAG systems, outlines extraction, indexing, and multimodal model integration, details four query‑rewriting strategies (HyDE, stepwise, sub‑question, backward), and presents key evaluation metrics and tools for assessing RAG performance.

Document Parsing · Evaluation · Multimodal
0 likes · 12 min read
Continuous Delivery 2.0
Jul 3, 2024 · Artificial Intelligence

Applying Large Language Models to Software Engineering: Challenges, Cross‑File Editing Issues, Bug‑Fixing Evaluation, and SWE‑Bench Results

This article examines the practical challenges of using large language models in software development, including long‑context handling, cross‑file editing, and bug‑fixing evaluation methods, and presents benchmark results from SWE‑Bench and its Lite subset to assess model capabilities.

Cross-File Editing · Evaluation · LLM
0 likes · 7 min read
DataFunSummit
Jun 16, 2024 · Artificial Intelligence

Reinforcement Learning in Recommendation Systems: Practice, Challenges, and Industry Advances

This article presents a comprehensive overview of applying reinforcement learning to recommendation systems, covering background challenges, practical exploration, frontier research directions, multi‑agent and inverse RL approaches, evaluation methods, and future outlooks, based on a KDD‑published study and industry experience.

Evaluation · Inverse RL · Offline RL
0 likes · 24 min read
Bilibili Tech
Jun 14, 2024 · Artificial Intelligence

Technical Report on the Index-1.9B Series: Model Variants, Pre‑training Optimizations, and Alignment Experiments

The report presents the open‑source Index‑1.9B family—base, pure, chat, and character variants—detailing benchmark results, pre‑training optimizations such as a normalized LM‑Head and deeper‑slim architectures, the importance of modest instruction data, alignment via SFT/DPO, role‑play enhancements with RAG, and acknowledges remaining safety and factual limitations.

Evaluation · Instruction Tuning · LLM
0 likes · 15 min read
DataFunSummit
Jun 10, 2024 · Artificial Intelligence

Xiaomi Agent Technology: Architecture, Prompt Management, and Evaluation

This article presents Xiaomi's work on LLM‑based Agent technology, covering its perception‑thinking‑action pipeline, technical framework, prompt management, executor and API platform, workflow, optimization strategies, evaluation metrics, and future directions for AI assistants.

AI assistant · Agent · Evaluation
0 likes · 17 min read
DevOps
May 23, 2024 · Information Security

Guidelines for Evaluating Large Language Models in Cybersecurity Tasks

The article examines the opportunities and risks of applying large language models (LLMs) to cybersecurity, outlines fourteen practical recommendations for assessing their real‑world capabilities, and concludes with an invitation to the upcoming R&D Efficiency Conference covering AI, product management, and related topics.

AI safety · Evaluation · LLM
0 likes · 11 min read
NewBeeNLP
May 18, 2024 · Artificial Intelligence

How to Detect Test Set Contamination in Black‑Box Language Models

Researchers propose a black‑box method to expose test‑set leakage in large language models by comparing log‑probability shifts when test items are shuffled, using Monte‑Carlo estimation and a sharded likelihood test, and demonstrate its effectiveness on several models including Mistral‑7B.

Evaluation · LLM · black-box detection
0 likes · 8 min read
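A simplified sketch of the core idea, assuming the reader supplies `log_prob`, a function returning a model's log-likelihood of the test examples concatenated in a given order. If the benchmark never appeared in training, its examples are exchangeable, so the published (canonical) order should score no better than random shuffles; a canonical order that beats almost every shuffle suggests memorization. The toy scorers are hypothetical stand-ins for models, and the sharded likelihood test mentioned in the summary is not implemented here.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    examples: Sequence[str],
    log_prob: Callable[[Sequence[str]], float],
    num_shuffles: int = 200,
    seed: int = 0,
) -> float:
    """Monte-Carlo permutation test: fraction of random shuffles that score at least
    as high as the canonical order (with +1 smoothing so p is never exactly 0)."""
    rng = random.Random(seed)
    canonical = log_prob(examples)
    at_least_as_high = 0
    for _ in range(num_shuffles):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_prob(shuffled) >= canonical:
            at_least_as_high += 1
    return (at_least_as_high + 1) / (num_shuffles + 1)


def make_memorizing_scorer(memorized: Sequence[str]) -> Callable[[Sequence[str]], float]:
    """Hypothetical 'contaminated model': rewards adjacent pairs seen in memorized order."""
    position = {ex: i for i, ex in enumerate(memorized)}

    def score(seq: Sequence[str]) -> float:
        return float(sum(position[b] == position[a] + 1 for a, b in zip(seq, seq[1:])))

    return score


def order_blind_scorer(seq: Sequence[str]) -> float:
    """Hypothetical 'clean model': its score ignores example order entirely."""
    return 0.0


if __name__ == "__main__":
    test_set = [f"question {i}" for i in range(20)]
    print(permutation_p_value(test_set, make_memorizing_scorer(test_set)))
    print(permutation_p_value(test_set, order_blind_scorer))
```

The seeded `random.Random` keeps the estimate reproducible; more shuffles tighten the Monte‑Carlo estimate but each one costs a full forward pass with a real model.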
AI Large Model Application Practice
May 3, 2024 · Artificial Intelligence

Can Giant Context LLMs Replace RAG? Exploring the Limits of Long‑Context Retrieval

This article examines whether the rapid growth of large‑language‑model context windows can eliminate the need for retrieval‑augmented generation, presenting experimental needle‑in‑a‑haystack tests, analysis of model performance across token lengths and needle positions, and practical guidance using an open‑source evaluation tool.

AI · Evaluation · LLM
0 likes · 13 min read
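A sketch of how a needle‑in‑a‑haystack test grid might be generated, assuming context length is approximated by a count of filler sentences rather than real tokens, and that the model call itself (not shown) is supplied by the reader. `build_haystack`, `make_grid`, and `score_answer` are hypothetical helpers, not the API of the open‑source tool the article uses.

```python
def build_haystack(filler: list[str], needle: str, total_sentences: int, depth: float) -> str:
    """Repeat filler sentences to the requested length and insert the needle at a
    fractional depth (0.0 = very start, 1.0 = very end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    sentences = [filler[i % len(filler)] for i in range(total_sentences)]
    sentences.insert(round(depth * total_sentences), needle)
    return " ".join(sentences)


def make_grid(filler: list[str], needle: str, lengths: list[int], depths: list[float]):
    """One test case per (context length, needle depth) combination."""
    return [(n, d, build_haystack(filler, needle, n, d)) for n in lengths for d in depths]


def score_answer(answer: str, expected: str) -> bool:
    """Simple pass/fail: does the model's answer contain the expected fact?"""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    filler = ["The grass is green.", "The sky is blue.", "Rivers flow downhill."]
    needle = "The secret code is 7421."
    for n, d, context in make_grid(filler, needle, lengths=[100, 1000], depths=[0.0, 0.5, 1.0]):
        # A real run would send `context` plus "What is the secret code?" to the
        # model and pass its reply to score_answer(reply, "7421").
        print(n, d, len(context.split()))
```

Plotting pass/fail over the (length, depth) grid gives the familiar heatmap used to show where retrieval inside a long context starts to fail.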
360 Tech Engineering
Apr 17, 2024 · Artificial Intelligence

HiCo: A Hierarchical Controllable Diffusion Model for Layout‑to‑Image Generation

The 360 AI Research Institute introduces HiCo, a hierarchical controllable diffusion model that enables fine‑grained layout control across up to eight image regions, integrates seamlessly with existing Stable Diffusion ecosystems, and demonstrates superior performance on the GRIT‑VAL benchmark for layout‑aware image synthesis.

AI drawing · Evaluation · HiCo
0 likes · 8 min read
Tech Architecture Stories
Jan 29, 2024 · R&D Management

Mastering Tech Promotion Reviews: Proven Strategies to Accelerate Your Career

This guide shares years of promotion‑review experience from major tech firms, outlining company‑specific promotion processes and five essential content elements—systematic design, detailed data, derivation reasoning, upstream/downstream context, and comparative analysis—plus practical presentation and logical techniques to help engineers secure promotions and salary raises.

Evaluation · R&D Management · career advancement
0 likes · 8 min read
DataFunSummit
Jan 14, 2024 · Artificial Intelligence

Large Language Model Innovations for the Financial Industry: From General to Finance‑Specific Models, Training Techniques, Evaluation Methods, and Real‑World Applications

This article details how the financial sector is adopting large language models, describing the shift from generic to finance‑specific models, the technical challenges and cost considerations, the XuanYuan model releases, novel training and evaluation approaches, and a range of practical applications such as marketing, service, operations, office assistance, and risk control.

AI · Applications · Evaluation
0 likes · 17 min read
DataFunTalk
Jan 2, 2024 · Artificial Intelligence

Mid‑Stage Reflections on Large‑Model Technology and Its Industry Impact

This article offers a comprehensive mid‑stage analysis of large‑model technology, discussing its rapid development, emerging challenges such as cost and hallucinations, positioning, scenario applications, cost‑value trade‑offs, and strategic pathways for future research and deployment.

AI · Applications · Evaluation
0 likes · 21 min read
Rare Earth Juejin Tech Community
Dec 29, 2023 · Artificial Intelligence

Overview of Major Benchmark Datasets for Evaluating Large Language Models

This article provides a comprehensive overview of major benchmark datasets—including CMMLU, MMLU, C‑Eval, GSM8K, Gaokao‑Bench, AGIEval, MATH, BBH, HumanEval, and MBPP—used to evaluate large language models' knowledge, reasoning, and coding abilities, and summarizes related leaderboards and evaluation tools.

Evaluation · LLM · artificial-intelligence
0 likes · 14 min read
Baidu Geek Talk
Dec 20, 2023 · Artificial Intelligence

A Unified Platform for Prompt Development, Evaluation, and Iteration in Large Language Model Applications

The proposed unified platform centralizes prompt creation, evaluation, and iteration for large‑model applications, offering one‑stop hosting, metric‑driven testing, seamless resource integration, model switching, fine‑grained traffic control, and an automated data‑flywheel with QEP scoring, cutting optimization cycles from weeks to days while paving the way for advanced fine‑tuning techniques.

AI platform · Automation · Data Flywheel
0 likes · 17 min read
AntTech
Dec 19, 2023 · Artificial Intelligence

RJUA‑QA: A Comprehensive Urology QA Dataset for Large Language Model Evaluation

RJUA‑QA is a newly released, large‑scale urology question‑answer dataset constructed from virtual patient records based on clinical experience, featuring 2,132 QA pairs with extensive context, designed to benchmark and improve large language models’ medical reasoning, diagnosis, and treatment recommendation capabilities.

Evaluation · QA dataset · Urology
0 likes · 12 min read
JD Cloud Developers
Nov 28, 2023 · Backend Development

Choosing the Right Java Expression Engine: Performance, Security, and Community Insights

This article provides a comprehensive overview and comparative analysis of popular Java expression engines—including AviatorScript, MVEL, OGNL, SpEL, QLExpress, JEXL, JUEL, and Janino—covering their features, community support, size, performance benchmarks, security settings, usage cases, and syntax differences to guide developers in selecting the most suitable engine for their projects.

Evaluation · Expression Engine · Java
0 likes · 23 min read
Ant R&D Efficiency
Nov 24, 2023 · Artificial Intelligence

CodeFuseEval: An Enterprise‑Level Multi‑Task Benchmark for Evaluating Code Large Models

CodeFuseEval is an enterprise‑grade, multi‑task benchmark that evaluates large code‑generation models across six languages and thousands of real‑world tasks using both objective metrics (pass@k, BLEU, CodeBLEU) and expert human review, with an open‑source framework, continuous dataset expansion, and a focus on correctness, efficiency, robustness, and service‑level quality.

AI · Evaluation · benchmark
0 likes · 12 min read
Baobao Algorithm Notes
Oct 23, 2023 · Artificial Intelligence

Why Multimodal AI Agents Could Be the Next Killer App for Large Models

The article recounts a personal test of a multimodal AI agent in Newport Beach and expands into a detailed analysis of current multimodal LLM architectures, memory mechanisms, task planning, tool usage, personality modeling, cost constraints, evaluation challenges, and the broader social and reliability implications of deploying such agents.

AI Agents · Evaluation · Multimodal
0 likes · 44 min read
Software Development Quality
Oct 19, 2023 · Artificial Intelligence

Beyond ROUGE: GLUE, SuperGLUE, MMLU, C‑Eval & HELM Transform NLP Evaluation

Because ROUGE and BLEU alone are insufficient for evaluating language models, comprehensive benchmarks such as GLUE, SuperGLUE, MMLU, C‑Eval, and HELM provide diverse tasks and metrics that more accurately assess linguistic understanding, knowledge acquisition, and robustness in English and Chinese NLP systems.

AI · Evaluation · Language Models
0 likes · 9 min read
Architecture and Beyond
Sep 3, 2023 · R&D Management

Effective Team Management: Definitions, Development Stages, and Best Practices

This article explains what a team is, describes its open‑system nature and three‑layer composition, outlines the Tuckman development model and leadership growth stages, and provides practical guidance on direction, leadership, roles, systems, communication, relationships, and evaluation for managing high‑performing technical teams.

Evaluation · Leadership · Team Development
0 likes · 45 min read
Baobao Algorithm Notes
Aug 22, 2023 · Artificial Intelligence

Why Do Large Language Models Hallucinate? Definitions, Causes, and Mitigation Strategies

This article defines hallucination in LLMs as a failure of faithfulness or factuality, explores data‑level and model‑level origins, reviews reference‑based and reference‑free evaluation metrics, and surveys current research on data‑centric and model‑centric mitigation techniques along with future directions.

Evaluation · Hallucination · factuality
0 likes · 16 min read
DataFunTalk
Aug 11, 2023 · Artificial Intelligence

Multimodal Dialogue Large Model mPLUG-Owl: Technology, Applications, and Evaluation

mPLUG-Owl is a modular large multimodal dialogue model from Alibaba DAMO Academy that builds on the mPLUG series, offering advanced image, video, OCR, and multilingual capabilities; extensive evaluations show it outperforming MiniGPT‑4, LLaVA, and other multimodal LLMs across various tasks.

Evaluation · Multimodal AI · mPLUG-Owl
0 likes · 17 min read
Rare Earth Juejin Tech Community
Jul 24, 2023 · Artificial Intelligence

Comprehensive Survey of Large Language Models: History, Key Technologies, Resources, and Future Directions

This article provides a detailed overview of large language models (LLMs), tracing their evolution from statistical and neural language models to modern pre‑trained transformers, discussing scaling, training, adaptation, utilization, evaluation methods, available resources, and outlining current challenges and future research directions.

Evaluation · Model Scaling · Pre‑training
0 likes · 26 min read
DevOps
May 19, 2023 · Cloud Computing

Comprehensive Guide to Cloud Migration: Evaluation, Planning, Execution, and Cost Optimization

This article provides a detailed guide to cloud migration, covering evaluation and analysis, pilot projects, assessment strategies, planning and design with cloud services, verification and implementation steps, continuous measurement, and cost optimization through FinOps to ensure successful and secure migration.

Cloud Computing · Cloud Migration · Evaluation
0 likes · 10 min read
DataFunSummit
May 4, 2023 · Artificial Intelligence

LLM Ranking Arena: Elo‑Based Competitive Evaluation of Open‑Source Chatbots

A recent study by the LMSYS organization introduces an Elo‑rated, 1v1 battle arena for large language models, ranking open‑source chatbots like Vicuna, Koala, and ChatGLM, while discussing the limitations of traditional benchmarks and the advantages of crowd‑sourced, scalable evaluation.

AI benchmarking · Chatbot Arena · Elo rating
0 likes · 7 min read
Architect
Apr 9, 2023 · Artificial Intelligence

Evaluating the Commonsense Knowledge and Reasoning Capabilities of ChatGPT and Other Large Language Models

This study systematically evaluates ChatGPT and other large language models on their ability to answer commonsense questions, their awareness of the knowledge those questions require, and their use of generated knowledge for reasoning, revealing strong QA performance but notable gaps in social and temporal commonsense and in leveraging contextual knowledge.

ChatGPT · Evaluation · NLP
0 likes · 20 min read
Programmer DD
Apr 9, 2023 · Artificial Intelligence

How Does Alibaba’s Tongyi Qianwen Compare to ChatGPT? A Hands‑On Evaluation

This article reviews Alibaba’s Tongyi Qianwen large‑language model by testing its self‑introduction, code generation, literary creation, mathematical reasoning, Chinese language understanding, and casual chatting abilities, summarizing strengths, weaknesses, and overall performance compared with other LLMs.

Chinese Language · Evaluation · artificial-intelligence
0 likes · 7 min read
DataFunSummit
Mar 19, 2023 · Artificial Intelligence

Complex Question Answering Evaluation of ChatGPT

This paper presents a large‑scale evaluation of ChatGPT on knowledge‑base complex question answering, introducing a feature‑driven multi‑label annotation framework and CheckList‑based functional, robustness, and controllability tests, and comparing its performance with other LLMs across multiple English and multilingual datasets.

Chain-of-Thought · ChatGPT · Complex QA
0 likes · 25 min read
Model Perspective
Nov 6, 2022 · Fundamentals

Unlock Objective Decision-Making with the Entropy Weight Method

The Entropy Weight Method (EWM) offers an objective, data‑driven way to calculate indicator weights by measuring information entropy, avoiding subjective bias and improving the reliability of multi‑criteria evaluations across fields such as water quality and resource management.

Evaluation · decision making · entropy weight method
0 likes · 4 min read
DataFunSummit
Sep 23, 2022 · Artificial Intelligence

A Comprehensive Overview of Automatic Text Summarization: Methods, Datasets, Evaluation, and Future Directions

This article surveys automatic text summarization, detailing system classifications, extractive, abstractive and hybrid techniques, notable recent research, multi‑document and cross‑lingual challenges, major datasets, evaluation metrics, and promising future research avenues in the field.

Evaluation · NLP · abstractive
0 likes · 21 min read
DataFunSummit
Sep 5, 2022 · Artificial Intelligence

Comprehensive Evaluation of Long‑Audio Speech‑to‑Text Services from Major Cloud Providers

This article presents a systematic, multi‑dimensional benchmark of six leading cloud speech‑recognition platforms—Alibaba Cloud, Tencent Cloud, iFlytek, Baidu Cloud, Huawei Cloud, and Microsoft Azure—using a 22.6‑hour, 81‑file Mandarin dataset, scoring with the CORR metric and SCTK tool, and discusses each provider's workflow, strengths, pitfalls, and cost.

AI · Cloud Services · Evaluation
0 likes · 15 min read
DataFunSummit
Aug 20, 2022 · Information Security

Content Risk Control Industry Overview and Evaluation System

The article reviews how the digital economy has driven the development of the content risk control industry, examines current content moderation technologies and challenges, describes the establishment of a content technology promotion alliance, outlines its research directions and evaluation standards, and includes a Q&A on regulatory collaboration.

Evaluation · Standards · artificial-intelligence
0 likes · 16 min read
Model Perspective
Jul 2, 2022 · Operations

Top Resources for Evaluation & Optimization Models – A Curated Guide

This article compiles and categorizes recent model‑related publications, offering a comprehensive list of evaluation‑model resources—including concepts, preprocessing techniques, weighting methods, and various algorithms—and optimization‑model references covering linear and integer programming, graph theory, network flows, and meta‑heuristics.

Evaluation · Linear Programming · Operations
0 likes · 4 min read
Architecture and Beyond
May 1, 2022 · R&D Management

Effective Questioning Techniques for Promotion Review Panels

The article outlines systematic questioning strategies for judges on corporate promotion review panels, detailing how to clarify definitions, probe processes, assess difficulty, evaluate big‑picture thinking, explore methodology, and link technical work to business value, thereby ensuring fair and insightful evaluations.

Evaluation · R&D Management · career development
0 likes · 13 min read
DataFunTalk
Oct 5, 2021 · Artificial Intelligence

From Technology to Experience: Vivo Machine Translation Deployment Practice

This article presents a comprehensive guide to deploying machine translation at Vivo, covering business analysis, algorithm choices beyond standard NMT, language detection challenges, data collection and cleaning, scientific evaluation methods, and engineering optimizations to deliver a seamless user experience.

AI · Evaluation · Machine Translation
0 likes · 20 min read
IT Architects Alliance
Jul 26, 2021 · R&D Management

How to Conduct a Comprehensive Architecture Evaluation: A Step-by-Step Guide

This article outlines a thorough methodology for evaluating software, hardware, and overall system architectures, detailing assessment criteria, a five‑stage evaluation process, quality‑assurance measures, and best‑practice checkpoints to ensure high availability, scalability, security, and cost‑effectiveness of complex engineering projects.

Evaluation · System Design · architecture
0 likes · 12 min read
Liulishuo Tech Team
Jul 7, 2021 · Frontend Development

Evaluation and Evolution of Mini‑Program Development Frameworks for Frontend Teams

This article reviews the background, key considerations, architectural principles, evolution, performance comparison, and a customized solution for building mini‑programs using frameworks such as WePY, Taro, and UniApp, highlighting cross‑platform support, TypeScript integration, and development experience improvements.

Evaluation · framework · performance
0 likes · 12 min read
Efficient Ops
Apr 16, 2021 · Operations

How Anxin Securities Achieved Top RPA Maturity: Insights from China’s First RPA Standard Evaluation

Anxin Securities’ RPA Unified Management Platform earned the highest 3+ maturity rating at China’s inaugural RPA standard assessment, showcasing extensive automation across finance, operations, and disaster recovery, while outlining future SmartRPA initiatives and AI‑driven enhancements for digital transformation.

AI integration · Evaluation · RPA
0 likes · 10 min read
21CTO
Feb 26, 2021 · Artificial Intelligence

Why One Metric Isn't Enough: Multi‑Dimensional Evaluation of Recommendation Systems

The article explains why relying on a single metric like click‑through rate is insufficient for recommendation systems, and outlines a comprehensive, multi‑dimensional evaluation framework that combines business indicators, user behavior metrics, and algorithmic performance measures such as recall, precision, and AUC.

AB testing · AI · AUC
0 likes · 10 min read
ITFLY8 Architecture Home
Feb 26, 2021 · Artificial Intelligence

Inside Toutiao's Transparent Real-Time Recommendation Engine

This article details how Toutiao's senior algorithm architect designs a transparent recommendation system, covering system overview, three-dimensional feature modeling, real-time training pipelines, recall strategies, content analysis, user tagging, evaluation methods, and content safety measures.

Content Safety · Evaluation · Real-time Training
0 likes · 17 min read
21CTO
Jan 11, 2021 · Artificial Intelligence

How to Build a Recommendation System from Scratch: Key Concepts and Strategies

This article explains the fundamentals of recommendation systems, covering data collection, user and content profiling, system architecture, algorithmic pipelines such as recall, filtering, ranking, and evaluation metrics, while also discussing practical challenges like echo chambers and long‑term user value.

Evaluation · Ranking · algorithm
0 likes · 16 min read
NetEase Yanxuan Technology Product Team
Nov 27, 2020 · Product Management

How to Build Effective Decision‑Making Products: A Practical Blueprint

This article outlines a comprehensive framework for designing decision‑type products, covering their evolution stages, the core elements of model, data, and strategy, domain modeling techniques, data‑to‑knowledge transformation, business and process value, and a feedback‑driven decision loop with evaluation and simulation.

Business Analytics · Decision Products · Evaluation
0 likes · 20 min read
Programmer DD
Oct 24, 2020 · Cloud Native

Should You Switch to Microservices? Evaluation Tips and Migration Steps

This article examines the fundamentals of monolithic and microservice architectures, outlines the advantages and drawbacks of each, provides criteria for deciding when to adopt microservices, and offers practical guidance on technical, talent, and organizational considerations for a successful migration.

Evaluation · architecture · cloud-native
0 likes · 16 min read
ITFLY8 Architecture Home
Oct 1, 2020 · Cloud Native

When Should You Adopt Microservices? A Practical Evaluation Guide

This article explores the fundamentals of monolithic and microservice architectures, assesses the benefits, costs, and risks of adopting microservices, and provides practical criteria—including business complexity, team size, and technical readiness—to help decide the optimal moment for migration.

Evaluation · Microservices · backend
0 likes · 16 min read
Top Architect
Sep 19, 2020 · Artificial Intelligence

Architecture and Evaluation of Toutiao's Large-Scale Recommendation System

The article details the end‑to‑end architecture of Toutiao's massive recommendation platform, covering system overview, content and user feature extraction, model training, recall strategies, evaluation methodology, and content safety mechanisms, while highlighting practical challenges and engineering solutions.

Content Safety · Evaluation · Model Training
0 likes · 18 min read
Sohu Tech Products
Sep 16, 2020 · Artificial Intelligence

Open-Domain Dialogue Systems: Current State, Challenges, and Future Directions

This article reviews the latest advances in open-domain dialogue systems, covering classification, end‑to‑end generation challenges, knowledge‑controlled generation, automated evaluation, large‑scale latent‑space models such as PLATO, and outlines future research directions for building more coherent and controllable conversational AI.

Dialogue Systems · Evaluation · knowledge grounding
0 likes · 14 min read
Efficient Ops
Aug 20, 2020 · Operations

Understanding China’s First DevOps Capability Maturity Model and Evaluation Process

This article introduces China’s inaugural DevOps Capability Maturity Model, outlines its eight-part structure—including system and tool requirements—describes the standardized evaluation methodology, registration details, and provides contact information for organizations seeking certification.

Capability Maturity Model · Evaluation · Standardization
0 likes · 6 min read
21CTO
Feb 18, 2020 · Artificial Intelligence

Inside Toutiao’s Real‑Time Recommendation Engine: Architecture, Features, and Evaluation

This article details Toutiao’s large‑scale recommendation system, explaining how it models content, user, and environment features, the variety of algorithms and real‑time training pipelines used, feature engineering categories, recall strategies, content analysis, user tagging, evaluation methods, and content‑safety mechanisms.

Content Safety · Evaluation · Real-time Training
0 likes · 18 min read