Tagged articles
145 articles

Old Zhang's AI Learning
May 20, 2026 · Artificial Intelligence

Qwen 3.7‑Max vs Claude 4.7: 7 In‑Depth Tests Reveal a Smooth, Powerful Model

The author evaluates Alibaba’s newly released Qwen 3.7‑Max across seven rigorous tasks—including reading comprehension, HTML fireworks generation, 3D particle visualizations, PDF‑to‑PPT conversion, Excel data analysis, GitHub trending scraping, and complex video generation—showing it often surpasses GPT‑5.5‑level models and rivals Claude 4.7, especially in long‑duration agent tasks.

AI Benchmark · Agent · Claude 4.7
9 min read

Lao Guo's Learning Space
May 10, 2026 · Industry Insights

Don't Rush to Buy GPUs: 5 Truths About Deploying Enterprise Large Models

The article reveals five hard‑won truths for enterprises adopting large AI models, showing why buying GPUs first often stalls projects and outlining how to define business goals, start with API‑based pilots, run small‑scale trials, invest in data pipelines, and build robust evaluation frameworks.

API pilot · Enterprise AI · GPU procurement
9 min read

Old Zhang's AI Learning
May 6, 2026 · Artificial Intelligence

GPT-5.5 Instant Arrives: Smarter, Clearer, More Personalized AI

OpenAI has quietly replaced the default ChatGPT model with GPT‑5.5 Instant, delivering a 52.5% drop in hallucinations, 30% shorter responses, deeper personalization via memory sources, and higher benchmark scores across a range of professional tasks, while rolling out new pricing and usage tiers.

AI benchmarks · ChatGPT · GPT-5.5
11 min read

Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

Why More GPUs and Data Aren’t Enough: Defining Scenarios and Data for Speech Model Training

The article argues that successful speech model training starts with understanding user scenarios, then selecting appropriate data, and only then choosing metrics. It details six key questions, data‑sourcing strategies, evaluation criteria, and compliance considerations, and rebuts the misconception that sheer data volume guarantees performance.

AI training · Model Evaluation · data collection
6 min read

Woodpecker Software Testing
Apr 24, 2026 · Artificial Intelligence

Transforming Testing Teams for Large Language Models: A Practical Guide

The article explains why traditional deterministic testing fails for LLMs, introduces the ‘trust triangle’ quality model, describes data‑centric and lifecycle‑shifted testing practices, and outlines organizational structures—embedded test scientists or central evaluation centers—that enable reliable, safe AI deployment.

AI trustworthiness · Adversarial Evaluation · LLM testing
7 min read

SuanNi
Apr 16, 2026 · Artificial Intelligence

Claude Opus 4.7 Unleashed: How Anthropic’s New Model Automates Complex Tasks

Anthropic’s latest Claude Opus 4.7 model introduces autonomous task execution via Routines, enhanced code review with /ultrareview, higher-resolution visual input, and significant performance gains across knowledge work, vision, and long‑context reasoning, while adding safety guardrails, a new xhigh compute tier, and unchanged pricing.

AI automation · Anthropic · Claude Opus
6 min read

Woodpecker Software Testing
Apr 10, 2026 · Artificial Intelligence

2026 Model Evaluation Reaches the Cost‑Benefit Threshold

In 2026, model evaluation has become the pivotal bottleneck in AI engineering, with exploding compute, data‑compliance, and tooling costs forcing a shift from labor‑intensive testing to quantifiable business value, and three levers—dynamic granularity, synthetic data loops, and evaluation‑as‑a‑service—offering a path to a cost‑benefit inflection point.

AI compliance · Dynamic Granularity · Evaluation as a Service
7 min read

Alibaba Cloud Big Data AI Platform
Apr 9, 2026 · Artificial Intelligence

How Data Flywheels Accelerate Small Agentic Model Training

This article details a data‑flywheel framework for training compact agentic language models, describing synthetic task generation, mock environment simulation, rubric‑based reward design, iterative hard‑sample augmentation, and experimental results that show consistent performance gains across benchmarks.

Model Evaluation · Synthetic Environments · agentic models
17 min read

SuanNi
Apr 8, 2026 · Industry Insights

How HappyHorse‑1.0 Surpassed Seedance 2.0 in AI Video Generation Rankings

An anonymous model, HappyHorse‑1.0, quickly topped the Artificial Analysis leaderboard for both text‑to‑video and image‑to‑video tracks, outscoring Seedance 2.0 by large margins and prompting intense community discussion about its origin, performance, and future stability.

AI · Competitive analysis · Model Evaluation
5 min read

Woodpecker Software Testing
Apr 3, 2026 · Artificial Intelligence

Why 80% of AI Projects Fail: Bridging Model Evaluation from Theory to Real‑World Impact

The article explains that most AI project failures stem from unrealistic evaluation rather than model intelligence, and outlines concrete practices—business‑aligned metrics, scenario sandboxes, human‑in‑the‑loop reviews, and auditable documentation—to make model evaluation truly actionable.

AI deployment · AI reliability · MLOps
7 min read

Su San Talks Tech
Apr 2, 2026 · Artificial Intelligence

How GLM-5.1 Beats Its Predecessor: A Hands‑On Test and Deep Dive

The article presents a detailed, hands‑on evaluation of the newly released GLM‑5.1 model, describing the rollout strategy, step‑by‑step testing on complex coding tasks, configuration details, observed performance improvements over previous versions, and practical guidance for developers seeking to leverage the model for real‑world projects.

AI coding assistant · GLM-5.1 · Model Evaluation
9 min read

PaperAgent
Apr 1, 2026 · Artificial Intelligence

How Meta‑Harness Revolutionizes LLM Harness Optimization with 10× Search Speed

Meta‑Harness introduces an external‑loop optimization framework that lets coding agents automatically search for and improve large‑language‑model harnesses, achieving up to 10× faster search and 10× better token efficiency, along with significant performance gains across text classification, math reasoning, and agentic coding tasks.

LLM · Meta-Harness · Model Evaluation
11 min read

Old Zhang's AI Learning
Mar 28, 2026 · Artificial Intelligence

Qwen3.5-27B Outperforms the 397B Model in Tool Calling – Q6 Quantization Is Optimal

Using the open‑source ToolCall‑15 benchmark, the author shows that the 27‑billion‑parameter Qwen3.5 model consistently scores full marks while the 397‑billion‑parameter version fails on several tasks, and that the Q6 quantized variant offers the best trade‑off between size and tool‑calling accuracy.
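
As a back-of-the-envelope illustration of the size side of that trade-off (not a calculation from the article), weight storage can be estimated as parameters × bits per weight ÷ 8. The sketch assumes an idealized 6 bits per weight and ignores the per-block scales that real Q6 formats store, the KV cache, and runtime overhead, so actual files and memory use run somewhat higher. The function name is illustrative.

```python
def weight_gigabytes(num_params: float, bits_per_weight: float) -> float:
    """Idealized weight storage in GB (10**9 bytes): params * bits / 8.

    Ignores quantization block scales, KV cache, and runtime overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Compare the two model sizes from the article at several bit widths.
for name, params in [("27B", 27e9), ("397B", 397e9)]:
    for bits in (16, 8, 6, 4):
        print(f"{name} at {bits:>2}-bit: {weight_gigabytes(params, bits):8.2f} GB")
```

Under those assumptions the 27B model needs about 20 GB for its weights at 6 bits, while the 397B model needs roughly 298 GB.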

AI · LLM Benchmark · Model Evaluation
7 min read

Machine Learning Algorithms & Natural Language Processing
Mar 28, 2026 · Artificial Intelligence

Junyang Lin’s 10k‑Word Review: From Reasoning to Agentic Thinking in Large Models

In a detailed post‑departure analysis, Junyang Lin reviews two years of large‑model evolution, explains how o1 and DeepSeek‑R1 highlighted the limits of pure reasoning, and argues that the next breakthrough lies in agentic thinking that integrates environment interaction, tool use, and robust reinforcement‑learning infrastructure.

AI Infrastructure · Model Evaluation · agentic thinking
18 min read

Baobao Algorithm Notes
Mar 20, 2026 · Artificial Intelligence

Can AI Self‑Iterate? Inside MiniMax M2.7’s Self‑Improving Magic

The article examines MiniMax M2.7’s claim of self‑iteration, its impressive Kaggle record, and a series of technical tests—including code refactoring, real‑time chart generation, futures backtesting, business analysis, PPT creation, and news tracking—to evaluate the model’s practical AI self‑evolution capabilities.

AI · AutoML · Kaggle
8 min read

PaperAgent
Mar 19, 2026 · Artificial Intelligence

How Scale‑SWE’s Real‑World Software Engineering Dataset Supercharges AI Models

The Scale‑SWE project releases a 100k‑task real software‑engineering dataset built with a sandboxed multi‑agent workflow, demonstrating that models fine‑tuned on this data achieve 64% on SWE‑bench‑Verified and surpass leading industrial baselines, highlighting the critical value of authentic SWE data.

AI Agents · Model Evaluation · Qwen3-30B-A3B-Instruct
7 min read

AI Engineering
Mar 16, 2026 · Artificial Intelligence

Does Synthetic Data Have a Future? Evidence‑Based Conclusions

A detailed investigation of two public programming‑training datasets shows that AI‑only synthetic data suffers from severe quality issues, and even AI‑plus‑expert review yields only about ten percent usable examples, proving that high‑quality training data still requires domain experts and rigorous quality‑control processes.

AI training · Model Evaluation · data labeling
16 min read

Woodpecker Software Testing
Mar 15, 2026 · Artificial Intelligence

Why 95% of AI Models Fail: A Deep Dive into Model Evaluation Techniques

The article explains that a high‑accuracy model alone does not guarantee a deployable AI system; it details how inadequate evaluation leads to most production failures and presents a comprehensive, multi‑dimensional evaluation framework—including distributional robustness, fairness, explainability, temporal stability, and efficiency trade‑offs—plus practical CI/CD pipelines and common pitfalls.

AI quality assurance · Fairness Audit · Model Evaluation
7 min read

Woodpecker Software Testing
Mar 1, 2026 · Artificial Intelligence

Four Hidden Model Evaluation Pitfalls That Undermine AI Deployments

The article examines four common yet hidden model evaluation mistakes—confusing attractive metrics with business impact, using static test sets, ignoring statistical significance, and lacking fine‑grained attribution—illustrating each with real‑world cases and offering concrete practices to build a more robust, business‑aligned evaluation pipeline.
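
Ignoring statistical significance is the easiest of the four pitfalls to illustrate. When two models are scored on the same test set, one common check (not necessarily the one the article uses) is McNemar's exact test, which looks only at the items where the two models disagree. The sketch below assumes per-item correctness flags for each model; the function name and the example data are hypothetical.

```python
from math import comb


def mcnemar_exact_p(correct_a: list[bool], correct_b: list[bool]) -> float:
    """Two-sided exact McNemar p-value for two models scored on the same items."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same items")
    # Only discordant items carry information about which model is better.
    only_a = sum(a and not b for a, b in zip(correct_a, correct_b))
    only_b = sum(b and not a for a, b in zip(correct_a, correct_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    # Under the null hypothesis each discordant item is a fair coin flip.
    tail = sum(comb(n, k) for k in range(min(only_a, only_b) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical 100-item test set: A alone is right on 10 items, B alone on 2,
# both are right on 80 and both are wrong on 8 (accuracy 90% vs 82%).
model_a = [True] * 10 + [False] * 2 + [True] * 80 + [False] * 8
model_b = [False] * 10 + [True] * 2 + [True] * 80 + [False] * 8
print(round(mcnemar_exact_p(model_a, model_b), 4))  # 0.0386
```

The p-value depends only on the 12 discordant items, not on the 88 items where the models agree, which is why a headline accuracy gap alone says little about significance.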

A/B testing · AI deployment · Model Evaluation
8 min read

Woodpecker Software Testing
Feb 27, 2026 · Artificial Intelligence

How Test Experts Can Accelerate Model Evaluation and Boost Performance

The article analyzes why over 73% of AI projects stall during model evaluation and presents three optimization paths (low‑latency pipelines, multidimensional bias diagnostics, and lightweight online probes) that together cut evaluation time by up to 13× and shorten fault detection from hours to seconds.

AI testing · Model Evaluation · Performance Optimization
6 min read

Data Party THU
Feb 15, 2026 · Artificial Intelligence

Why FireRed-Image-Edit Is the New Powerhouse in AI Image Editing

FireRed-Image-Edit, the latest open‑source image‑editing model from the Xiaohongshu Super Intelligence team, outperforms existing models on benchmarks, with stronger instruction understanding, better ID preservation, and an efficient architecture. The article attributes these results to a three‑stage training pipeline and a scalable data engine, and introduces the team's RedEdit Bench evaluation suite.

AI Image Editing · FireRed-Image-Edit · Model Evaluation
8 min read

AI Cyberspace
Jan 29, 2026 · Artificial Intelligence

Step‑by‑Step Guide to Efficient LLM Fine‑Tuning with LoRA, QLoRA, and Llama‑Factory

This tutorial explains the concepts, methods, and practical commands for fine‑tuning large language models using efficient techniques like LoRA and QLoRA, covering model selection, resource considerations, Docker deployment, dataset preparation, training configuration, evaluation metrics, model merging, and deployment with GGUF and Ollama.
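
The core idea behind LoRA fits in a few lines: the pretrained weight W stays frozen, and only a low-rank pair B (d_out × r) and A (r × d_in) is trained, with the update scaled by alpha / r as in the original LoRA paper. The plain-Python sketch below illustrates that idea under those assumptions; it is not Llama-Factory's implementation, it omits QLoRA's 4-bit storage of W, and the function names are hypothetical.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(
    W: list[list[float]],  # frozen pretrained weight, d_out x d_in
    A: list[list[float]],  # trainable down-projection, r x d_in
    B: list[list[float]],  # trainable up-projection, d_out x r (initialized to zeros)
    x: list[float],
    alpha: float,
    rank: int,
) -> list[float]:
    """y = W x + (alpha / rank) * B (A x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full fine-tuning count, LoRA count) for one d_out x d_in weight."""
    return d_out * d_in, rank * (d_in + d_out)


# With B at zero, the adapted layer starts out identical to the frozen layer.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]
B = [[0.0], [0.0]]
print(lora_forward(W, A, B, [1.0, 1.0], alpha=16.0, rank=1))  # [3.0, 7.0]

full, lora = trainable_params(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

For a 4096 × 4096 projection at rank 8, the adapter trains well under 1% of the parameters that full fine-tuning would update, which is where the GPU-memory savings come from.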

GGUF · GPU memory optimization · LLM fine-tuning
27 min read

PaperAgent
Jan 16, 2026 · Artificial Intelligence

Do Large Language Models Really Have Self‑Awareness? Inside Anthropic’s Introspective Experiments

This article reviews Anthropic’s recent paper on emergent introspective awareness in large language models, detailing a novel concept‑injection method, four key findings about AI’s ability to detect, distinguish, and control internal thoughts, and a cross‑model performance comparison.

AI Introspection · Anthropic · Artificial Intelligence Research
7 min read

Wuming AI
Jan 6, 2026 · Artificial Intelligence

Top LLM Leaderboards Explained: How to Choose the Right Model

This article surveys the most popular large‑language‑model leaderboards—including lmarena, Artificial Analysis, SuperCLUE, and llm‑stats—detailing their evaluation methods, coverage areas, URLs, and practical usage tips, while warning readers that rankings are only a reference and real‑world performance may vary.

AI benchmarking · LLM · Model Evaluation
5 min read

JavaGuide
Dec 23, 2025 · Artificial Intelligence

Is GLM‑4.7 the Open‑Source Coding Model that Rivals Claude Sonnet 4.5?

The author integrates the newly released GLM‑4.7 model into Claude Code, runs three real‑world coding scenarios—including a React dashboard, a FastAPI authentication service, and a refined landing page—and finds that its stability, reasoning, and output quality closely match Claude Sonnet 4.5, positioning GLM‑4.7 as a strong open‑source alternative.

AI coding assistant · Claude Code · Coding Plan
8 min read

Aikesheng Open Source Community
Dec 4, 2025 · Artificial Intelligence

Gemini 3 Pro vs DeepSeek‑V3.2‑Exp: Which LLM Dominates SQL Understanding, Optimization, and Dialect Conversion?

This report evaluates the professional‑grade LLMs Gemini 3 Pro and DeepSeek‑V3.2‑Exp on three SQL‑related dimensions—understanding, optimization, and dialect conversion—using the SCALE benchmark, presenting detailed scores, strengths, weaknesses, and practical recommendations for database engineers and decision makers.

DeepSeek · Gemini · LLM
16 min read

PaperAgent
Dec 4, 2025 · Artificial Intelligence

From Code Foundations to AI Agents: A Deep Dive into Code LLMs and Their Applications

This article reviews a comprehensive 303‑page survey on code foundation models, tracing the evolution of code‑focused large language models from 2021 to 2025, comparing general‑purpose and specialized LLMs, and presenting extensive experiments on prompting, fine‑tuning, reinforcement learning, and autonomous coding agents.

AI Coding · Code LLM · Model Evaluation
5 min read

Wuming AI
Nov 19, 2025 · Artificial Intelligence

Gemini 3 Hands‑On Review: Multimodal Mastery Across Real‑World Cases

The author evaluates Google’s newly released Gemini 3 model through seven diverse cases (hand counting, a macOS desktop simulation, a jump‑the‑gap game, a lightweight Word‑style editor, expert‑style explanations, SVG fan rendering, and video understanding), highlighting its multimodal reasoning, coding assistance, and remaining limitations.

AI coding assistance · Gemini 3 · Model Evaluation
5 min read

Alibaba Cloud Developer
Nov 19, 2025 · Artificial Intelligence

Building an AI-Powered Proofreading Agent for Media: Architecture, Prompt Engineering, and Evaluation

This article details a practical case study of designing, implementing, and evaluating an AI-driven proofreading agent for a media client, covering background challenges, a three‑layer architecture, prompt engineering techniques, RAG knowledge‑base construction, model selection, fine‑tuning, automated metrics, and lessons learned.

AI · Model Evaluation · Proofreading
26 min read

Alibaba Cloud Big Data AI Platform
Nov 10, 2025 · Artificial Intelligence

How to Boost Robot Imitation Learning with Cosmos World Model Data Augmentation

This guide demonstrates an end‑to‑end workflow on Alibaba Cloud PAI that uses the Cosmos world model instead of Isaac simulation for robot action data augmentation, covering minimal human demonstrations, prompt‑driven data expansion, rejection sampling, action extraction with an inverse dynamics model (IDM), imitation‑learning fine‑tuning, and model evaluation.

AI · Cosmos · Model Evaluation
17 min read

Baidu Tech Salon
Oct 10, 2025 · Artificial Intelligence

Navigating the 2025 AI Model Boom: Practical Evaluation Strategies

This article examines the rapid surge of large AI models in 2024‑2025, critiques the reliability of public leaderboards, and presents a business‑focused evaluation framework—including dataset construction, metric selection, automation, and LLM‑as‑judge techniques—to help developers choose the right model for real‑world applications.

AI Performance · AI benchmarks · LLM-as-judge
17 min read

IT Services Circle
Sep 28, 2025 · Artificial Intelligence

How to Build a Python AI Model for Predicting User Behavior

This article walks through the complete machine‑learning workflow for predicting user actions—covering core concepts, data collection, preprocessing, feature engineering, model training, evaluation, hyper‑parameter tuning, deployment, and future directions—using Python and popular AI libraries.

Model Evaluation · Python · feature engineering
11 min read

Volcano Engine Developer Services
Sep 11, 2025 · Artificial Intelligence

Why Do Large Language Models Hallucinate? Causes, Types, and Mitigation Strategies

This article examines the growing problem of hallucinations in large language models, outlining their causes across the model lifecycle, classifying four main hallucination types, and presenting both retrieval‑augmented generation and detection techniques—white‑box and black‑box—to reduce factual errors in critical applications.

AI Safety · LLM · Model Evaluation
15 min read

Baidu Geek Talk
Sep 10, 2025 · Artificial Intelligence

How to Cut Through the LLM SOTA Hype: Practical Evaluation Strategies for 2025

Amid the 2025 surge of large language models, this article demystifies misleading SOTA claims, critiques benchmark reliability, and presents a comprehensive, business‑focused evaluation framework—including dataset construction, metric selection, automated scoring, and practical guidelines—to help developers and product teams choose the right model for real‑world applications.

AI benchmarking · LLM-as-judge · Model Evaluation
18 min read

Data Party THU
Sep 10, 2025 · Industry Insights

What We Learned from Winning 3rd Place in China’s 2025 Big Data Challenge

The Dalian University team’s third‑place finish in the 2025 China University Computer Competition’s Big Data Challenge revealed key lessons about data cleaning, focused feature engineering, the power of simple robust models like Random Forest, custom evaluation metrics, and the indispensable role of tight teamwork in data science projects.

Data Science Competition · Model Evaluation · team collaboration
6 min read

Data STUDIO
Sep 5, 2025 · Artificial Intelligence

19 Elegant Sklearn Tricks for More Efficient Machine Learning

This article presents 19 practical Sklearn functions—ranging from outlier detection to hyper‑parameter search—that replace manual data‑science steps, each illustrated with concise code examples and performance comparisons.

Model Evaluation · Pipeline · data preprocessing
24 min read

Architects' Tech Alliance
Aug 13, 2025 · Artificial Intelligence

Can DeepSeek Survive the AI Arms Race? A Deep Dive into Its Challenges

DeepSeek, a fast‑rising large‑model contender, boasts impressive NLP and code‑generation capabilities, yet faces steep hurdles—including security concerns, industry‑specific customization gaps, slowing innovation, fierce competition from OpenAI, Google, and Alibaba’s Qwen3, and fragmented open‑source ecosystems—that cast doubt on its long‑term prospects.

AI competition · DeepSeek · Model Evaluation
12 min read

Data Party THU
Aug 7, 2025 · Artificial Intelligence

How RLVER Boosts a 7B LLM to Match Top Commercial Models in Emotional Dialogue

The article analyzes RLVER, a reinforcement‑learning framework that integrates a user simulator as both environment and reward source, overcomes three major RL challenges, and elevates the Qwen2.5‑7B model’s Sentient‑Benchmark score from 13.3 to 79.2, rivaling GPT‑4o and Gemini 2.5 Pro.

Emotion Modeling · Model Evaluation · Open-domain Dialogue
10 min read

Programmer DD
Aug 6, 2025 · Artificial Intelligence

What Is GPT-OSS? Inside OpenAI’s New Open‑Source Large Language Models

OpenAI has unveiled GPT‑OSS, an open‑source large language model series with a 120‑billion‑parameter version for high‑throughput production and a 20‑billion‑parameter version for low‑latency use on consumer hardware. Both use a Mixture‑of‑Experts architecture and 4‑bit quantization, and both are released under the permissive Apache 2.0 license.

4-bit quantization · Apache 2.0 license · GPT-OSS
3 min read

360 Zhihui Cloud Developer
Jul 23, 2025 · Artificial Intelligence

How to Leverage TLM Platform for Comprehensive Large Model Evaluation

This guide explains how to use the TianJi Large Model (TLM) platform to create evaluation tasks, choose effectiveness or performance modes, work with built‑in datasets, interpret detailed reports, and understand the underlying metrics and judge‑model techniques for large‑model assessment.

AI metrics · Datasets · Model Evaluation
9 min read

DataFunTalk
Jul 18, 2025 · Artificial Intelligence

How Alibaba Tackles Low-Resource Language Data for Multilingual LLMs

Alibaba International’s senior data science expert explains a systematic five‑strategy solution—data acquisition, augmentation, quality optimization, engineering pipeline, and evaluation loop—to overcome data scarcity, high annotation cost, and processing challenges for low‑resource languages in multilingual large language models.

AI · Model Evaluation · data engineering
13 min read

DaTaobao Tech
Jul 14, 2025 · Artificial Intelligence

Mastering AI Application Modes: Embedding, Copilot, and Agents Explained

This article explores practical AI engineering strategies, detailing the three AI application modes—Embedding, Copilot, and Agents—along with prompt engineering, model selection, function calling, RAG, workflow design, and multi‑agent architectures to boost business efficiency and user experience.

AI · Model Evaluation · Prompt Engineering
25 min read

AI Frontier Lectures
Jul 10, 2025 · Artificial Intelligence

Can Dispersive Loss Supercharge Diffusion Models Without Extra Pre‑training?

Dispersive Loss is a plug‑and‑play regularization technique that enhances diffusion‑based generative models by encouraging dispersed internal representations, requiring no additional pre‑training, parameters, or data, and consistently improves performance across various model sizes and configurations, as demonstrated through extensive experiments.
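
For readers who want the gist of the regularizer, here is a minimal sketch of one variant as we understand it: an InfoNCE-style term with no positive pairs, computed over a batch of intermediate representations z_i as the log of the mean of exp(-||z_i - z_j||² / τ) over all pairs, so that minimizing it pushes representations apart. The temperature default and function name are illustrative assumptions, and in training the term would be added to the usual diffusion loss with a weighting coefficient; consult the original paper for the exact formulation and for which layer it is applied to.

```python
import math


def dispersive_loss(batch: list[list[float]], tau: float = 1.0) -> float:
    """log( mean over all ordered pairs (i, j) of exp(-||z_i - z_j||^2 / tau) ).

    Equals 0 when every representation is identical and becomes more
    negative as the representations spread out.
    """
    n = len(batch)
    total = 0.0
    for zi in batch:
        for zj in batch:
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, zj))
            total += math.exp(-sq_dist / tau)
    return math.log(total / (n * n))


print(dispersive_loss([[1.0, 1.0]] * 3))                          # 0.0 (collapsed)
print(dispersive_loss([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]))      # roughly -1.098 (spread out)
```

Because the term needs no positive pairs, it requires no extra data augmentation or pre-trained encoder, which matches the plug-and-play framing in the summary.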

Dispersive Loss · Model Evaluation · Regularization
18 min read

DataFunTalk
Jun 9, 2025 · Artificial Intelligence

Can AI Models Pass the Chinese Math Gaokao? A Fair, Objective Test

The author conducts a transparent, objective assessment of several large language models on the 2025 Chinese national math exam, converting all questions to LaTeX, applying strict Gaokao scoring rules, and revealing each model's strengths and weaknesses across single‑choice, multiple‑choice, and fill‑in‑the‑blank items.

AI benchmarking · Gaokao · Model Evaluation
7 min read

JavaEdge
Jun 6, 2025 · Artificial Intelligence

Why Qwen3 Embedding Models Are Setting New Benchmarks in Text Representation

The article introduces the Qwen3 Embedding series, detailing its model variants, architecture, training methodology, multilingual support, performance metrics across several benchmarks, and future development plans, highlighting its superior generalization and flexibility for diverse AI applications.

AI · Embedding · Model Evaluation
9 min read

Fun with Large Models
Jun 5, 2025 · Artificial Intelligence

EvalScope: The Ultimate Large‑Model Evaluation Framework You Control

This article introduces EvalScope, an open‑source framework for evaluating large language models, detailing its architecture, built‑in benchmarks, installation steps, and step‑by‑step guides for both performance stress testing and dataset‑based capability assessment, enabling users to independently verify model quality without relying on official documentation.

EvalScope · Model Evaluation · Performance Testing
12 min read

Baidu Tech Salon
May 21, 2025 · Artificial Intelligence

Baidu AI Day 2024: Wenxin X1 Turbo Sets New Benchmark with Top‑Level Evaluation and Advanced Multimodal Capabilities

At Baidu AI Day in Beijing, the company unveiled the Wenxin 4.5 Turbo and X1 Turbo models, detailing multimodal training breakthroughs, self‑feedback loops, enhanced reasoning and tool‑calling, while the China Academy of Information and Communications Technology awarded X1 Turbo the highest "4+" rating across 24 capability tests, highlighting its leading position in domestic large‑model performance.

Baidu · Model Evaluation · Wenxin
9 min read

AI Frontier Lectures
May 12, 2025 · Artificial Intelligence

Can Scaling Reinforcement Learning Turn AI Models into Real Thinkers? Insights from Dan Roberts' AI Ascent Talk

In a recent AI Ascent presentation, OpenAI researcher Dan Roberts explained how scaling laws for both pre‑training and reinforcement learning reveal a new test‑time dimension of model performance, showcased the capabilities of the o1 and o3 models, and outlined a massive compute‑scaling strategy aimed at creating AI systems that can reason for years like Einstein.

AI · Future Predictions · Model Evaluation
9 min read

Mafengwo Technology
Apr 30, 2025 · Artificial Intelligence

How MaFengWo’s mfw-32B Travel LLM Outperforms DeepSeek‑R1 in Speed and Accuracy

The article details the development, training, and evaluation of MaFengWo's 32‑billion‑parameter travel large language model (mfw‑32B), highlighting its superior itinerary planning, personalized demand capture, budget management, and resource efficiency compared to DeepSeek‑R1, and describing the SFT and reinforcement‑learning stages that enabled these gains.

AI Optimization · LoRA · Model Evaluation
14 min read

AI Large Model Application Practice
Mar 3, 2025 · Artificial Intelligence

Can DeepSeek‑R1 Unlock True “Deep Thinking” for Enterprise RAG?

This article examines how swapping in DeepSeek‑R1 enhances Retrieval‑Augmented Generation with deeper reasoning, outlines its benefits and pitfalls—including slower inference, higher compute costs, and hallucinations—provides a simple hallucination test, and proposes an Agentic RAG research assistant to balance accuracy and creativity.

AI reasoning · Agentic · DeepSeek
10 min read

AI Code to Success
Feb 25, 2025 · Artificial Intelligence

Master Logistic Regression: Theory, Practice, and Real‑World Tips

This comprehensive guide explains logistic regression fundamentals, the role of the Sigmoid function, loss and optimization methods, step‑by‑step Python implementation with data preparation, model training, evaluation, hyper‑parameter tuning, handling over‑ and under‑fitting, multi‑class extensions, and diverse application scenarios across medicine, finance, e‑commerce, and text analysis.
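
A compact, dependency-free sketch of the core pieces the guide lists (the sigmoid, the cross-entropy loss, and gradient-descent optimization) for a single feature. The toy data, learning rate, and epoch count are illustrative assumptions; in practice scikit-learn's `LogisticRegression` would be the usual tool.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(xs: list[float], ys: list[int], w: float, b: float) -> float:
    """Mean binary cross-entropy of p = sigmoid(w * x + b)."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def fit_logistic(xs, ys, lr: float = 0.5, epochs: int = 2000) -> tuple[float, float]:
    """Batch gradient descent on the mean cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For cross-entropy with a sigmoid, dL/dz = p - y for each example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(err * x for err, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]
w, b = fit_logistic(xs, ys)
print(f"loss {log_loss(xs, ys, 0.0, 0.0):.3f} -> {log_loss(xs, ys, w, b):.3f}")
print([int(sigmoid(w * x + b) >= 0.5) for x in xs])  # [0, 0, 1, 1]
```

The loss starts at ln 2 (about 0.693), because every prediction is 0.5 when w and b are zero, and falls as gradient descent separates the two classes.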

Model Evaluation · Python · classification
23 min read

AI Code to Success
Feb 24, 2025 · Artificial Intelligence

Master Linear Regression: Concepts, Math, and Python Implementation

This comprehensive guide explores linear regression from its fundamental concepts and mathematical foundations to practical Python implementation with scikit‑learn, covering single‑ and multiple‑variable models, assumptions, loss functions, OLS and gradient‑descent solutions, evaluation metrics, advantages, limitations, and real‑world case studies.

Model Evaluation · Python · gradient descent
0 likes · 21 min read
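As a minimal sketch of the two solution methods the summary names (ordinary least squares in closed form, and gradient descent on the mean squared error), the pure-Python code below fits a one-variable model both ways and reports R². It is illustrative only: the toy data, learning rate and epoch count are assumptions, and the article itself works with scikit-learn.

```python
def ols_fit(xs, ys):
    """Closed-form least squares for y = w*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sxy / sxx
    return w, my - w * mx

def gd_fit(xs, ys, lr=0.01, epochs=5000):
    """Gradient descent on the mean squared error, starting from w = b = 0."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        errs = [w * x + b - y for x, y in zip(xs, ys)]  # residuals before this step
        w -= lr * 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        b -= lr * 2.0 / n * sum(errs)
    return w, b

def r_squared(xs, ys, w, b):
    """Share of the variance in y explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
w_ols, b_ols = ols_fit(xs, ys)
w_gd, b_gd = gd_fit(xs, ys)
print(f"OLS: w={w_ols:.4f} b={b_ols:.4f}  GD: w={w_gd:.4f} b={b_gd:.4f}")
print(f"R^2 = {r_squared(xs, ys, w_ols, b_ols):.4f}")
```

Both routes agree here (slope about 1.99, intercept about 0.05). OLS is exact, while gradient descent becomes attractive when the feature matrix is too large to solve directly.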
Java Tech Enthusiast
Feb 22, 2025 · Artificial Intelligence

Grok‑3 Evaluation Controversy and Community Reactions

Three days after Grok‑3’s launch, xAI was accused by OpenAI staff of inflating Grok‑3’s benchmark scores with a “cons@64” method that aggregates 64 answers per question, a practice critics say unfairly skews comparisons with models such as o3‑mini whose scores reflect a single attempt; meanwhile, developers have already begun experimenting with the model in simple games.

AI · Grok-3 · Model Evaluation
0 likes · 5 min read
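The gap between consensus scoring and single-attempt scoring is easy to see in a short sketch. It assumes cons@k means "sample k answers and keep the most frequent one" (the usual reading of the term); under that rule a question can count as solved even when the model's first attempt is wrong. The function names and sampled answers are hypothetical.

```python
from collections import Counter

def score_at_1(samples, correct):
    """Single-attempt scoring: only the first sampled answer counts."""
    return samples[0] == correct

def cons_at_k(samples, correct, k):
    """Consensus scoring: majority vote over the first k sampled answers."""
    majority_answer, _ = Counter(samples[:k]).most_common(1)[0]
    return majority_answer == correct

def benchmark_accuracy(per_question_samples, answers, scorer):
    """Fraction of questions the scorer marks as correct."""
    hits = sum(scorer(s, a) for s, a in zip(per_question_samples, answers))
    return hits / len(answers)

# Hypothetical sampled answers for three questions (5 samples each).
samples = [
    ["41", "42", "42", "17", "42"],  # first try wrong, majority right
    ["7", "7", "7", "7", "7"],       # consistently right
    ["5", "3", "5", "5", "3"],       # first try and majority both wrong
]
answers = ["42", "7", "3"]

print("@1     :", benchmark_accuracy(samples, answers, score_at_1))
print("cons@5 :", benchmark_accuracy(samples, answers, lambda s, a: cons_at_k(s, a, 5)))
```

On these three toy questions, single-attempt accuracy is 1/3 while cons@5 accuracy is 2/3, which is why a chart should state the protocol used for each model.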
Architect
Feb 21, 2025 · Artificial Intelligence

DeepSeek Model Innovations: Architecture, Training Methods, and Performance Evaluation

This article reviews DeepSeek's recent breakthroughs, including the MLA attention redesign, GRPO alignment algorithm, MoE enhancements, multi‑stage training pipelines (SFT, RL, preference tuning, distillation), and comparative performance against GPT‑4o‑Mini and Llama 3.1, highlighting both strengths and remaining challenges.

DeepSeek · Mixture of Experts · Model Evaluation
0 likes · 16 min read
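One concrete piece of GRPO (Group Relative Policy Optimization) is how it replaces a learned value baseline: each prompt gets a group of sampled responses, and each response's advantage is its reward normalized by the group's mean and standard deviation. This sketch shows only that normalization step, not the full clipped policy objective or its KL penalty; the function name, the epsilon, and the use of the population standard deviation are assumptions (implementations differ on that last detail).

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other responses sampled for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rule-based rewards for 4 responses to one prompt (1 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])

# If every response gets the same reward, no response is preferred over another.
print(group_relative_advantages([1.0, 1.0, 1.0]))
```

Responses that beat their group's average get positive advantages and weaker ones negative, so no separate critic network is needed to supply the baseline.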
AIWalker
Jan 17, 2025 · Artificial Intelligence

InternLM 3.0: Boosting Model Performance with Only 4 Trillion Training Tokens

Shanghai AI Laboratory’s InternLM 3.0 upgrade demonstrates that refining data quality, measured as intelligence‑per‑token, can replace massive datasets, achieving higher reasoning and dialogue capabilities with just 4 trillion training tokens and cutting training cost by over 75% while approaching GPT‑4‑level performance.

AI research · InternLM · Model Evaluation
0 likes · 9 min read
AIWalker
Jan 16, 2025 · Artificial Intelligence

How InternLM 3.0 Achieves High Performance with Just 4 Trillion Training Tokens

InternLM 3.0 (InternLM3), the latest release of the Shusheng‑PuYu model family, refines its training data to boost "thinking density", using only 4 trillion tokens to surpass peer open‑source models and cutting training cost by over 75% while merging ordinary dialogue with deep reasoning capabilities.

InternLM · Model Evaluation · data efficiency
0 likes · 9 min read
Model Perspective
Dec 23, 2024 · Fundamentals

Mastering Mathematical Modeling: 5 Stages & Common Pitfalls to Avoid

From the excitement of first encountering mathematical modeling to becoming a seasoned practitioner, this guide outlines five progressive stages, reveals typical misconceptions at each level, and offers practical advice to help learners avoid common traps and develop both technical and soft skills.

Data Quality · Model Evaluation · common pitfalls
0 likes · 8 min read
JavaEdge
Dec 1, 2024 · Artificial Intelligence

Exploring the Limits and Benchmarks of Qwen’s QwQ‑32B‑Preview AI Model

QwQ‑32B‑Preview, an experimental AI model from the Qwen team, showcases strong reasoning in math and programming while facing challenges like language switching, inference loops, safety concerns, and variable capabilities across domains, with benchmark scores ranging from 50% to over 90% on tests such as GPQA, AIME, MATH‑500, and LiveCodeBench.

AI Benchmark · LLM · Model Evaluation
0 likes · 7 min read
DataFunSummit
Nov 26, 2024 · Information Security

AI‑Driven Security Operations (AISECOPS): Architecture, Practices, and Evaluation

This article explains how large‑model AI can be integrated into security operations (AISECOPS) to simplify application integration, improve fault detection, and automate protection across complex north‑south and east‑west network traffic, while addressing challenges such as data quality, cost control, model selection, and safety frameworks.

AISECOPS · Cost Optimization · Embedding
0 likes · 22 min read
Model Perspective
Nov 24, 2024 · Fundamentals

Mastering Baselines: How to Evaluate and Improve Your Mathematical Models

This article explains the concept of baselines in mathematical modeling, outlines how to construct various types such as empirical, random, theoretical, and heuristic baselines, and demonstrates their crucial role in model evaluation, resource allocation, and fostering innovation through practical case studies.

Baseline · Case Study · Model Evaluation
0 likes · 7 min read
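A baseline is only useful if it is scored the same way as the model. As a minimal sketch of one simple empirical baseline (always predicting the most frequent class seen in training), the code compares it with a model's accuracy on the same test set; the labels and model predictions are made-up illustration data, and the function names are hypothetical.

```python
from collections import Counter

def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def majority_class_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test example."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return [majority] * n_test

train_labels = [0] * 70 + [1] * 30             # 70% negatives in training
test_labels = [0, 0, 0, 1, 0, 1, 0, 0, 0, 1]
model_preds = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]   # hypothetical model output

baseline_acc = accuracy(majority_class_baseline(train_labels, len(test_labels)), test_labels)
model_acc = accuracy(model_preds, test_labels)
print(f"baseline={baseline_acc:.2f} model={model_acc:.2f} gain={model_acc - baseline_acc:+.2f}")
```

A model scoring 70% here would look respectable in isolation yet add nothing over the trivial baseline; the 0.9 model's real contribution is the 20-point gain.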
NewBeeNLP
Nov 7, 2024 · Artificial Intelligence

Tackling Large Model Hallucinations: Causes, Detection, and Mitigation Strategies

This article provides a comprehensive analysis of large language model hallucinations, detailing their definitions, classifications, root causes, detection techniques, and a wide range of mitigation approaches—including RAG pipelines, decoding strategies, and model‑enhancement methods—to improve reliability and safety in real‑world AI applications.

AI Safety · Model Evaluation · Prompt Engineering
0 likes · 22 min read
Architects' Tech Alliance
Nov 1, 2024 · Artificial Intelligence

Master Machine Learning: Core Concepts, Algorithms, and Evaluation Explained

This comprehensive guide walks through the fundamentals of artificial intelligence, machine learning and deep learning, explains the three essential elements of ML, outlines its historical milestones, details core techniques, workflow, key terminology, algorithm families, model evaluation metrics, bias‑variance trade‑offs, validation strategies, and practical model‑selection guidelines.

Algorithms · Model Evaluation · artificial intelligence
0 likes · 19 min read
Sohu Tech Products
Sep 11, 2024 · Artificial Intelligence

How RoPE and FlashAttention Empower GLM-4-Plus for Long-Text Mastery

This article explains the core mechanisms of Transformer models, details the Rotary Position Embedding (RoPE) and FlashAttention techniques for handling long sequences, introduces the GLM-4-Plus series, and presents an empirical evaluation on the THUCNews dataset showing its superior long-text performance.

FlashAttention · GLM-4-Plus · Long Text
0 likes · 13 min read
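The key property of Rotary Position Embedding is that rotating queries and keys by position-dependent angles makes their dot product depend only on the relative offset between positions. This sketch implements the pair-wise rotation from the original RoFormer formulation with the common base of 10000; it is not GLM-4-Plus's implementation (which the summary does not show), and some libraries rotate the two halves of the vector instead of adjacent pairs.

```python
import math

def apply_rope(vec, pos, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by angle pos * base**(-i/d), for i = 0, 2, 4, ..."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # early pairs rotate fast, later pairs slowly
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (2) at different absolute positions gives the same score.
s1 = dot(apply_rope(q, 5), apply_rope(k, 3))
s2 = dot(apply_rope(q, 105), apply_rope(k, 103))
print(round(s1, 6), round(s2, 6))
```

Both scores match because each pair is rotated by an angle proportional to its position, so the angle between a rotated query and key depends only on the offset; rotation also leaves vector norms unchanged.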
IT Services Circle
Sep 8, 2024 · Artificial Intelligence

10 Essential Plots for Linear Regression with Python Code Examples

This tutorial explains ten crucial visualizations for linear regression—scatter plot, trend line, residual plot, normal probability plot, learning curve, bias‑variance tradeoff, residuals vs fitted, partial regression, leverage, and Cook's distance—each illustrated with clear Python code using scikit‑learn, matplotlib, seaborn, and statsmodels.

Data visualization · Matplotlib · Model Evaluation
0 likes · 21 min read
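Two of the listed diagnostics, leverage and Cook's distance, reduce to short formulas for a one-feature least-squares fit: leverage is h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², and Cook's distance is D_i = e_i² / (p·MSE) · h_i / (1 − h_i)² with p = 2 fitted parameters. This pure-Python sketch computes both on made-up data with one far-out point; statsmodels, which the article uses, offers influence diagnostics that compute these values for general models.

```python
def leverage_and_cooks(xs, ys):
    """Residuals, leverage (hat values) and Cook's distance for y = w*x + b fitted by OLS."""
    n, p = len(xs), 2  # p = number of fitted parameters (slope and intercept)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - w * mx
    resid = [y - (w * x + b) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in resid) / (n - p)
    lev = [1.0 / n + (x - mx) ** 2 / sxx for x in xs]
    cooks = [e * e / (p * mse) * h / (1.0 - h) ** 2 for e, h in zip(resid, lev)]
    return resid, lev, cooks

# Made-up data: roughly y = 2x + 1, plus one distant point at x = 20.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 18.9, 30.0]
resid, lev, cooks = leverage_and_cooks(xs, ys)
for x, h, d in zip(xs, lev, cooks):
    print(f"x={x:>2}  leverage={h:.3f}  cook={d:.3f}")
```

The leverages sum to p (here 2), and the point at x = 20 dominates both columns even though its residual is not the largest, which is exactly the case a residual plot alone would miss.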
Java High-Performance Architecture
Aug 25, 2024 · Artificial Intelligence

Can AI Ace the Gaokao Math Test? Surprising Results from Six Top LLMs

A recent evaluation had six leading large‑language‑model products (GPT‑4o, GLM‑4, Wenxin 4.0, Doubao, Baichuan 4, and Qwen‑2.5) answer the first 14 objective questions of the new Gaokao mathematics I paper, revealing that only GLM‑4 surpassed the 60% passing threshold while the others performed far below expectations.

AI · GLM-4 · Gaokao
0 likes · 7 min read
Alibaba Cloud Developer
Aug 23, 2024 · Artificial Intelligence

Mastering Prompt Engineering: Advanced Techniques from Top AI Labs

This comprehensive guide examines cutting‑edge prompt‑engineering strategies—covering clear instruction design, role‑playing, separators, step‑by‑step workflows, external tools, systematic testing, and case studies from Anthropic, Google, and practical Img2Code applications—to help developers achieve more accurate and powerful interactions with large language models.

AI Development · Model Evaluation · Prompt Engineering
0 likes · 21 min read
DaTaobao Tech
Aug 21, 2024 · Artificial Intelligence

Mastering Custom Large‑Model Training: Data Strategies, LoRA Tricks, and Resource Planning

This article provides a comprehensive, step‑by‑step guide to training customized large language models, covering industry‑specific needs, data privacy, meticulous data cleaning, optimal data‑ratio balancing, token budgeting, GPU memory accounting, LoRA fine‑tuning techniques, and practical evaluation metrics for robust AI deployment.

AI training · Fine-tuning · GPU Memory
0 likes · 23 min read
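GPU memory accounting of the kind the summary mentions usually starts from simple arithmetic. This sketch assumes (i) LoRA adds two low-rank matrices A (r × d_in) and B (d_out × r) per adapted weight, and (ii) the widely quoted rule of thumb that mixed-precision Adam needs about 16 bytes per trained parameter (fp16 weights and gradients plus fp32 master weights and two optimizer moments), excluding activations. The matrix size and model size are illustrative, not figures from the article.

```python
GIB = 1024 ** 3

def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight: A (rank x d_in) + B (d_out x rank)."""
    return rank * (d_in + d_out)

def weights_gib(num_params, bytes_per_param=2):
    """Memory for the weights alone (2 bytes per parameter for fp16/bf16)."""
    return num_params * bytes_per_param / GIB

def full_finetune_states_gib(num_params, bytes_per_param=16):
    """Rule-of-thumb memory for weights, gradients and Adam states in mixed precision (no activations)."""
    return num_params * bytes_per_param / GIB

# Illustrative 4096 x 4096 projection adapted with rank 8.
full = 4096 * 4096
lora = lora_param_count(4096, 4096, rank=8)
print(f"LoRA trains {lora:,} of {full:,} weights ({lora / full:.2%})")

# Illustrative 7B-parameter model.
print(f"fp16 weights: {weights_gib(7e9):.1f} GiB, "
      f"full fine-tuning states: {full_finetune_states_gib(7e9):.1f} GiB")
```

Rank 8 on a 4096 × 4096 matrix trains about 0.39% of its weights, and a 7B model needs roughly 13 GiB just for fp16 weights versus roughly 104 GiB of weights, gradients and optimizer states for full fine-tuning, before activations are counted.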
Model Perspective
Aug 18, 2024 · Fundamentals

How to Judge a Mathematical Model: 6 Practical Criteria for Success

This article outlines six essential criteria—accuracy, robustness, simplicity, explainability, generalization, and scalability—for evaluating the quality of mathematical models such as e‑commerce recommendation systems, helping readers assess whether a model is truly reliable or merely a flashy façade.

Model Evaluation · Recommendation Systems · Robustness
0 likes · 3 min read
Kuaishou Tech
Jul 31, 2024 · Artificial Intelligence

Kuaishou’s Kolors Text‑to‑Image Model: Architecture, Evaluation, and Real‑World Applications

The article presents a comprehensive overview of Kuaishou’s Kolors (Chinese name 可图, Ketu) multimodal generative model, detailing its data collection strategy, diffusion‑based architecture, evaluation metrics, derived capabilities such as prompt refinement and interactive generation, and a range of practical applications from AI‑powered live‑stream gifts to virtual try‑on, while also offering strategic advice for China’s visual‑generation community.

AI applications · Kolors · Model Evaluation
0 likes · 27 min read
Architect
Jul 19, 2024 · Artificial Intelligence

Can Machine Learning Beat the Odds? A Deep Dive into Football Match Prediction

This article presents a data‑driven football match prediction system that extracts match features, builds machine‑learning models—including linear, SVM, random forest, and deep neural networks—and evaluates their accuracy on European league data, then analyzes betting strategies, limitations, and extensions to stock forecasting.

Model Evaluation · artificial intelligence · data mining
0 likes · 24 min read
Xiaohongshu Tech REDtech
Jun 20, 2024 · Artificial Intelligence

Xiaohongshu 2024 Large Model Frontier Paper Sharing Live Event

On June 27, 2024, Xiaohongshu’s technical team will livestream a two‑hour session across WeChat Channels, Bilibili, Douyin and Xiaohongshu, showcasing six top‑conference papers on large‑model advances—including early‑stopping and fine‑grained self‑consistency, novel evaluation methods, negative‑sample‑assisted distillation, and LLM‑based note recommendation—followed by a Q&A and recruitment briefing.

AI research · Model Evaluation · Recommendation Systems
0 likes · 12 min read
Baobao Algorithm Notes
Jun 5, 2024 · Artificial Intelligence

Is GLM‑4‑9B the New Powerhouse? A Deep Dive into Its Performance and Usage

This article reviews the open‑source 9‑billion‑parameter GLM‑4‑9B model, covering installation, quick‑start inference code, quirky Chinese riddles that highlight its strengths over GPT‑4, extensive benchmark tables for dialogue, multilingual, tool‑calling and multimodal tasks, and its broader impact on the Chinese AI ecosystem.

AI · GLM-4-9B · Model Evaluation
0 likes · 14 min read
NewBeeNLP
May 16, 2024 · Artificial Intelligence

How Large Language Models Transform Advertising Copy Generation

This article examines the adoption of large language models for intelligent advertising copy creation, detailing business challenges, model selection criteria, training data preparation, fine‑tuning methods, performance evaluation, and deployment results, and highlighting the trade‑offs between model size, cost, and output quality.

AI marketing · Fine-tuning · Model Evaluation
0 likes · 20 min read
NewBeeNLP
May 15, 2024 · Artificial Intelligence

How Large Language Models and Knowledge Graphs Can Boost Each Other

This talk reviews recent advances in large language models, compares them with knowledge graphs, explores how LLMs enhance knowledge extraction and completion, examines how knowledge graphs aid LLM evaluation and safe deployment, and outlines future interactive integration between the two technologies.

AI research · Knowledge Graphs · Model Evaluation
0 likes · 13 min read
IT Services Circle
May 1, 2024 · Artificial Intelligence

Summary of Andrew Ng’s AI Agent Talk: Models, Workflows, and Design Patterns

The article summarizes Andrew Ng’s presentation on AI agents, contrasting traditional single‑prompt large‑model usage with iterative agent‑based workflows, reporting experimental accuracy gains, and outlining four agent design patterns (reflection, tool use, planning, and multi‑agent collaboration) while discussing practical trade‑offs such as latency and token‑generation speed.

AI Agent · Design Patterns · Model Evaluation
0 likes · 7 min read
DataFunTalk
Apr 21, 2024 · Artificial Intelligence

Guidelines for Building Domain-Specific Large Models: Dataset Construction, Training Methods, Evaluation, and Hardware Benchmarking

This article presents a comprehensive guide on constructing domain-specific large language models, covering the differences from general models, how to build high‑quality domain datasets, selecting appropriate training methods, designing validation sets, evaluating model capabilities, and benchmarking domestic hardware performance.

AI · Model Evaluation · dataset construction
0 likes · 20 min read
Rare Earth Juejin Tech Community
Feb 23, 2024 · Artificial Intelligence

Google’s Open‑Source Gemma Large Language Model: Architecture, Performance, and Community Reception

Google has released the open‑source Gemma LLM series (2B and 7B parameters), built from the same research and technology as Gemini, offering free, commercially usable models that run on laptops, support JAX/PyTorch/TensorFlow, outperform many open‑source peers, and have quickly sparked extensive community testing and discussion.

Gemma · Google · JAX
0 likes · 5 min read
DataFunTalk
Feb 10, 2024 · Artificial Intelligence

Mitigating Hallucinations in Large Language Model Applications with Knowledge Graphs

This article examines the challenges of using large language models for industry Q&A, defines hallucination phenomena, evaluates their causes and impact, and proposes a set of strategies—including high‑quality fine‑tuning data, honest alignment, advanced decoding, and external knowledge‑graph augmentation—to reduce hallucinations and improve answer reliability.

Knowledge Graph · Model Evaluation · hallucination
0 likes · 21 min read
Baidu Geek Talk
Jan 15, 2024 · Artificial Intelligence

Qianfan Large Model Platform: Making Large Models Accessible - Baidu's Latest Work on Model Fine-tuning and Deployment

Baidu’s Qianfan Large Model Platform provides a one‑stop enterprise solution with 54 pre‑installed models, advanced fine‑tuning, comprehensive evaluation metrics, and optimized deployment that cuts costs up to 90% and boosts throughput 3‑5×, enabling rapid, affordable AI application development.

AI-native applications · Baidu Qianfan · Cost Optimization
0 likes · 12 min read
Zhengcaiyun Technology
Oct 10, 2023 · Artificial Intelligence

Predicting Membership Purchase with Logistic Regression: Feature Engineering, Model Training, Evaluation, and Deployment

This article presents a complete workflow for predicting whether users will purchase a membership using logistic regression, covering data collection, feature selection, handling imbalanced samples, model training, hyper‑parameter tuning, threshold optimization, evaluation metrics such as accuracy, precision, recall, AUC, lift, and finally deployment on a big‑data platform with PySpark.

Big Data · Model Evaluation · feature engineering
0 likes · 17 min read
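The evaluation steps in the summary (threshold optimization, precision, recall, AUC and lift) can be sketched in a few lines of plain Python. Here lift is taken as precision among users flagged as likely buyers divided by the overall purchase rate, and AUC as the share of (buyer, non-buyer) pairs ranked correctly. The scores, labels and F1-based threshold choice are illustrative assumptions; the article's own pipeline runs on PySpark.

```python
def metrics_at(scores, labels, threshold):
    """Precision, recall, F1 and lift when users with score >= threshold are flagged as buyers."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    base_rate = sum(labels) / len(labels)
    lift = precision / base_rate if base_rate else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "lift": lift}

def roc_auc(scores, labels):
    """Probability that a random buyer is scored above a random non-buyer (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_threshold(scores, labels):
    """Try every distinct score as a cut-off and keep the one with the highest F1."""
    return max(sorted(set(scores)), key=lambda t: metrics_at(scores, labels, t)["f1"])

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]  # hypothetical model outputs
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]                       # 1 = bought a membership

t = best_threshold(scores, labels)
print("AUC:", round(roc_auc(scores, labels), 3))
print("best threshold:", t, {k: round(v, 3) for k, v in metrics_at(scores, labels, t).items()})
```

On this toy data the F1-optimal cut-off is 0.6, where precision and recall are both 0.75 and lift is 1.875; in practice the threshold is often set against business costs (for example, the cost of contacting a non-buyer) rather than F1.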
NetEase LeiHuo Testing Center
Sep 22, 2023 · Artificial Intelligence

Understanding Large Language Models and Prompt Engineering: A Practical Guide

This article provides an introductory overview of large language models (LLMs), compares popular models, explains their underlying principles, and offers practical guidance on prompt engineering, model evaluation, usage tips, and safety considerations, helping readers effectively select and apply LLMs in various scenarios.

AI · LLM · Model Evaluation
0 likes · 44 min read
Baobao Algorithm Notes
Aug 18, 2023 · Artificial Intelligence

Unlocking Domain-Specific Large Model Training: Proven Tricks and Pitfalls

This article shares practical techniques for continued pre‑training of domain‑specific large models, including data selection, mixing ratios with general data, multi‑task instruction pre‑training, resource‑aware fine‑tuning strategies, evaluation set design, vocabulary considerations, and deployment constraints for 7‑13B models.

AI research · Model Evaluation · SFT
0 likes · 9 min read
21CTO
Jul 23, 2023 · Artificial Intelligence

What Nathan Lambert Reveals About Meta’s Llama 2: Key Insights and Technical Deep‑Dive

This article translates and analyzes Nathan Lambert’s commentary on Meta’s Llama 2 paper, detailing the model’s architecture, training data, RLHF pipeline, reward models, evaluation methods, safety improvements, licensing terms, and the broader implications for open‑source large language models.

Llama-2 · Meta AI · Model Evaluation
0 likes · 22 min read
DataFunSummit
May 31, 2023 · Artificial Intelligence

Evolution of Face Detection Techniques: Datasets, Research Directions, and Future Work

This article reviews the evolution of face detection, covering the WIDER FACE dataset, major research directions such as feature fusion, label assignment, auxiliary supervision, anchor‑free methods, and NAS‑based designs, summarizes key papers from S3FD to MogFace, introduces ModelScope implementations, and outlines future challenges and opportunities.

AI research · Computer Vision · Datasets
0 likes · 13 min read
GuanYuan Data Tech Team
May 25, 2023 · Artificial Intelligence

How to Build a Comprehensive ML Model Quality Assessment Framework

This article explains why and how to evaluate machine learning model quality through a structured framework that covers data validation, feature checks, and algorithm testing, helping ensure accuracy, reliability, and maintainability before deployment.

AI Governance · Model Evaluation · data validation
0 likes · 19 min read
Full-Stack Trendsetter
May 18, 2023 · Artificial Intelligence

How 360 and ChatGLM Are Building China’s “Microsoft + OpenAI” Large‑Model Duo

On May 16, 360 and Zhipu AI announced a strategic partnership to co‑develop the trillion‑parameter models 360GLM and 360GPT, positioning them as China’s answer to Microsoft‑OpenAI by combining large‑scale pre‑training, bilingual capabilities, and integration with 360’s search and browser ecosystem.

360 · AI Collaboration · ChatGLM
0 likes · 7 min read
DataFunTalk
Feb 21, 2023 · Artificial Intelligence

Analysis of Large Language Models: Capabilities, Training Methods, and Limitations – Summary of Prof. Qiu Xipeng’s Lecture

Prof. Qiu Xipeng’s lecture provides a comprehensive overview of large language models—from their historical development and architectural foundations to key technologies such as in‑context learning, chain‑of‑thought, and natural‑instruction learning, as well as RLHF training, capability evaluation, and current limitations of ChatGPT.

ChatGPT · In-Context Learning · Model Evaluation
0 likes · 15 min read
Python Programming Learning Circle
Dec 7, 2022 · Artificial Intelligence

Predicting the 2022 FIFA World Cup Champion Using Machine Learning Models

This article details a data‑mining project that uses historical World Cup match data, extensive feature engineering, and various machine‑learning algorithms—including neural networks, logistic regression, SVM, decision trees, and random forests—to predict the champion of the 2022 tournament, while analyzing model errors and proposing improvements.

Model Evaluation · World Cup · classification
0 likes · 7 min read
Laiye Technology Team
Nov 23, 2022 · Artificial Intelligence

Design and Practices of a Data‑Driven OCR Testing System

The article describes Laiye's shift to a data‑driven deep‑learning workflow and presents the design, macro‑ and micro‑analysis features, visual diff tools, distributed tracing, and code examples of their OCR testing system that accelerate model evaluation and iterative optimization.

AI · Data‑Driven · MLOps
0 likes · 11 min read