Tagged articles

23 articles

Apr 16, 2026 · Artificial Intelligence

Do LLMs Learn Hidden Preferences? Inside the Subliminal Learning Phenomenon

A recent Nature paper by Anthropic reveals that large language models can covertly transmit preferences and misaligned behaviors through unrelated data, demonstrating a "subliminal learning" effect that spans numbers, code, and chain‑of‑thought tasks and is driven by shared model initialization.

Anthropic · LLM · Model Alignment

0 likes · 10 min read

AI Large-Model Wave and Transformation Guide

Mar 28, 2026 · Artificial Intelligence

How to Ace LLM Interview Questions: Deep Dive into Pre‑training, SFT, DPO & RLHF

This guide breaks down four major large‑model training paradigms (pre‑training, supervised fine‑tuning, preference alignment with DPO, and RLHF), explaining which parameters are updated, how attention is reshaped, and what capabilities are gained, so you can deliver a structured, interview‑ready answer.

AI Interview · Fine-tuning · LLM

0 likes · 8 min read

Machine Learning Algorithms & Natural Language Processing

Feb 11, 2026 · Artificial Intelligence

Can TI‑DPO Fix DPO’s Blind Spot? Token‑Importance Guided Direct Preference Optimization for Better LLM Alignment

TI‑DPO introduces a hybrid weighting scheme that weights tokens by gradient attribution and a Gaussian prior, together with a triplet‑loss objective, enabling precise identification of critical tokens and yielding consistent performance gains over DPO, SimPO, and GRPO with Llama‑3 and Mistral‑7B models on downstream benchmarks such as IFEval, TruthfulQA, and HumanEval.

Direct Preference Optimization · Large Language Models · Model Alignment

0 likes · 8 min read

Wu Shixiong's Large Model Academy

Dec 12, 2025 · Artificial Intelligence

Why Fixing Bad Cases Beats Adding More Data in RLHF

In industrial RLHF, repairing bad cases—structural error samples—provides explicit alignment signals that improve model capability far more efficiently than simply increasing data volume, because it teaches the model how to correct mistakes rather than just exposing it to more examples.

Capability Improvement · Model Alignment · RLHF

0 likes · 9 min read

Data Party THU

Dec 6, 2025 · Artificial Intelligence

Why Adding Toxic Data Can Make Language Models Safer and More Capable

A recent study shows that deliberately mixing a moderate amount of toxic content into large‑language‑model pre‑training actually sharpens the model’s internal representation of toxicity, enabling post‑training interventions to more effectively detoxify the model while preserving or even improving its general capabilities.

LLM · Model Alignment · Toxic Data

0 likes · 10 min read

Alimama Tech

Dec 3, 2025 · Artificial Intelligence

How LORE Transforms E‑Commerce Search Relevance with Generative AI

The article details the development and deployment of LORE, a large generative model that reshapes e‑commerce search relevance by combining knowledge injection, chain‑of‑thought reasoning, and multimodal alignment, achieving simultaneous improvements in user experience and revenue metrics.

Chain-of-Thought · Model Alignment · Multimodal

0 likes · 15 min read

Qunar Tech Salon

Oct 10, 2025 · Artificial Intelligence

Master Prompt Engineering: Proven Strategies to Optimize AI Model Performance

This article presents practical, step‑by‑step techniques for refining prompts used in large language model applications—covering intent detection, context enrichment, instruction compliance, model capability activation, and structural design—to dramatically improve accuracy, reduce hallucinations, and boost overall AI system reliability.

AI Optimization · Chatbot Design · Model Alignment

0 likes · 27 min read

DataFunSummit

Sep 24, 2025 · Artificial Intelligence

Taming LLM Hallucinations: Strategies and Solutions from 360

This article explores the problem of large‑model hallucinations, explains its definitions and classifications, analyzes root causes in data, algorithms and inference, and presents detection methods and practical mitigation techniques such as RAG, decoding strategies, and model‑enhancement approaches, illustrated with real‑world 360 use cases and future research directions.

AI Safety · LLM · Model Alignment

0 likes · 22 min read

Baobao Algorithm Notes

Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Roots, Risks, and a New Evaluation Approach

The article analyzes OpenAI's study on language‑model hallucinations, explaining how statistical limits in pre‑training and flawed binary evaluation incentives cause false answers, and proposes a confidence‑threshold scoring system that rewards honest "I don’t know" responses to improve reliability.

AI Safety · Model Alignment · confidence threshold

0 likes · 8 min read

DataFunTalk

Jun 19, 2025 · Artificial Intelligence

Can We Flip the Switch on AI Good vs. Evil? OpenAI’s Toxic Persona Find

OpenAI’s new research reveals that training language models to produce incorrect answers in a single domain can trigger a toxic persona feature, causing the model to generate harmful suggestions across unrelated tasks, but the team also demonstrates detection methods and a reversible “emergent realignment” technique to restore safe behavior.

AI Safety · Emergent misalignment · Model Alignment

0 likes · 7 min read

AI Frontier Lectures

Mar 7, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: Tracing the Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through successive milestones such as BERT, GPT‑3, ChatGPT, multimodal GPT‑4 variants, open‑weight releases, and the cost‑efficient DeepSeek‑R1, highlighting key architectural innovations, training paradigms, alignment techniques, and industry impact.

Artificial Intelligence · Cost‑Efficient Inference · Model Alignment

0 likes · 27 min read

DataFunSummit

Jan 21, 2025 · Artificial Intelligence

NVIDIA NeMo Full Stack: End‑to‑End Large Language Model Training, Alignment, and RLHF

This article presents NVIDIA's NeMo technology stack for end‑to‑end large language model (LLM) training, covering the full software pipeline, model alignment with reinforcement learning from human feedback (RLHF), performance optimizations such as model parallelism, FP8, TensorRT‑LLM inference, dynamic load balancing, and future research directions.

Distributed Training · GPU Optimization · LLM

0 likes · 24 min read

DataFunSummit

Aug 8, 2024 · Artificial Intelligence

Exploring Training and Alignment Techniques for Financial Large Models

The announcement details a DataFun Summit 2024 session where Du Xiaoman AI researcher Huo Liangyu will present on the challenges, development, and alignment methods of the Xuan Yuan financial large language model, highlighting RLHF techniques, data collection, and real‑world deployment insights for the finance sector.

AI · Financial AI · Large Language Models

0 likes · 6 min read

Data Thinking Notes

Aug 1, 2024 · Artificial Intelligence

Unlocking Vertical Domain LLMs: Advantages, Challenges, and Alignment Strategies

Over the past year, our team explored applying large language models to specialized domains. This article details their advantages in professional settings, challenges such as accuracy and knowledge‑base maintenance, and solutions including alignment enhancement via BPO, Text2API, RAG, and advanced SFT/DPO techniques.

Large Language Models · Model Alignment · RAG

0 likes · 10 min read

Alibaba Cloud Developer

Jul 22, 2024 · Artificial Intelligence

How Alibaba’s Logistics AI Overcame B2B Large Model Challenges

Alibaba’s logistics AI team shares their year‑long journey building a vertical‑domain large language model for logistics, detailing model alignment, Text2API, RAG, SFT techniques, challenges like accuracy and knowledge‑base maintenance, and showcasing real‑world applications such as chatbots, DingTalk assistants, and custom AI assistants.

Model Alignment · RAG · SFT

0 likes · 16 min read

NewBeeNLP

Jun 24, 2024 · Artificial Intelligence

How Domain Large Models Are Shaping the Future of AI: Challenges and Solutions

This article reviews Fudan University's Knowledge Factory Lab research on domain large models, covering background, three major deployment challenges, data‑selection strategies, ability‑enhancement techniques, collaborative workflows, and retrieval‑augmented generation methods that aim to make large models practical for real‑world tasks.

Large Language Models · Model Alignment · domain adaptation

0 likes · 18 min read

DataFunTalk

Mar 10, 2024 · Artificial Intelligence

Aligning Graph Models with Large Language Models for Open-Task Scenarios

This talk presents GraphTranslator, a framework that bridges pretrained graph models and large language models to enable unified handling of both predefined and open-ended graph analysis tasks by translating node representations into language tokens and training an alignment producer for node‑text pairs.

AI research · Large Language Models · Model Alignment

0 likes · 3 min read

Baobao Algorithm Notes

Dec 6, 2023 · Artificial Intelligence

How to Systematically Fix Bad Cases in Large Language Models

The article outlines a structured approach to identifying, categorizing, evaluating impact, and repairing undesirable responses from large language models, covering both model‑level interventions across training stages and practical inference‑time techniques such as parameter tuning, prompt engineering, RAG, and pre/post‑processing safeguards.

Model Alignment · Prompt engineering · RAG

0 likes · 9 min read

DataFunTalk

Sep 8, 2023 · Artificial Intelligence

Knowledge Processing in the Era of Large Models: New Opportunities and New Challenges

This article examines how large language models and knowledge graphs complement each other, discussing their respective strengths, integration techniques such as prompt engineering and knowledge editing, and outlining future research directions for building large knowledge models that combine linguistic understanding with structured knowledge representation.

AI · Knowledge Graphs · Large Language Models

0 likes · 27 min read

dbaplus Community

Feb 18, 2023 · Artificial Intelligence

Why ChatGPT Still Gets It Wrong: Inside RLHF and Model Consistency

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 but uses supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF) to improve alignment, yet its training methods still cause consistency issues such as unhelpful responses, hallucinations, bias, and limited explainability.

ChatGPT · Large Language Models · Model Alignment

0 likes · 17 min read

Open Source Linux

Feb 13, 2023 · Artificial Intelligence

How Does ChatGPT Work? Inside RLHF and Model Consistency

This article explains the inner workings of ChatGPT, detailing its evolution from GPT‑3, the role of reinforcement learning from human feedback (RLHF) in improving consistency, the training pipeline steps, and the limitations and evaluation methods of large language models.

AI · ChatGPT · Large Language Models

0 likes · 15 min read

Top Architect

Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Training, RLHF, and Consistency Issues

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 and improves performance through supervised fine‑tuning and reinforcement learning from human feedback (RLHF) with PPO. The article also covers consistency challenges such as misaligned outputs, bias, and hallucinations, and how the model is evaluated for helpfulness, truthfulness, and harmlessness.

ChatGPT · Large Language Models · Model Alignment

0 likes · 15 min read

Top Architect

Feb 8, 2023 · Artificial Intelligence

A Technical Roadmap of GPT‑3.5: From Pre‑training to RLHF and Emerging Capabilities

This article analyses how ChatGPT and the GPT‑3.5 series evolved from the original GPT‑3 through large‑scale pre‑training, code‑based training, instruction tuning, and reinforcement learning from human feedback, identifying the origins of their language generation, in‑context learning, world knowledge, code understanding, chain‑of‑thought reasoning, and alignment capabilities while also outlining current limitations.

ChatGPT · GPT-3.5 · Instruction Tuning

0 likes · 27 min read
