Tagged articles
23 articles
PaperAgent
Apr 16, 2026 · Artificial Intelligence

Do LLMs Learn Hidden Preferences? Inside the Subliminal Learning Phenomenon

A recent Nature paper by Anthropic reveals that large language models can covertly transmit preferences and misaligned behaviors through unrelated data, demonstrating a "subliminal learning" effect that spans numbers, code, and chain‑of‑thought tasks and is driven by shared model initialization.

Anthropic · LLM · Model Alignment
10 min read
AI Large-Model Wave and Transformation Guide
Mar 28, 2026 · Artificial Intelligence

How to Ace LLM Interview Questions: Deep Dive into Pre‑training, SFT, DPO & RLHF

This guide breaks down the four major large‑model training paradigms—pre‑training, supervised fine‑tuning, preference alignment, and RLHF—explaining which parameters are updated, how attention is reshaped, and what capabilities are gained, so you can deliver a structured, interview‑ready answer.

AI Interview · Fine-tuning · LLM
8 min read
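As a concrete reference for the DPO paradigm named in the title above, the standard per-pair DPO loss is −log σ(β·[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]). The minimal sketch below assumes you already have sequence-level log-probabilities (summed over response tokens) for the chosen response y_w and the rejected response y_l under the policy being trained and a frozen reference model. The function names and the default β = 0.1 are illustrative assumptions, not taken from the article.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    All inputs are sequence-level log-probabilities. The function name and
    default beta are illustrative assumptions.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)), i.e. softplus(-m)
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log 2.
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy made the chosen response more likely than the reference does: lower loss.
    print(dpo_loss(-10.0, -15.0, -12.0, -15.0))
```

The loss depends only on how far the policy moves from the reference on the chosen response relative to the rejected one, which is why DPO can skip training a separate reward model.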
Machine Learning Algorithms & Natural Language Processing
Feb 11, 2026 · Artificial Intelligence

Can TI‑DPO Fix DPO’s Blind Spot? Token‑Importance Guided Direct Preference Optimization for Better LLM Alignment

TI‑DPO combines a hybrid token‑weighting scheme, which scores tokens by gradient attribution and a Gaussian prior, with a triplet‑loss objective. This lets it identify critical tokens precisely and yields consistent gains over DPO, SimPO, and GRPO on Llama‑3 and Mistral‑7B across downstream benchmarks such as IFEval, TruthfulQA, and HumanEval.

Direct Preference Optimization · Large Language Models · Model Alignment
8 min read
Wu Shixiong's Large Model Academy
Dec 12, 2025 · Artificial Intelligence

Why Fixing Bad Cases Beats Adding More Data in RLHF

In industrial RLHF, repairing bad cases—structural error samples—provides explicit alignment signals that improve model capability far more efficiently than simply increasing data volume, because it teaches the model how to correct mistakes rather than just exposing it to more examples.

Capability Improvement · Model Alignment · RLHF
9 min read
Data Party THU
Dec 6, 2025 · Artificial Intelligence

Why Adding Toxic Data Can Make Language Models Safer and More Capable

A recent study shows that deliberately mixing a moderate amount of toxic content into large‑language‑model pre‑training actually sharpens the model’s internal representation of toxicity, enabling post‑training interventions to more effectively detoxify the model while preserving or even improving its general capabilities.

LLM · Model Alignment · Toxic Data
10 min read
Alimama Tech
Dec 3, 2025 · Artificial Intelligence

How LORE Transforms E‑Commerce Search Relevance with Generative AI

The article details the development and deployment of LORE, a large generative model that reshapes e‑commerce search relevance by combining knowledge injection, chain‑of‑thought reasoning, and multimodal alignment, achieving simultaneous improvements in user experience and revenue metrics.

Chain-of-Thought · Model Alignment · Multimodal
15 min read
Qunar Tech Salon
Oct 10, 2025 · Artificial Intelligence

Master Prompt Engineering: Proven Strategies to Optimize AI Model Performance

This article presents practical, step‑by‑step techniques for refining prompts used in large language model applications—covering intent detection, context enrichment, instruction compliance, model capability activation, and structural design—to dramatically improve accuracy, reduce hallucinations, and boost overall AI system reliability.

AI Optimization · Chatbot Design · Model Alignment
27 min read
DataFunSummit
Sep 24, 2025 · Artificial Intelligence

Taming LLM Hallucinations: Strategies and Solutions from 360

This article explores the problem of large‑model hallucinations, explains its definitions and classifications, analyzes root causes in data, algorithms and inference, and presents detection methods and practical mitigation techniques such as RAG, decoding strategies, and model‑enhancement approaches, illustrated with real‑world 360 use cases and future research directions.

AI Safety · LLM · Model Alignment
22 min read
Baobao Algorithm Notes
Sep 9, 2025 · Artificial Intelligence

Why Do Language Models Hallucinate? Roots, Risks, and a New Evaluation Approach

The article analyzes OpenAI's study on language‑model hallucinations, explaining how statistical limits in pre‑training and flawed binary evaluation incentives cause false answers, and proposes a confidence‑threshold scoring system that rewards honest "I don’t know" responses to improve reliability.

AI Safety · Model Alignment · confidence threshold
8 min read
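The confidence-threshold idea can be made concrete with the scoring rule in the form commonly quoted from the OpenAI paper (an assumption of this sketch): a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs t/(1 − t) points for a stated threshold t. Answering then has a positive expected score exactly when the probability p of being correct exceeds t, because p − (1 − p)·t/(1 − t) > 0 simplifies to p > t. The function names below are hypothetical, not code from the paper.

```python
def threshold_score(outcome: str, t: float) -> float:
    """Score one response under a confidence threshold t in [0, 1).

    outcome is "correct", "wrong", or "abstain" ("I don't know").
    Names are illustrative assumptions, not taken from the paper.
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("t must be in [0, 1)")
    if outcome == "correct":
        return 1.0
    if outcome == "abstain":
        return 0.0
    if outcome == "wrong":
        return -t / (1.0 - t)
    raise ValueError(f"unknown outcome: {outcome!r}")


def expected_score_if_answering(p: float, t: float) -> float:
    """Expected score of answering when the answer is correct with probability p."""
    return p * threshold_score("correct", t) + (1.0 - p) * threshold_score("wrong", t)


def should_answer(p: float, t: float) -> bool:
    """Answer only if that beats abstaining (score 0); algebraically this is p > t."""
    return expected_score_if_answering(p, t) > 0.0


if __name__ == "__main__":
    t = 0.75  # a wrong answer costs 0.75 / 0.25 = 3 points
    for p in (0.5, 0.7, 0.8, 0.9):
        print(p, round(expected_score_if_answering(p, t), 3), should_answer(p, t))
```

With t = 0 the rule collapses to ordinary binary accuracy: a wrong guess costs nothing, so guessing always looks at least as good as abstaining, which is the evaluation incentive the article criticizes.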
DataFunTalk
Jun 19, 2025 · Artificial Intelligence

Can We Flip the Switch on AI Good vs. Evil? OpenAI’s Toxic Persona Find

OpenAI’s new research reveals that training language models to produce incorrect answers in a single domain can activate a “toxic persona” feature, causing the model to give harmful suggestions across unrelated tasks. The team also demonstrates detection methods and an “emergent realignment” technique that reverses the misalignment and restores safe behavior.

AI Safety · Emergent misalignment · Model Alignment
7 min read
AI Frontier Lectures
Mar 7, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: Tracing the Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through successive milestones such as BERT, GPT‑3, ChatGPT, multimodal GPT‑4 variants, open‑weight releases, and the cost‑efficient DeepSeek‑R1, highlighting key architectural innovations, training paradigms, alignment techniques, and industry impact.

Artificial Intelligence · Cost‑Efficient Inference · Model Alignment
27 min read
DataFunSummit
Jan 21, 2025 · Artificial Intelligence

NVIDIA NeMo Full Stack: End‑to‑End Large Language Model Training, Alignment, and RLHF

This article presents NVIDIA's NeMo technology stack for end‑to‑end large language model (LLM) training, covering the full software pipeline, model alignment with reinforcement learning from human feedback (RLHF), performance optimizations such as model parallelism, FP8, TensorRT‑LLM inference, dynamic load balancing, and future research directions.

Distributed Training · GPU Optimization · LLM
24 min read
DataFunSummit
Aug 8, 2024 · Artificial Intelligence

Exploring Training and Alignment Techniques for Financial Large Models

The announcement details a DataFun Summit 2024 session where Du Xiaoman AI researcher Huo Liangyu will present on the challenges, development, and alignment methods of the Xuan Yuan financial large language model, highlighting RLHF techniques, data collection, and real‑world deployment insights for the finance sector.

AI · Financial AI · Large Language Models
6 min read
Data Thinking Notes
Aug 1, 2024 · Artificial Intelligence

Unlocking Vertical Domain LLMs: Advantages, Challenges, and Alignment Strategies

Over the past year our team explored applying large language models to specialized domains, detailing their professional benefits, unique challenges such as accuracy and knowledge‑base maintenance, and presenting solutions like alignment enhancement via BPO, Text2API, RAG, and advanced SFT/DPO techniques.

Large Language Models · Model Alignment · RAG
10 min read
Alibaba Cloud Developer
Jul 22, 2024 · Artificial Intelligence

How Alibaba’s Logistics AI Overcame B2B Large Model Challenges

Alibaba’s logistics AI team shares their year‑long journey building a vertical‑domain large language model for logistics, detailing model alignment, Text2API, RAG, SFT techniques, challenges like accuracy and knowledge‑base maintenance, and showcasing real‑world applications such as chatbots, DingTalk assistants, and custom AI assistants.

Model Alignment · RAG · SFT
16 min read
NewBeeNLP
Jun 24, 2024 · Artificial Intelligence

How Domain Large Models Are Shaping the Future of AI: Challenges and Solutions

This article reviews Fudan University's Knowledge Factory Lab research on domain large models, covering background, three major deployment challenges, data‑selection strategies, ability‑enhancement techniques, collaborative workflows, and retrieval‑augmented generation methods that aim to make large models practical for real‑world tasks.

Large Language Models · Model Alignment · domain adaptation
18 min read
DataFunTalk
Mar 10, 2024 · Artificial Intelligence

Aligning Graph Models with Large Language Models for Open-Task Scenarios

This talk presents GraphTranslator, a framework that bridges pretrained graph models and large language models so that both predefined and open‑ended graph analysis tasks can be handled in one system. It translates node representations into language tokens and uses a Producer to construct the node‑text pairs on which this alignment is trained.

AI research · Large Language Models · Model Alignment
3 min read
Baobao Algorithm Notes
Dec 6, 2023 · Artificial Intelligence

How to Systematically Fix Bad Cases in Large Language Models

The article outlines a structured approach to identifying, categorizing, evaluating impact, and repairing undesirable responses from large language models, covering both model‑level interventions across training stages and practical inference‑time techniques such as parameter tuning, prompt engineering, RAG, and pre/post‑processing safeguards.

Model Alignment · Prompt engineering · RAG
9 min read
DataFunTalk
Sep 8, 2023 · Artificial Intelligence

Knowledge Processing in the Era of Large Models: New Opportunities and New Challenges

This article examines how large language models and knowledge graphs complement each other, discussing their respective strengths, integration techniques such as prompt engineering and knowledge editing, and outlining future research directions for building large knowledge models that combine linguistic understanding with structured knowledge representation.

AI · Knowledge Graphs · Large Language Models
27 min read
dbaplus Community
Feb 18, 2023 · Artificial Intelligence

Why ChatGPT Still Gets It Wrong: Inside RLHF and Model Consistency

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 but uses supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF) to improve alignment, yet its training methods still cause consistency issues such as unhelpful responses, hallucinations, bias, and limited explainability.

ChatGPT · Large Language Models · Model Alignment
17 min read
Open Source Linux
Feb 13, 2023 · Artificial Intelligence

How Does ChatGPT Work? Inside RLHF and Model Consistency

This article explains the inner workings of ChatGPT, detailing its evolution from GPT‑3, the role of reinforcement learning from human feedback (RLHF) in improving consistency, the training pipeline steps, and the limitations and evaluation methods of large language models.

AI · ChatGPT · Large Language Models
15 min read
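The reward-modeling step of the RLHF training pipeline is typically trained, as in OpenAI's InstructGPT work, on labeler rankings of K responses to the same prompt, averaging the pairwise loss −log σ(r_better − r_worse) over all K(K−1)/2 pairs. The sketch below uses plain floats for reward-model scores; the function names are hypothetical, and a real implementation would compute the scores with a neural reward model and batch the loss on tensors.

```python
import math
from itertools import combinations


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ranking_loss(scores_best_to_worst: list[float]) -> float:
    """Average pairwise ranking loss for one prompt.

    scores_best_to_worst[i] is the reward-model score of the response that
    labelers ranked i-th (index 0 = best). combinations() keeps input order,
    so every pair comes out as (better, worse).
    """
    pairs = list(combinations(scores_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked responses")
    return -sum(log_sigmoid(better - worse) for better, worse in pairs) / len(pairs)


if __name__ == "__main__":
    # Scores that agree with the human ranking give a low loss ...
    print(ranking_loss([2.0, 1.0, -0.5, -1.5]))
    # ... while scores that contradict it give a high loss.
    print(ranking_loss([-1.5, -0.5, 1.0, 2.0]))
```

Averaging over all pairs from one ranking, rather than sampling a single pair, lets each labeled prompt contribute several comparisons at once.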
Top Architect
Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Training, RLHF, and Consistency Issues

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 and improves performance through supervised fine‑tuning, reinforcement learning from human feedback (RLHF), and PPO optimization, addressing consistency challenges such as misaligned outputs, bias, and hallucinations; the article also covers how the model is evaluated for helpfulness, truthfulness, and harmlessness.

ChatGPT · Large Language Models · Model Alignment
15 min read
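The "PPO optimization" step refers to Proximal Policy Optimization, whose clipped surrogate objective keeps each update close to the policy that generated the samples. With probability ratio ρ = π_new(a|s)/π_old(a|s) and advantage A, the per-sample objective to maximize is min(ρ·A, clip(ρ, 1 − ε, 1 + ε)·A). The scalar sketch below illustrates only that term; ε = 0.2 is a common default rather than a value from the article, the function names are hypothetical, and full RLHF implementations also add a value-function loss and a KL penalty and operate on token-level tensors.

```python
import math


def ppo_clip_objective(new_logp: float, old_logp: float,
                       advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate (to be maximized).

    new_logp / old_logp: log-probability of the sampled action (token) under
    the current policy and under the policy that generated the sample.
    """
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the minimum makes the objective pessimistic: pushing the ratio
    # outside [1 - eps, 1 + eps] in the advantageous direction earns no extra credit.
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_batch_loss(samples, eps: float = 0.2) -> float:
    """Loss to minimize: negative mean clipped objective over
    (new_logp, old_logp, advantage) tuples."""
    values = [ppo_clip_objective(n, o, a, eps) for n, o, a in samples]
    return -sum(values) / len(values)


if __name__ == "__main__":
    # Ratio 2.0 with a positive advantage is capped at 1 + eps.
    print(ppo_clip_objective(math.log(2.0), 0.0, advantage=1.0))
    # Ratio 0.5 with a negative advantage is capped at 1 - eps.
    print(ppo_clip_objective(math.log(0.5), 0.0, advantage=-1.0))
```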
Top Architect
Feb 8, 2023 · Artificial Intelligence

A Technical Roadmap of GPT‑3.5: From Pre‑training to RLHF and Emerging Capabilities

This article analyses how ChatGPT and the GPT‑3.5 series evolved from the original GPT‑3 through large‑scale pre‑training, code‑based training, instruction tuning, and reinforcement learning from human feedback, identifying the origins of their language generation, in‑context learning, world knowledge, code understanding, chain‑of‑thought reasoning, and alignment capabilities while also outlining current limitations.

ChatGPT · GPT-3.5 · Instruction Tuning
27 min read