Tagged articles

reinforcement learning

743 articles · Page 6 of 8

Dec 7, 2024 · Artificial Intelligence

How Reinforcement Fine-Tuning (RFT) Is Redefining AI Customization

Reinforcement Fine-Tuning (RFT), unveiled during OpenAI’s “12 Days of OpenAI” launch event, introduces a feedback-loop approach that uses reinforcement learning, small datasets, and domain-specific scorers to turn generic models into specialized experts, giving product managers a powerful tool for rapid, cost-effective AI customization across industries.

AI customization · fine-tuning · machine learning

0 likes · 7 min read

Bilibili Tech

Dec 6, 2024 · Artificial Intelligence

Ensemble-based Offline-to-Online Reinforcement Learning (ENOTO): Methodology, Experiments, and Analysis

ENOTO introduces ensemble Q-networks into the offline-to-online reinforcement-learning pipeline, using the minimum Q-value across the ensemble and uncertainty-driven exploration to stabilize online fine-tuning and improve learning efficiency; it achieves 10-25% higher cumulative returns with minimal online interaction on MuJoCo and AntMaze benchmarks.

AntMaze · ENOTO · Ensemble Q-Networks

0 likes · 16 min read
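
The two ensemble ingredients named in the summary can be sketched in a few lines. This is a minimal illustration under assumptions, not ENOTO's code: Q-values are plain Python lists (one entry per ensemble member), the Bellman target takes the minimum over members as a pessimistic estimate, and exploration adds a bonus proportional to the ensemble's spread. The names `pessimistic_target`, `ucb_action`, and the weight `beta` are hypothetical.

```python
import statistics


def pessimistic_target(reward, next_q_values, gamma=0.99, done=False):
    """Bellman target that uses the minimum over ensemble members.

    next_q_values: each ensemble member's Q(s', a') estimate for the next
    state-action pair; taking the minimum gives a pessimistic target.
    """
    if done:
        return reward
    return reward + gamma * min(next_q_values)


def ucb_action(q_ensemble_per_action, beta=1.0):
    """Choose the action with the best mean Q plus an uncertainty bonus.

    q_ensemble_per_action: dict mapping action -> list of ensemble Q-values.
    The bonus is beta times the spread (population std) of the ensemble.
    """
    def score(action):
        qs = q_ensemble_per_action[action]
        return statistics.mean(qs) + beta * statistics.pstdev(qs)

    return max(q_ensemble_per_action, key=score)
```

For example, `ucb_action({"a": [1.0, 1.0, 1.0], "b": [0.0, 1.0, 2.0]})` returns `"b"`: both actions have the same mean, but the members disagree about `b`, so it earns the exploration bonus.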

Alimama Tech

Dec 4, 2024 · Artificial Intelligence

AIGB: Generative Auto‑Bidding via Diffusion Modeling

AIGB, introduced by Alibaba's Alimama in 2023, reframes large-scale ad-auction auto-bidding as a generative sequence-modeling task using diffusion models. It achieves GMV gains of up to 5% along with improved stability and interpretability, and it has been commercialized, open-sourced, and featured in a NeurIPS competition.

AI · auto-bidding · diffusion

0 likes · 12 min read

Model Perspective

Dec 3, 2024 · Artificial Intelligence

How Recommendation Algorithms Shape Our Habits—and What You Can Do About It

The article examines how recommendation algorithms reinforce user preferences, turning habits into stable feedback loops, and proposes mathematical models and practical strategies to introduce diversity and break behavioral fixation in the age of algorithmic personalization.

Diversity · behavioral modeling · habit formation

0 likes · 7 min read

DataFunTalk

Nov 30, 2024 · Artificial Intelligence

Interview with Rich Sutton on Continuous Learning, Reinforcement Learning, and the Future of AI

In this extensive interview, Rich Sutton critiques the field's focus on deep learning that learns only transiently (in a single training phase, after which learning stops), advocates for continuous learning, discusses the reward hypothesis, outlines open research challenges, offers advice to emerging scholars, and predicts breakthroughs in understanding intelligence between 2030 and 2040.

AI research · continuous learning · future of AI

0 likes · 27 min read

Python Programming Learning Circle

Nov 4, 2024 · Artificial Intelligence

Reinforcement Learning with highway‑env and Gym: DQN for Autonomous Driving

This tutorial explains how to install the gym and highway‑env packages, configure a highway simulation environment, process observations and actions, build a DQN network in PyTorch, train the agent, and analyze training results for autonomous driving scenarios.

DQN · autonomous driving · gym

0 likes · 11 min read
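
The core DQN step these highway-env tutorials build in PyTorch can be sketched without a framework. This is an illustrative sketch, not the tutorial's code: the replay-batch tuple layout, the `target_q` callable standing in for the frozen target network, and the helper names are assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_targets(batch, target_q, gamma=0.99):
    """Compute DQN regression targets y = r + gamma * max_a' Q_target(s', a').

    batch: list of (state, action, reward, next_state, done) transitions.
    target_q: callable mapping a state to a list of Q-values (target network).
    Terminal transitions do not bootstrap from the next state.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

The online network would then be trained to regress Q(s, a) toward these targets (for example with a mean-squared or Huber loss), with the target network refreshed from the online weights periodically.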

DaTaobao Tech

Oct 30, 2024 · Artificial Intelligence

Understanding OpenAI o1: Chain‑of‑Thought, Scaling Laws, and Training Strategies

The article explains how OpenAI’s o1 model leverages chain-of-thought prompting, dual-system cognitive theory, and new scaling laws (pre-training on code and math data, plus post-training reinforcement learning with step-wise reward models) to achieve stronger reasoning, safety, and performance than GPT-4, heralding a shift toward models that learn to think.

Chain-of-Thought · LLM · Safety

0 likes · 42 min read

Baobao Algorithm Notes

Oct 21, 2024 · Artificial Intelligence

Unraveling RLHF: From PPO to DPO and Beyond – A Comprehensive Guide

This article provides a thorough, four‑part overview of RLHF for large language models, covering preference‑optimization algorithms (PPO‑based and offline RL approaches), reward‑model training techniques, inference‑time exploration strategies, and practical implementation details including the OpenRLHF framework and resource‑allocation tricks.

DPO · LLM Optimization · OpenRLHF

0 likes · 27 min read
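
PPO-based RLHF, the first family the guide covers, optimizes a clipped surrogate objective. A minimal, framework-free sketch under assumptions (per-sample log-probabilities and precomputed advantages are given; the function name and the default `clip_eps=0.2` are illustrative), not OpenRLHF's implementation:

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over samples (e.g. tokens).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current policy and under the policy that generated the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum makes the objective pessimistic: large policy
        # moves cannot increase it beyond the clipped value.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

With a positive advantage and a ratio of 2, the contribution is capped at 1.2 times the advantage; with a negative advantage the unclipped (worse) term is kept, so the update is never rewarded for overshooting.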

Baobao Algorithm Notes

Oct 15, 2024 · Artificial Intelligence

How DPO Simplifies RLHF: A Deep Dive into Direct Preference Optimization

This article breaks down how Direct Preference Optimization (DPO) mathematically reduces the two-stage RLHF pipeline (reward modeling followed by RL) to a single-stage, SFT-like training process, explains the underlying loss transformations, and discusses DPO's practical limitations and trade-offs for large language model alignment.

DPO · Direct Preference Optimization · RLHF

0 likes · 9 min read
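
The single-stage objective can be written as a per-pair loss on log-probabilities. A minimal sketch, assuming the summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model are already computed; `beta` and the function name are illustrative:

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, from summed sequence log-probabilities.

    Loss = -log(sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])).
    """
    # Implicit rewards: beta * log(pi_theta / pi_ref) for each response.
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) = softplus(-m), computed stably for either sign of m.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2; raising the chosen response's log-probability relative to the reference lowers it, with no separate reward model or RL loop.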

DataFunSummit

Sep 27, 2024 · Artificial Intelligence

Advances in Educational Large Language Models for Youth Programming and Personalized Learning

The presentation by Dr. Su Yu outlines challenges such as data sparsity and delayed learning effects in AI‑driven education, introduces three technical breakthroughs—domain‑specific LLM training, small‑knowledge learning via hierarchical knowledge graphs, and reinforcement‑based cognitive recommendation—and showcases product applications like the Frog Programming Platform, AI Programming Learning Machine, and digital‑human AI recorded courses.

AI Education · Knowledge Graph · Personalized Learning

0 likes · 18 min read

Model Perspective

Sep 27, 2024 · Artificial Intelligence

Modeling Everyday Learning: From Reinforcement to Social Learning

The article explores how everyday decision‑making can be modeled using reinforcement learning and social learning frameworks, illustrating their strengths, limitations, and combined insights for understanding individual and collective behavior.

AI · behavioral modeling · decision making

0 likes · 8 min read

Architect

Sep 26, 2024 · Artificial Intelligence

Decoding OpenAI o1: How RL‑LLM Fusion Powers Next‑Gen Reasoning

This article provides a detailed technical analysis of OpenAI’s o1 model, exploring its enhanced logical reasoning, the likely use of reinforcement learning with hidden chain‑of‑thought generation, multi‑model architecture, training data pipelines, reward modeling, and how these innovations could reshape AI safety and scaling strategies.

AI safety · Chain-of-Thought · LLM

0 likes · 43 min read

JD Tech Talk

Sep 23, 2024 · Artificial Intelligence

JD Advertising R&D: AI‑Driven Solutions for Traffic Valuation, Multimodal Understanding, Auction Mechanisms, Generative Recommendation, and Large‑Model Engineering

The JD Advertising R&D team applies cutting‑edge AI techniques—including query intent models, multimodal representation pipelines, reinforcement‑learning‑based auction mechanisms, generative recommendation with quantized product tokens, and large‑model infrastructure—to boost traffic valuation, ad relevance, revenue, and creative generation across the platform.

AI · Advertising · Graph Neural Networks

0 likes · 19 min read

Data Thinking Notes

Sep 13, 2024 · Artificial Intelligence

How OpenAI’s o1 Series Redefines Complex Reasoning and AI Safety

OpenAI’s new o1 series, comprising o1-preview and o1-mini, uses reinforcement-learning-based chain-of-thought reasoning to achieve superior performance on academic exams, coding contests, and safety benchmarks; o1-mini offers a faster, more cost-effective option, and the release advances AI alignment and human-preference evaluation.

AI safety · Large Language Model · OpenAI

0 likes · 15 min read

Tencent Advertising Technology

Aug 15, 2024 · Artificial Intelligence

Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding

This paper introduces RLLR, a label‑sensitive reward reinforcement learning method that improves natural language understanding tasks by aligning training objectives with label accuracy, and demonstrates its effectiveness across eight public NLU datasets and real‑world advertising feature evaluation, outperforming standard RLHF and SFT baselines.

Advertising · Natural Language Understanding · RLHF

0 likes · 14 min read

Tencent Advertising Technology

Aug 13, 2024 · Artificial Intelligence

Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors

This paper investigates selection bias in large language models for multiple‑choice tasks, proposes metrics to quantify symbol‑content binding, introduces Reweighting Symbol‑Content Binding (RSCB) and Point‑wise Intelligent Feedback (PIF) methods, and demonstrates their effectiveness in reducing bias and improving accuracy, including a real‑world Tencent advertising feature‑evaluation deployment.

Symbol Binding · multiple choice · pointwise feedback

0 likes · 16 min read

Model Perspective

Jul 31, 2024 · Artificial Intelligence

How Monte Carlo Tree Search Powers AlphaGo and Beyond: A Deep Dive

Monte Carlo Tree Search (MCTS) is a sampling-based heuristic search algorithm that incrementally builds a search tree through four phases (selection, expansion, simulation, and backpropagation); it was central to breakthroughs such as AlphaGo’s victory and has found applications in robotics, autonomous driving, finance, and bioinformatics.

AI Applications · AlphaGo · MCTS

0 likes · 7 min read
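
The four phases can be seen end to end on a toy game. The sketch below is illustrative only: it runs plain UCT (UCB1 selection with random rollouts) on a hypothetical Nim variant (take 1 to 3 stones; whoever takes the last stone wins), and it omits the policy and value networks that AlphaGo layers on top of MCTS. All names are assumptions.

```python
import math
import random


class Node:
    """A state of the toy Nim game inside the search tree."""

    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones    # stones left in the pile
        self.player = player    # player to move from this state (0 or 1)
        self.parent = parent
        self.move = move        # stones taken to reach this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0         # wins for the player who made `move`


def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score used in the selection phase."""
    if visits == 0:
        return float("inf")
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)


def rollout(stones, player, rng):
    """Simulation phase: play uniformly random moves and return the winner."""
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player       # this player took the last stone
        player = 1 - player


def mcts_best_move(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB1.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children,
                       key=lambda ch: ucb1(ch.wins, ch.visits, parent.visits))
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: terminal states are scored directly.
        if node.stones == 0:
            winner = 1 - node.player
        else:
            winner = rollout(node.stones, node.player, rng)
        # 4. Backpropagation: credit the player who moved into each node.
        while node is not None:
            node.visits += 1
            if winner != node.player:
                node.wins += 1
            node = node.parent
    # Recommend the most visited move, a common robust choice.
    return max(root.children, key=lambda ch: ch.visits).move
```

With 5 stones the winning move is to take 1 (leaving the opponent a multiple of 4), and the search converges on it from statistics alone, without any hand-coded game knowledge.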

Model Perspective

Jul 30, 2024 · Artificial Intelligence

Your Complete AI Learning Roadmap: From Basics to Large Model Mastery

This guide presents a comprehensive AI learning roadmap, dividing study into five progressive stages—from foundational math and programming to core deep‑learning and reinforcement‑learning techniques, large‑model training, industry applications, and future trends—plus curated book lists, tool recommendations, and practical RAG tutorials.

AI learning roadmap · AI resources · RAG

0 likes · 9 min read

Alimama Tech

Jul 29, 2024 · Artificial Intelligence

Generative Auto-bidding via Diffusion Modeling (AIGB)

The paper presents AIGB, a generative auto-bidding framework that replaces reinforcement learning with a conditional diffusion model to generate optimal bidding trajectories; offline benchmarks and Alibaba’s online A/B tests show that it consistently outperforms RL baselines, boosting buy count, GMV, and ROI while maintaining low latency.

Marketing AI · Online Advertising · auto-bidding

0 likes · 18 min read

php Courses

Jul 29, 2024 · Artificial Intelligence

Building Reinforcement Learning Algorithms with PHP

This article explains the fundamentals of reinforcement learning, demonstrates how PHP can be used with neural‑network libraries such as Keras or TensorFlow to implement a simple reinforcement‑learning agent, provides a complete PHP code example, and discusses its potential applications.

AI · code example · reinforcement learning

0 likes · 5 min read

Kuaishou Tech

Jul 17, 2024 · Artificial Intelligence

Key Technical Innovations in Kuaishou’s “Kuaiyi” Large Model and Its Real-World Applications

The article details Kuaishou’s development of the 175B-parameter “Kuaiyi” multimodal large model, presenting eight novel technical innovations (from Temporal Scaling Law and MiLe Loss to MoE-enhanced reward modeling), and describes how these advances enable high-performance AI services such as the AI Xiao Kuai chatbot across diverse real-world scenarios.

AI Applications · Large Language Model · Model Optimization

0 likes · 12 min read

Alimama Tech

Jul 15, 2024 · Artificial Intelligence

Why Auto‑Bidding in Large‑Scale Auctions Is the Hottest NeurIPS Challenge

The article describes NeurIPS’s standing among top AI conferences, introduces the newly selected “Auto-Bidding in Large-Scale Auctions” competition, and outlines its technical background, four generations of bidding strategies (from classic control to generative models), and the competition’s tracks, rewards, and how researchers can participate.

Advertising · Generative AI · NeurIPS

0 likes · 12 min read

DataFunSummit

Jul 8, 2024 · Artificial Intelligence

World Models and Causal Inference in Reinforcement Learning: A Comprehensive Overview

This article reviews the role of world (mental) models and causal inference in reinforcement learning, covering their theoretical foundations, model‑based RL frameworks such as Dyna, sample‑efficiency challenges, causal structure learning, distribution correction, dynamics‑reward modeling, and experimental results that demonstrate performance gains across multiple tasks.

causal inference · model-based RL · reinforcement learning

0 likes · 21 min read
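
Dyna, named in the summary, interleaves real experience with planning on a learned model. A minimal tabular Dyna-Q sketch on a hypothetical corridor task; the environment, hyperparameters, and function name are all assumptions for illustration, and the article's causal world models are far richer than this lookup-table model:

```python
import random


def dyna_q(n_states=6, episodes=30, planning_steps=10, alpha=0.5, gamma=0.95,
           epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a hypothetical 1-D corridor.

    States 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the rightmost state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    model = {}  # (state, action) -> (reward, next_state, done), learned online

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        done = s2 == n_states - 1
        return (1.0 if done else 0.0), s2, done

    def choose(s):
        # Epsilon-greedy, breaking ties randomly.
        if rng.random() < epsilon or q[s][0] == q[s][1]:
            return rng.randrange(2)
        return 0 if q[s][0] > q[s][1] else 1

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = choose(s)
            r, s2, done = step(s, a)          # 1. act in the real environment
            update(s, a, r, s2, done)         # 2. direct RL update
            model[(s, a)] = (r, s2, done)     # 3. model learning
            for _ in range(planning_steps):   # 4. planning on simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q
```

The planning loop reuses each real transition many times, which is the sample-efficiency argument for model-based RL; a wrong model would, by the same mechanism, propagate its errors into the values.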

Ops Development & AI Practice

Jun 22, 2024 · Artificial Intelligence

Machine Learning Demystified: Traditional Algorithms vs Neural Networks

Machine learning, a core AI discipline, encompasses traditional learning paradigms (supervised, unsupervised, and reinforcement learning) as well as neural network models such as CNNs, RNNs, GANs, and VAEs, each with distinct principles, strengths, and typical application scenarios.

Traditional Algorithms · deep learning · machine learning

0 likes · 10 min read

Baobao Algorithm Notes

May 30, 2024 · Artificial Intelligence

What’s the Latest RLHF Landscape? From PPO to ORPO Explained

This article surveys the current RLHF ecosystem, comparing on‑policy methods like PPO with off‑policy approaches such as DPO, and examines recent variants—including ReMax, GRPO, DPOP, TDPO, and ORPO—highlighting their algorithmic differences, resource trade‑offs, and practical performance insights.

DPO · LLM · PPO

0 likes · 23 min read
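
Of the variants listed, GRPO is among the easiest to illustrate: it drops PPO's learned critic and instead scores each sampled response relative to the other responses drawn for the same prompt. A minimal sketch of that group normalization; the population standard deviation, the small `eps`, and the function name are assumptions, and implementations differ in these details:

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward within its sampling group.

    rewards: scalar rewards for several responses sampled for the same prompt.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For a group with rewards `[1.0, 0.0, 1.0, 0.0]`, the advantages are roughly `+1` for the correct answers and `-1` for the others; a group where every response scores the same yields zero advantage and contributes no gradient signal.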

Kuaishou Tech

May 27, 2024 · Artificial Intelligence

What Kuaishou’s Four ACL Papers Reveal About the Future of Large Language Models

ACL 2024, the 62nd Annual Meeting of the Association for Computational Linguistics, accepted four papers from Kuaishou that explore multi-turn instruction following, self-agreement reasoning, fine-grained reinforcement learning, and dynamic routing in Mixture-of-Experts models; the article summarizes each paper’s methods, experimental results, and authors, with public arXiv links.

ACL 2024 · Kuaishou Research · Mixture of Experts

0 likes · 11 min read

NewBeeNLP

May 13, 2024 · Artificial Intelligence

Why DPO Treats LLMs as Q‑Functions: A Deep Theoretical Dive

This article offers a detailed theoretical interpretation of the DPO algorithm, showing how large language models can be viewed as Q‑functions, unifying sequence‑wise and step‑wise decision perspectives, and discussing the resulting implications for reinforcement‑learning‑based alignment research.

DPO · LLM · Preference Optimization

0 likes · 14 min read

DataFunSummit

Apr 16, 2024 · Artificial Intelligence

Intelligent Risk Control: Definitions, Expert Systems, Algorithmic Systems, and Emerging AI Techniques

This article explains intelligent risk control as a synergy of expert experience and algorithmic decision‑making, outlines its definition, expert human systems, digital algorithmic systems, and explores advanced AI methods such as reinforcement learning, large language models with knowledge graphs, adversarial learning, graph neural networks, and a practical supply‑chain case study.

Graph Neural Network · Knowledge Graph · Large Language Model

0 likes · 11 min read

JD Retail Technology

Apr 15, 2024 · Artificial Intelligence

Design and Evolution of JD.com Recommendation Advertising Ranking Auction Mechanism

The article analyzes JD.com's recommendation advertising ranking auction mechanism, detailing its objectives, challenges in traffic value estimation, user interest exploration, and multi‑item auction fairness, and describing the technical evolution from traditional auctions to deep‑learning‑driven solutions.

Advertising · Ranking · auction

0 likes · 18 min read

DataFunTalk

Apr 4, 2024 · Artificial Intelligence

Enhancing Interactive Agents with Large Language Models: The SwiftSage Framework

This article reviews the challenges of text-only interaction for large language model agents, introduces benchmark environments such as ALFWorld and ScienceWorld, compares baseline reinforcement-learning approaches, and presents SwiftSage, a hybrid system that combines a fast T5-based small model with a powerful LLM for planning and grounding. It reports superior performance, efficiency, and cost-effectiveness, and outlines current limitations and future research directions.

AI · SwiftSage · interactive agents

0 likes · 22 min read

DataFunSummit

Apr 2, 2024 · Artificial Intelligence

Reinforcement Learning: Fundamentals, Classic Algorithms, and Applications in Short Video Recommendation

This article provides an in-depth overview of reinforcement learning, covering its goals, mathematical foundations such as Markov Decision Processes, classic algorithms like DQN, and practical applications including short‑video recommendation systems that aim to improve user retention through RL‑based ranking.

DQN · Markov Decision Process · RL applications

0 likes · 12 min read
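
The Markov Decision Process foundations the article starts from can be made concrete with value iteration on a tiny finite MDP. The two-state "engaged/bored" MDP below is entirely hypothetical, chosen only to echo the retention theme; the function name and data layout are assumptions, and real recommendation MDPs are far too large for tabular methods, which is why the article turns to DQN.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10):
    """Value iteration for a finite MDP.

    transitions[s][a] is a list of (probability, next_state, reward) tuples.
    Returns the optimal state values and a greedy policy.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup: best expected one-step return.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: sum(p * (r + gamma * values[s2])
                                          for p, s2, r in actions[a]))
        for s, actions in transitions.items()
    }
    return values, policy


# Hypothetical toy MDP: recommend familiar or novel content.
toy_mdp = {
    "engaged": {"familiar": [(1.0, "engaged", 1.0)],
                "novel": [(0.5, "engaged", 0.5), (0.5, "bored", 0.5)]},
    "bored": {"familiar": [(1.0, "bored", 0.0)],
              "novel": [(0.8, "engaged", 0.2), (0.2, "bored", 0.2)]},
}
```

Running `value_iteration(toy_mdp)` gives the policy "familiar" when engaged and "novel" when bored: the optimal action depends on the state, which is exactly what a one-step greedy ranker cannot express.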

DataFunTalk

Mar 30, 2024 · Artificial Intelligence

Reinforcement Learning and Multi‑Task Recommendation: Two‑Stage Constrained Actor‑Critic and Multi‑Task RL Approaches at Kuaishou

This talk presents Kuaishou's research on combining reinforcement learning with multi‑task recommendation, detailing a two‑stage constrained actor‑critic method for short‑video ranking, a multi‑task RL framework, experimental results on offline and online systems, and practical Q&A insights.

Kuaishou · actor-critic · multi-task recommendation

0 likes · 18 min read

Python Programming Learning Circle

Mar 28, 2024 · Artificial Intelligence

Tutorial: Setting Up highway‑env with OpenAI Gym and Training a DQN for Autonomous Driving

This article explains how to install the gym and highway‑env packages, configure the environment for various driving scenarios, define observations, actions and rewards, build a DQN network in PyTorch, run the training loop, and analyze the resulting performance metrics.

DQN · Python · Simulation

0 likes · 9 min read

Model Perspective

Mar 8, 2024 · Artificial Intelligence

Master the Three Machine Learning Types and Model Paradigms

This article introduces the three core machine learning categories—supervised, unsupervised, and reinforcement learning—detailing their definitions, typical algorithms, and real‑world applications, and then compares generative and discriminative models, highlighting key examples, characteristics, and use‑case differences.

Discriminative Models · generative models · machine learning

0 likes · 13 min read

DataFunTalk

Mar 7, 2024 · Artificial Intelligence

Enhancing Interactive Agents with Large Language Models: The SwiftSage Framework and Benchmark Analysis

This article reviews recent advances in using large language models for interactive embodied agents, introduces the SwiftSage dual-model framework that combines a fast T5-based small model with a powerful LLM for planning, evaluates it on benchmarks such as ALFWorld and ScienceWorld, and discusses efficiency, cost-effectiveness, limitations, and future research directions.

AI · SwiftSage · interactive agents

0 likes · 23 min read

php Courses

Feb 22, 2024 · Artificial Intelligence

Building Reinforcement Learning Algorithms with PHP

This article introduces reinforcement learning, explains its core concepts, and demonstrates how to implement a simple reinforcement learning algorithm in PHP using neural‑network libraries such as Keras, providing a complete code example that includes environment and agent classes.

PHP · artificial-intelligence · code example

0 likes · 4 min read

DaTaobao Tech

Jan 31, 2024 · Artificial Intelligence

Highlights of Recent AI Research Papers from Top Conferences (2023)

The article curates standout AI papers from 2023 CCF‑A conferences—including CVPR, ICLR, ACM MM, and INFORMS—showcasing advances such as Swin‑Transformer video quality assessment, cross‑modal e‑commerce product search, transformer‑based vehicle routing heuristics, diffusion‑driven dance generation, and reinforcement‑learning inventory replenishment.

AI · computer vision · multimedia

0 likes · 23 min read

DataFunSummit

Jan 27, 2024 · Artificial Intelligence

Enhancing Interactive Agents with Large Language Models: The SwiftSage Framework

This article reviews recent advances in using large language models for embodied interactive agents, introduces the dual-process SwiftSage architecture that combines a fast T5-based small model with a powerful large model for planning and grounding, and evaluates its performance on benchmarks such as ScienceWorld.

AI2 · Planning · SwiftSage

0 likes · 23 min read

DataFunTalk

Jan 25, 2024 · Artificial Intelligence

World Models, Reinforcement Learning, and Causal Inference: A Comprehensive Overview

This article presents a detailed overview of world models and their role in reinforcement learning, explains how causal inference can enhance model-based RL, discusses sample efficiency challenges, and shares experimental findings and practical insights from recent research and industry applications.

AI · causal inference · machine learning

0 likes · 22 min read

Alimama Tech

Jan 10, 2024 · Artificial Intelligence

Advances in Automated Bidding and Auction Mechanisms for Online Advertising

Automated bidding for online ads has progressed from classic control and linear programming to reinforcement-learning pipelines, offline and sustainable online RL, and finally generative-model approaches; each stage improves decision-making capability, adaptability, and fairness while addressing simulation gaps, multi-objective constraints, and real-time efficiency.

Auction Design · Generative AI · Online Advertising

0 likes · 25 min read

DataFunSummit

Dec 27, 2023 · Artificial Intelligence

Two-Stage Constrained Actor-Critic for Short‑Video Recommendation and a Reinforcement‑Learning Multi‑Task Framework

This article presents a two‑stage constrained actor‑critic (TSCAC) algorithm that models short‑video recommendation as a constrained reinforcement‑learning problem, details its theoretical formulation and optimization loss, and validates its superiority through extensive offline and online experiments, followed by a multi‑task reinforcement‑learning framework (RMTL) that further improves multi‑objective recommendation performance.

Multi-Task Learning · Recommendation Systems · constrained optimization

0 likes · 16 min read

Python Programming Learning Circle

Dec 16, 2023 · Artificial Intelligence

Using highway‑env with OpenAI Gym for Reinforcement Learning: Installation, Configuration, and DQN Training

This tutorial explains how to install the gym and highway‑env packages, configure the highway‑v0 environment, explore its observation types, and implement a DQN agent in Python to train and evaluate autonomous driving policies, complete with code snippets and performance visualizations.

DQN · Python · gym

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Dec 8, 2023 · Artificial Intelligence

How BeautifulPrompt Automates Prompt Engineering for Text-to-Image Generation

BeautifulPrompt, presented at EMNLP 2023, introduces a deep generative model that automatically turns simple image descriptions into high-quality prompts, improving text-to-image synthesis through data-driven fine-tuning, reward modeling, and reinforcement learning.

AI generation · reinforcement learning · text-to-image synthesis

0 likes · 8 min read

Sohu Tech Products

Dec 6, 2023 · Artificial Intelligence

Real-time Controllable Multi-Objective Re-ranking Models for Taobao Feed Recommendation

The paper introduces a real‑time controllable, multi‑objective re‑ranking framework for Taobao’s feed recommendation that combines actor‑critic reinforcement learning with hypernetworks to instantly adjust objective weights, handling diverse media and cold‑start constraints while delivering higher click‑through, diversity, and cold‑start ratios with only 20‑25 ms latency.

Alibaba · Recommendation Systems · Re‑ranking

0 likes · 34 min read

Kuaishou Tech

Dec 1, 2023 · Artificial Intelligence

Short Video Recommendation Algorithm Frontier Research Forum at CCIR 2023

The CCIR 2023 conference in Beijing, sponsored by Kuaishou, hosted a short‑video recommendation algorithm frontier research forum where over 100 experts and students shared the latest AI‑driven recommendation technologies, open datasets, and interdisciplinary challenges in short‑video platforms.

AI · conference · datasets

0 likes · 8 min read

Alimama Tech

Nov 28, 2023 · Artificial Intelligence

Evolution of Alibaba's AI-Driven Advertising Decision Technologies

The article traces Alibaba’s Alimama platform from classic control‑based bidding through linear programming and reinforcement‑learning approaches to generative‑AI‑driven strategies, detailing how deep‑learning models, offline and sustainable online RL frameworks, and large‑language‑model‑based bidding reshape automated auctions, fairness, and scalability in e‑commerce advertising.

AI · Auction Design · auto-bidding

0 likes · 38 min read

21CTO

Nov 27, 2023 · Artificial Intelligence

What Is the Mysterious Q* Model and Could It Redefine AI?

A speculative look at OpenAI's rumored Q* project explores its possible blend of Q‑learning and A* search, the potential for advanced logical reasoning, and the broader philosophical questions about AI consciousness, alignment, and the future of intelligent systems.

AI alignment · AI consciousness · OpenAI

0 likes · 9 min read

Kuaishou Tech

Nov 23, 2023 · Artificial Intelligence

KuaiSim: A Comprehensive User Simulator for Reinforcement Learning in Recommendation Systems

KuaiSim is a comprehensive user simulation environment for recommendation systems that models immediate, long‑term, and cross‑session feedback, supports list‑wise, whole‑session, and retention tasks, provides baselines and evaluation metrics, and demonstrates superior performance on KuaiRand and ML‑1M datasets.

KuaiSim · User Simulation · benchmark

0 likes · 14 min read

DataFunSummit

Nov 21, 2023 · Artificial Intelligence

Automatic Hyperparameter Tuning in Tencent Recommendation System (TRS): Techniques, Evolution, and Practice

This article presents an in‑depth overview of Tencent's TRS automatic hyperparameter tuning, covering background, challenges, the evolution from Bayesian optimization to evolution strategies and reinforcement learning, a systematic platform solution, real‑world deployment results, and a Q&A session.

Bayesian Optimization · Evolution Strategies · hyperparameter tuning

0 likes · 20 min read

Kuaishou Tech

Nov 21, 2023 · Artificial Intelligence

Kuaishou Academic Forum on Cutting-Edge Short Video Recommendation Algorithms (Nov 23, 2023)

The Kuaishou Academic Forum, scheduled for November 23 in Beijing, will present cutting-edge research on short-video recommendation algorithms, with talks on reinforcement learning, user interest modeling, graph neural networks, and a comprehensive recommender-system simulator; the announcement also includes registration details and a brief company overview.

Graph Neural Networks · Kuaishou · academic forum

0 likes · 5 min read

DataFunTalk

Nov 14, 2023 · Artificial Intelligence

Real-Time Controllable Multi-Objective Re‑ranking for Taobao Feed

This article presents a comprehensive study of a controllable multi‑objective re‑ranking model for Taobao's information‑flow recommendation, detailing the challenges of complex feed scenarios, three modeling paradigms (V1‑V3), an actor‑critic reinforcement learning framework with hypernet‑generated weights, and extensive online evaluation results.

Recommendation Systems · Re‑ranking · hypernetworks

0 likes · 31 min read

Sohu Tech Products

Nov 8, 2023 · Artificial Intelligence

Two‑Stage Constrained Actor‑Critic for Short‑Video Recommendation and a Reinforcement‑Learning Multi‑Task Recommendation Framework

The presentation introduces a two‑stage constrained actor‑critic algorithm that learns auxiliary policies for interaction signals before optimizing watch‑time under KL constraints, and a reinforcement‑learning multi‑task learning framework that models session‑level dynamics with adaptive multi‑critic weighting, both achieving significant offline and online gains in short‑video recommendation.

Multi-Task Learning · Recommendation Systems · actor-critic

0 likes · 16 min read

DataFunTalk

Nov 6, 2023 · Artificial Intelligence

Two‑Stage Constrained Actor‑Critic Reinforcement Learning for Short‑Video Recommendation and a Multi‑Task RL Framework

This article presents a two‑stage constrained actor‑critic reinforcement learning algorithm for short‑video recommendation, models the problem as a constrained MDP, details the algorithm’s stages, and reports extensive offline and online experiments showing superior watch‑time and interaction metrics, followed by a multi‑task RL framework and its evaluations.

Recommendation Systems · constrained optimization · multi‑task learning

0 likes · 16 min read

NetEase Smart Enterprise Tech+

Oct 19, 2023 · Artificial Intelligence

Unleashing Game AI: Inside NetEase’s Bray Distributed RL Framework

NetEase’s AI team reveals how their self‑developed distributed reinforcement‑learning platform, Bray, enables high‑level AI agents for the MOBA game Dream of Kingdom 2, covering GameCore integration, weighted random initialization, modular APIs, difficulty scaling, and cost‑effective training for realistic player experiences.

AI Framework · MOBA · distributed training

0 likes · 9 min read

Zhuanzhuan Tech

Oct 18, 2023 · Artificial Intelligence

Design and Implementation of a Home‑Page Recommendation System Using Reinforcement Learning and DPP

This article presents a comprehensive design for Zhuanzhuan's home‑page recommendation pipeline, detailing the system architecture, challenges of traffic efficiency and diversity, and a two‑stage solution that applies Proximal Policy Optimization reinforcement learning in the re‑ranking module and Determinantal Point Process optimization in the coarse‑ranking and traffic‑pool stages, followed by offline simulation, online deployment, and evaluation metrics.

DPP · Ranking · machine learning

0 likes · 18 min read

Alimama Tech

Oct 11, 2023 · Artificial Intelligence

How Minimax Regret Optimization Tackles Black‑Box Adversarial Bidding Constraints

This article explains how the Alibaba‑Mama team addresses constrained ROI bidding in a black‑box adversarial environment by introducing a Minimax Regret Optimization framework that aligns training and test distributions, builds a causal world model, and demonstrates robust performance on synthetic and real‑world ad auctions.

Online Advertising · adversarial bidding · constrained optimization

0 likes · 14 min read

Baobao Algorithm Notes

Oct 9, 2023 · Artificial Intelligence

Demystifying RLHF and PPO for Large Language Models: Theory and Practice

This article explains why Reinforcement Learning from Human Feedback (RLHF) is crucial for LLM intelligence, outlines the three-stage training pipeline, details InstructGPT's reward model and PPO optimization, and provides a practical guide to implementing RLHF with deep‑learning frameworks.

PPO · RLHF · Reward Modeling

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Sep 13, 2023 · Artificial Intelligence

How Pai‑Megatron‑Patch Accelerates Large Language Model Training on Alibaba Cloud

This article introduces Pai‑Megatron‑Patch, an open‑source tool from Alibaba Cloud that streamlines large language model (LLM) training, weight conversion, FP8 mixed‑precision acceleration, and reinforcement‑learning workflows, providing detailed architecture, key features, code examples, and step‑by‑step usage instructions.

FP8 · LLM training · Megatron

0 likes · 19 min read

Alimama Tech

Aug 23, 2023 · Artificial Intelligence

Reinforcement Learning for Pacing in Preloaded Ads (RLTP)

The paper introduces RLTP, a reinforcement‑learning‑based pacing system that models delayed‑impression preloaded ads as an MDP, uses a dueling DQN to select traffic probabilities, and simultaneously meets exposure targets, ensures smooth delivery, and maximizes CTR, outperforming rule‑based and PID baselines while removing complex multi‑stage pipelines.

Online Advertising · RLTP · ad pacing

0 likes · 16 min read

ByteDance SE Lab

Aug 21, 2023 · Artificial Intelligence

How Fastbot Uses Reinforcement Learning for Faster Android GUI Testing

Fastbot is a reusable, model‑based Android GUI testing tool that uses reinforcement‑learning techniques to learn from previous test runs, accelerating coverage and crash detection through a two‑phase workflow with probabilistic and learning‑based event selection; it also provides configurable custom events, widget blocking, and tree‑pruning features.

GUI automation · android testing · fastbot

0 likes · 16 min read

Python Crawling & Data Mining

Aug 20, 2023 · Artificial Intelligence

What Is RLHF? Benefits, Limits, and Design Tips for Human‑Feedback Reinforcement Learning

This article explains Reinforcement Learning with Human Feedback (RLHF), outlining its definition, suitable tasks, advantages over other reward‑model methods, types of algorithms, challenges of human feedback, and practical strategies to mitigate its limitations for building robust AI systems.

AI alignment · Human Feedback · Reward Modeling

0 likes · 14 min read

Alimama Tech

Aug 16, 2023 · Artificial Intelligence

Personalized Automated Bidding Framework (PerBid) for Fairness‑Aware Online Advertising

PerBid introduces a personalized automated bidding framework that creates context‑aware RL agents for advertiser clusters using a profiling network to embed static and dynamic campaign features, and experiments on Alibaba’s display‑ad platform show up to 10.85% performance gains while markedly improving fairness across heterogeneous advertisers.

Online Advertising · automated bidding · fairness

0 likes · 23 min read

Baidu Geek Talk

Aug 16, 2023 · Artificial Intelligence

Understanding Reinforcement Learning: From Basics to PPO and Policy Gradient

This article provides a comprehensive overview of reinforcement learning, covering fundamental concepts, differences from supervised learning, algorithm families, policy gradient methods, practical tricks like baselines and reward‑to‑go, and detailed explanations of TRPO and PPO variants with illustrative diagrams.

PPO · actor-critic · machine learning

0 likes · 19 min read

DataFunTalk

Aug 7, 2023 · Artificial Intelligence

DataFun Decision Intelligence Summit – Reinforcement Learning Forum Overview

The DataFun Decision Intelligence Summit brings together leading researchers and industry experts to present cutting‑edge reinforcement learning algorithms, safety considerations, distributional methods, and real‑world applications such as vehicle routing, recommender systems, and power‑grid scheduling, highlighting future directions and audience takeaways.

AI · Recommendation Systems · distributional RL

0 likes · 12 min read

Python Programming Learning Circle

Aug 5, 2023 · Artificial Intelligence

Building and Training a DQN Agent with highway‑env for Autonomous Driving Simulation

This article explains how to install gym and highway‑env, configure the environment, process state, action and reward data, build a DQN model in PyTorch, run training loops, and analyze results for autonomous driving simulations using reinforcement learning.

DQN · autonomous driving · gym

0 likes · 10 min read

Meituan Technology Team

Jul 20, 2023 · Artificial Intelligence

Novelty Recommendation for Meituan Food Delivery: System Design, Challenges, and Solutions

Meituan’s food‑delivery team built a novelty‑focused recommendation pipeline—combining dual‑tower recall, novelty‑aware ranking, personalized mixed‑ranking weights, and reinforcement‑learning insertion—to surface merchants unseen by users, achieving 19% higher exposure novelty, 25% more order novelty, and improved ratings while keeping RPM loss under 0.5%.

Ranking · food delivery · novelty

0 likes · 28 min read

DataFunSummit

Jun 19, 2023 · Artificial Intelligence

Overview of Decision Intelligence and Reinforcement Learning

This article provides a comprehensive overview of decision intelligence: it distinguishes predictive tasks from decision tasks, classifies decision environments, covers reinforcement learning fundamentals and algorithms such as SARSA and deep reinforcement learning, and discusses current applications, challenges, and future research directions.

Decision Intelligence · Optimization · artificial-intelligence

0 likes · 12 min read

Tencent Tech

Jun 14, 2023 · Artificial Intelligence

How Tencent’s Robot Dog Max Gains Human‑Like Decision‑Making with Pre‑trained AI and RL

Tencent Robotics X unveiled how its robot dog Max combines pre‑trained AI models with reinforcement learning across three learning stages, enabling it to acquire, store, and apply skills for autonomous decision‑making in complex tasks such as the World Chase Tag competition.

AI · Pre‑training · Simulation

0 likes · 6 min read

DaTaobao Tech

Jun 9, 2023 · Artificial Intelligence

Generator-Evaluator Architecture for End-to-End Re-ranking in Information Flow

The paper introduces a Generator‑Evaluator (GE) architecture that end‑to‑end re‑ranks information‑flow items using a pointer‑network seq2seq generator and a reward‑estimating evaluator, jointly optimizing relevance and business utilities such as diversity, traffic control, inter‑group ordering, and fixed‑slot insertion, achieving over 70% better‑percentage and significant online gains on Taobao.

Information Flow · Ranking · generator-evaluator

0 likes · 19 min read

Network Intelligence Research Center (NIRC)

Jun 9, 2023 · Artificial Intelligence

2023 NIRC PhD Graduates Reveal Cutting-Edge AI and Network Intelligence Research

In 2023, the Network Intelligence Research Center (NIRC) celebrated its largest PhD graduating class: seven scholars whose dissertations span deep‑vision hand‑gesture estimation, multi‑scenario network transmission, graph alignment, interactive streaming, knowledge‑defined networking, wireless body‑area networking, and more, showcasing significant AI‑driven advances and high‑impact publications.

Graph Alignment · Network Intelligence · Wireless Networks

0 likes · 30 min read

Didi Tech

May 23, 2023 · Artificial Intelligence

Driver‑Passenger Matching in Didi’s Ride‑Hailing Market: Algorithms and Techniques

The article surveys Didi’s driver‑passenger matching challenges and presents a suite of solutions—from greedy nearest‑driver and Kuhn‑Munkres bipartite matching to stable marriage, dynamic and one‑to‑many assignments, reinforcement‑learning, routing and queueing models—while validating assumptions statistically, integrating preference‑aware machine learning, and outlining multi‑objective and digital‑twin future research.

Optimization · Ride Hailing · algorithm

0 likes · 23 min read

DataFunTalk

May 20, 2023 · Artificial Intelligence

Understanding Didi’s Online Marketplace: Core Concepts, Technical Challenges, and Emerging Technologies

This article introduces Didi’s real‑time online marketplace, explains its fundamental principles, network effects, and social efficiency benefits, and examines key technical areas such as mechanism design, decision intelligence, operations research, reinforcement learning, and causal inference that drive its advanced matching and dispatch strategies.

Decision Intelligence · artificial-intelligence · marketplace

0 likes · 16 min read

Rare Earth Juejin Tech Community

May 8, 2023 · Artificial Intelligence

Understanding the Principles Behind ChatGPT: NLP, Transformers, and Reinforcement Learning

This article explains how ChatGPT works by covering the fundamentals of natural language processing, generative language models, deep learning, the Transformer architecture, attention mechanisms, few‑shot learning, and the reinforcement‑learning techniques that align its outputs with human preferences.

AI · ChatGPT · Large Language Model

0 likes · 24 min read

Kuaishou Tech

Apr 29, 2023 · Artificial Intelligence

RMTL: A Reinforcement Learning Based Multi‑Task Learning Framework for Session‑Level Recommendation

The paper proposes RMTL, a reinforcement‑learning driven multi‑task learning framework that builds session‑level MDPs, trains a multi‑task actor‑critic network with dynamic loss weighting, and demonstrates significant AUC improvements over state‑of‑the‑art MTL recommendation models on public datasets.

Multi-Task Learning · actor‑critic · adaptive loss weighting

0 likes · 8 min read

Kuaishou Tech

Apr 28, 2023 · Artificial Intelligence

How Hyper‑Actor Critic Redefines Reinforcement Learning for Recommendation Systems

This article presents the Hyper‑Actor Critic (HAC) framework that splits reinforcement‑learning policies into continuous hyper‑actions and effective recommendation lists, introduces alignment and supervised losses, and demonstrates superior performance on an online simulator compared to existing RL and supervised methods.

AI research · Recommendation Systems · hyper-actor critic

0 likes · 9 min read

Kuaishou Tech

Apr 27, 2023 · Artificial Intelligence

Two-Stage Constrained Actor‑Critic (TSCAC) for Short‑Video Recommendation

The paper models short‑video recommendation as a constrained Markov decision process and introduces a two‑stage constrained actor‑critic algorithm that jointly maximizes watch time while satisfying multiple interaction constraints, demonstrating superior offline and online performance on the KuaiRand dataset and Kuaishou app.

actor-critic · constrained optimization · offline evaluation

0 likes · 7 min read

Kuaishou Tech

Apr 22, 2023 · Artificial Intelligence

Reinforcement Learning for User Retention (RLUR) in Short Video Recommendation Systems

This paper presents RLUR, a reinforcement‑learning algorithm that models user‑retention optimization as an infinite‑horizon request‑based Markov Decision Process, addressing uncertainty, bias, and delayed reward challenges to directly improve retention, DAU, and engagement in short‑video recommendation platforms.

Kuaishou · RLUR · User Retention

0 likes · 8 min read

Alimama Tech

Apr 3, 2023 · Artificial Intelligence

AI-Generated Bidding (AIGB): Using Generative Models for Automated Advertising Bidding

AI‑Generated Bidding (AIGB) replaces reinforcement learning with a conditional generative model that learns the joint distribution of bids, objectives, and constraints from historical trajectories, enabling interpretable, diverse, constraint‑aware bidding strategies that improve efficiency, scalability, and explainability for large‑scale advertising platforms.

Generative AI · Online Advertising · automated bidding

0 likes · 15 min read

Kuaishou Tech

Mar 29, 2023 · Artificial Intelligence

ResAct: A Reinforcement Learning Approach for Long-Term User Retention in Sequential Recommendation

The paper introduces ResAct, a reinforcement‑learning framework that improves long‑term user retention in sequential recommendation by constraining the policy space near the online‑serving policy and employing a conditional variational auto‑encoder, residual actor, and state‑action value network, achieving significant gains over existing methods on a large‑scale short‑video dataset.

ResAct · User Retention · reinforcement learning

0 likes · 9 min read

Python Programming Learning Circle

Mar 27, 2023 · Artificial Intelligence

Reinforcement Learning with highway‑env: Installation, Configuration, and DQN Training in Python

This article demonstrates how to install and configure the highway‑env reinforcement‑learning environment, set up a DQN agent in Python, and train it on various traffic scenarios, providing code examples and performance visualizations.

DQN · Python · Simulation

0 likes · 10 min read

NetEase Smart Enterprise Tech+

Mar 27, 2023 · Artificial Intelligence

How Reinforcement Learning Powers AI Bots in ‘Barbarian Battle 2’

This article details NetEase Zhiji and Dianhun Network's use of reinforcement learning, a distributed training framework, and middleware to create, train, deploy, and iterate AI robots for the game "Barbarian Battle 2", highlighting technical challenges, solutions, and the impact on player experience.

AI bots · Game Development · distributed training

0 likes · 13 min read

Python Programming Learning Circle

Mar 10, 2023 · Artificial Intelligence

Google's i‑S2R and GoalsEye: Robot Table‑Tennis Learning from Human Interaction

The article explains how Google's i‑S2R and GoalsEye projects use iterative simulation‑to‑real training, behavior cloning and goal‑conditioned learning to enable robots to play table‑tennis with humans, highlighting the challenges, experimental setup, and performance improvements achieved across player skill levels.

AI research · Behavior Cloning · human-robot interaction

0 likes · 6 min read

Top Architect

Mar 10, 2023 · Artificial Intelligence

Understanding InstructGPT and ChatGPT: Architecture, Training Pipeline, and Performance Analysis

This article provides a comprehensive overview of the GPT series, explains the differences between prompt learning and instruction learning, details the three‑stage training pipeline of InstructGPT/ChatGPT—including supervised fine‑tuning, reward‑model training, and PPO‑based reinforcement learning—examines their strengths, weaknesses, and future research directions, and discusses the broader impact of these models on AI development.

AI · ChatGPT · GPT

0 likes · 22 min read

21CTO

Feb 23, 2023 · Artificial Intelligence

How Does ChatGPT Really Work? Inside the RLHF Training Process

This article explains ChatGPT’s architecture, the distinction between model capability and consistency, how next‑token and masked‑language‑model training lead to inconsistencies, and how OpenAI’s supervised fine‑tuning, reward‑model training, and PPO reinforcement learning (RLHF) are combined to improve alignment while highlighting the method’s limitations.

AI alignment · ChatGPT · RLHF

0 likes · 15 min read

IT Architects Alliance

Feb 23, 2023 · Artificial Intelligence

Training a Positive Review Generator with RLHF and PPO

This article demonstrates how to use Reinforcement Learning from Human Feedback (RLHF) with a PPO algorithm and a sentiment‑analysis model to train a language model that generates positive product reviews, covering task definition, data sampling, reward evaluation, model optimization, and experimental results.

GPT · Language Model · PPO

0 likes · 11 min read

DataFunTalk

Feb 20, 2023 · Artificial Intelligence

ChatGPT Technology, Localization Efforts, and Open‑Source Large Models – Overview and Practices

This article presents an overview of ChatGPT technology, its evolution, current challenges, a three‑stage learning process, data organization and evaluation, details of domestic localization efforts, practical solutions, and the release of a Chinese open‑source large model with training guidance.

ChatGPT · Large Language Model · Model Localization

0 likes · 12 min read

Architect

Feb 19, 2023 · Artificial Intelligence

Training a Positive Review Generator with RLHF and PPO

This article demonstrates how to apply Reinforcement Learning from Human Feedback (RLHF) using a sentiment‑analysis model as a reward function and Proximal Policy Optimization (PPO) to fine‑tune a language model that generates positive product reviews, complete with code snippets and experimental results.

Language Model · PPO · RLHF

0 likes · 10 min read

dbaplus Community

Feb 18, 2023 · Artificial Intelligence

Why ChatGPT Still Gets It Wrong: Inside RLHF and Model Consistency

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 but uses supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF) to improve alignment, yet its training methods still cause consistency issues such as unhelpful responses, hallucinations, bias, and limited explainability.

ChatGPT · PPO · RLHF

0 likes · 17 min read

Open Source Linux

Feb 13, 2023 · Artificial Intelligence

How Does ChatGPT Work? Inside RLHF and Model Consistency

This article explains the inner workings of ChatGPT, detailing its evolution from GPT‑3, the role of reinforcement learning from human feedback (RLHF) in improving consistency, the training pipeline steps, and the limitations and evaluation methods of large language models.

AI · ChatGPT · RLHF

0 likes · 15 min read

Kuaishou Tech

Feb 10, 2023 · Artificial Intelligence

Seven Kuaishou Papers Accepted at WWW 2023 on Reinforcement Learning and Recommendation Systems

On January 25, Kuaishou’s community science team announced that seven of its papers were accepted at the ACM Web Conference 2023 (WWW’23), covering reinforcement‑learning‑based user retention, constrained actor‑critic recommendation, divide‑and‑conquer embedding retrieval, causal embedding with contrastive learning, latent action space exploration, dual‑interest factorization attention, and multi‑task reinforcement learning for recommendation.

AI · Kuaishou · WWW 2023

0 likes · 17 min read

Laravel Tech Community

Feb 9, 2023 · Artificial Intelligence

Understanding ChatGPT: Architecture, Training Strategies, and Alignment Challenges

This article explains how ChatGPT builds on GPT‑3, describes the supervised‑plus‑reinforcement learning (RLHF) pipeline that fine‑tunes the model, compares model capability with consistency, and discusses the performance evaluation and remaining limitations of large language models.

ChatGPT · Model Training · RLHF

0 likes · 15 min read

Top Architect

Feb 9, 2023 · Artificial Intelligence

How ChatGPT Works: Training, RLHF, and Consistency Issues

ChatGPT, OpenAI’s latest language model, builds on GPT‑3 and improves performance through supervised fine‑tuning, human‑feedback reinforcement learning (RLHF), and PPO optimization, addressing consistency challenges such as misaligned outputs, bias, and hallucinations while evaluating helpfulness, truthfulness, and harmlessness.

ChatGPT · RLHF · large language models

0 likes · 15 min read

DataFunSummit

Feb 8, 2023 · Artificial Intelligence

Technical Architecture and Training Process of ChatGPT

ChatGPT, a dialogue-focused language model, builds on the GPT family and employs techniques such as Reinforcement Learning from Human Feedback (RLHF), the TAMER framework, and a three-stage training pipeline (supervised fine‑tuning, reward modeling, and PPO reinforcement learning) to achieve advanced conversational capabilities.

ChatGPT · GPT · Language Model

0 likes · 7 min read

Architects' Tech Alliance

Feb 7, 2023 · Artificial Intelligence

ChatGPT: Technical Principles, Architecture, and the Role of Human‑Feedback Reinforcement Learning

This article explains how ChatGPT builds on GPT‑3 with improved accuracy and coherence, details its training pipeline that combines supervised fine‑tuning and Reinforcement Learning from Human Feedback (RLHF), discusses consistency challenges, evaluation metrics, and the limitations of the RLHF approach.

AI alignment · ChatGPT · PPO

0 likes · 15 min read

Model Perspective

Jan 12, 2023 · Artificial Intelligence

Neural Networks Explained: Architecture, Training, and Reinforcement Basics

This article introduces neural networks, covering their layered structure, common types like CNNs and RNNs, key components such as activation functions, loss, learning rate, backpropagation, dropout, batch normalization, and extends to reinforcement learning concepts including MDPs, policies, value functions, and Q‑learning.

CNN · RNN · machine learning

0 likes · 6 min read

DataFunTalk

Dec 30, 2022 · Artificial Intelligence

Graph Representation Learning for Drug Package Recommendation: Discriminative and Generative Approaches

This article reviews the challenges of drug package recommendation in smart healthcare and presents two graph‑based solutions—a discriminative model (DPR) that scores existing drug packages and a generative model (DPG) that creates personalized packages—demonstrating superior performance through extensive experiments and analysis.

AI in healthcare · Graph Neural Networks · drug recommendation

0 likes · 19 min read

Alimama Tech

Dec 28, 2022 · Artificial Intelligence

Sustainable Online Reinforcement Learning for Auto-bidding (SORL)

The Sustainable Online Reinforcement Learning (SORL) framework tackles offline inconsistency in auto‑bidding by iteratively gathering safe online data from real ad systems with a Lipschitz‑based exploration method and training a variance‑suppressed conservative Q‑learning policy, achieving safer, more stable, and higher‑performing bids on Alibaba’s platform.

Online Advertising · Safe Exploration · auto-bidding

0 likes · 18 min read

Architecture Digest

Dec 15, 2022 · Artificial Intelligence

Technical Overview of ChatGPT: Training Pipeline, RLHF, and Its Potential to Replace Search Engines

This article explains ChatGPT's underlying technology—including its three‑stage training pipeline with supervised fine‑tuning, reward‑model learning, and reinforcement learning from human feedback—while analyzing whether the model can realistically replace traditional search engines such as Google or Baidu.

AI · ChatGPT · Large Language Model

0 likes · 15 min read

IT Architects Alliance

Dec 13, 2022 · Artificial Intelligence

Technical Principles and Training Process of ChatGPT

The article explains ChatGPT’s underlying technology, detailing its three-stage training pipeline—supervised fine‑tuning, reward‑model learning, and reinforcement learning with PPO—while discussing its strengths, limitations, and potential integration with traditional search engines.

AI · ChatGPT · LLM

0 likes · 14 min read

Tencent Cloud Developer

Dec 9, 2022 · Artificial Intelligence

An Overview of ChatGPT: Technology, Training Process, and Applications

The article outlines ChatGPT’s conversational capabilities, its InstructGPT‑based architecture, a three‑stage RLHF training pipeline involving supervised fine‑tuning, reward‑model training on human‑ranked responses, and PPO optimization, and discusses its strengths, limitations, diverse applications, and future directions for multimodal, up‑to‑date assistants.

AI Applications · ChatGPT · Large Language Model

0 likes · 18 min read
