Tagged articles

reinforcement learning

743 articles · Page 7 of 8

Dec 9, 2022 · Artificial Intelligence

Technical Principles and Training Process of ChatGPT

The article explains how ChatGPT builds on the GPT‑3.5 large language model, using human‑annotated data and Reinforcement Learning from Human Feedback (RLHF) across three training stages to improve instruction understanding, answer quality, and continual model enhancement, while also discussing its potential to complement or replace traditional search engines.

AI · ChatGPT · Instruction Tuning

0 likes · 15 min read

IT Architects Alliance

Dec 8, 2022 · Artificial Intelligence

Technical Principles and Training Process of ChatGPT

This article explains the technical foundations of ChatGPT, detailing its three-stage training pipeline—supervised fine‑tuning with human‑annotated data, reward model training via pairwise ranking, and reinforcement learning from human feedback—while also discussing its limitations compared to traditional search engines and potential future enhancements.

AI · ChatGPT · Large Language Model

0 likes · 14 min read

vivo Internet Technology

Dec 7, 2022 · Artificial Intelligence

Mixing Heterogeneous Queues in Vivo's Information Flow and App Store: Challenges, Practices, and RL/Deep Learning Solutions

Vivo tackles the complex problem of mixing heterogeneous content queues (ads, games, and organic items) in its information flow and app store by evolving from rule‑based weighting to Q‑learning and deep‑learning position models that respect product constraints, preserve ordering, and balance short‑term revenue with long‑term user experience, while planning deeper personalization and on‑device solutions.

Advertising · App Store · Information Flow

0 likes · 14 min read

Top Architect

Dec 7, 2022 · Artificial Intelligence

Technical Principles of ChatGPT and Its Prospects for Replacing Traditional Search Engines

The article explains how ChatGPT builds on GPT‑3.5 with supervised fine‑tuning, reward‑model training and reinforcement learning from human feedback, analyzes why it cannot yet replace search engines due to hallucinations, knowledge freshness and cost, and proposes a hybrid architecture that combines LLM generation with traditional retrieval to overcome these limitations.

AI · ChatGPT · Large Language Model

0 likes · 16 min read

HomeTech

Nov 16, 2022 · Artificial Intelligence

Fundamentals and Policy Gradient Algorithms in Reinforcement Learning with Applications to Scene Text Recognition

This article introduces the basic concepts of reinforcement learning, derives model‑based and model‑free policy gradient methods—including vanilla policy gradient and Actor‑Critic—explains their mathematical foundations, and demonstrates their use in scene text recognition and image captioning tasks.

AI · Attention Mechanism · actor-critic

0 likes · 22 min read

AntTech

Nov 7, 2022 · Blockchain

Effectively Generating Vulnerable Transaction Sequences in Smart Contracts with Reinforcement Learning‑Guided Fuzzing

This paper presents a reinforcement‑learning‑based fuzzer (RLF) that generates transaction sequences likely to trigger smart‑contract vulnerabilities, combining vulnerability‑driven and coverage‑driven rewards to improve detection efficiency and outperform existing state‑of‑the‑art tools.

RL-based fuzzer · reinforcement learning

0 likes · 12 min read

NetEase LeiHuo Testing Center

Nov 4, 2022 · Artificial Intelligence

Applying AI for Game Balance Testing: DNN Victory Prediction and Genetic Algorithm Optimization

This article details a practical AI-driven workflow for a turn‑based card game, covering problem background, data modeling with a DNN victory‑prediction network, reinforcement‑learning‑based data generation, and a genetic‑algorithm search to identify the strongest and weakest team compositions.

AI · DNN · Game Balance

0 likes · 18 min read

DataFunTalk

Nov 4, 2022 · Artificial Intelligence

Explainable Knowledge Graph Reasoning: Background, Advances, Motivation, Recent Research, and Outlook

This article reviews explainable knowledge graph reasoning, covering its background, core concepts, downstream applications, major reasoning methods, motivations for interpretability, recent advances such as hierarchical and Bayesian reinforcement learning, meta‑path mining, and future research directions.

Knowledge Graph · explainable AI · graph reasoning

0 likes · 18 min read

Youku Technology

Oct 28, 2022 · Artificial Intelligence

Enlarging Long‑time Dependencies via Reinforcement‑Learning‑Based Memory Network for Movie Affective Analysis

The authors introduce a reinforcement‑learning‑driven memory network that augments long‑range dependencies for continuous valence‑arousal emotion prediction in movies, integrating five multimodal features and a DDPG‑based update policy, which yields state‑of‑the‑art performance across multiple affective‑analysis and summarization benchmarks.

VA affect model · long‑term dependencies · memory network

0 likes · 16 min read

Model Perspective

Oct 26, 2022 · Artificial Intelligence

Master Machine Learning Algorithms: Types, Python Code & Real-World Examples

This article categorizes machine learning algorithms into supervised, unsupervised, and reinforcement learning, then details ten common algorithms—including linear regression, logistic regression, decision trees, SVM, Naive Bayes, K‑NN, K‑means, random forest, and dimensionality reduction—accompanied by clear Python code examples and illustrative diagrams.

Python · algorithms · machine learning

0 likes · 14 min read

Sohu Tech Products

Oct 12, 2022 · Artificial Intelligence

AlphaTensor: DeepMind’s AI System for Discovering Faster Matrix Multiplication Algorithms

DeepMind’s AlphaTensor, built on AlphaZero and reinforcement learning, automatically discovers novel, provably correct matrix multiplication algorithms that outperform classic methods like Strassen’s, demonstrating how modern AI can automate algorithm discovery and significantly accelerate computations across many fields.

AI · AlphaTensor · DeepMind

0 likes · 8 min read

Alimama Tech

Sep 21, 2022 · Artificial Intelligence

Alibaba's Three Papers Accepted at NeurIPS 2022

Alibaba’s research team had three papers accepted at NeurIPS 2022: an Adaptive Parameter Generation network that boosts click‑through rates and revenue, a tuning‑free Global Batch Gradient Aggregation method that speeds recommendation model training by 2.4×, and a Sustainable Online Reinforcement Learning framework that outperforms existing auto‑bidding strategies.

NeurIPS · Online Advertising · Recommendation Systems

0 likes · 6 min read

GuanYuan Data Tech Team

Sep 8, 2022 · Artificial Intelligence

How AI Reinforcement Learning Transforms Smart Replenishment in Retail

This article examines the technical challenges of intelligent replenishment—model stability, complexity, generalization, and interpretability—and explains how a few‑shot imitation learning and inverse reinforcement learning framework can overcome these issues to deliver reliable, low‑cost AI‑driven supply‑chain decisions.

AI · imitation learning · model stability

0 likes · 22 min read

Alimama Tech

Sep 7, 2022 · Artificial Intelligence

Curriculum-Guided Bayesian Reinforcement Learning for ROI-Constrained Real-Time Bidding

The paper presents a Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework that models ROI‑constrained real‑time bidding as a partially observable constrained MDP, using hard‑margin indicator rewards and a curriculum of relaxed proxy problems to achieve fast, constraint‑satisfying, Bayes‑optimal policies that outperform existing methods on large‑scale industrial data.

Bayesian RL · MDP · ROI constraint

0 likes · 15 min read

DataFunTalk

Sep 2, 2022 · Artificial Intelligence

Applying Reinforcement Learning to E‑commerce Traffic Control: Practices and Future Directions

This talk by JD Retail's Zhao Yu explains how reinforcement learning is modeled and deployed for large‑scale traffic control during major sales events, detailing system architecture, reward design, offline simulation, model upgrades, and future research directions.

JD.com · Online Advertising · RL modeling

0 likes · 20 min read

Bilibili Tech

Aug 30, 2022 · Artificial Intelligence

Neural MMO Massive AI Team Survival Challenge: Advances in Multi‑Agent Decision AI

The IJCAI‑2022 Neural MMO Massive AI Team Survival Challenge demonstrated that deep reinforcement‑learning agents can achieve sophisticated cooperation and competition among 128 agents in a large‑scale MMO‑style world, highlighting the growing focus on decision‑AI, the effectiveness of self‑play and CTDE, and the platform’s potential for future research into population‑level behavior, economics, and complex real‑world decision making.

AI competition · Decision AI · Massive AI

0 likes · 11 min read

Bilibili Tech

Aug 30, 2022 · Artificial Intelligence

Reinforcement Learning in Neural MMO: Background, Environment, Competition Solution, and Insights

The article reviews reinforcement learning applied to Neural MMO—a large‑scale, multi‑agent MMO environment—detailing its competitive IJCAI 2022 track, the winning LastOrder solution with transformer‑CNN‑LSTM architecture, reward shaping, a Fictitious Self‑Play meta‑solver, and Bilibili’s scalable Newton training framework.

AI in Games · Meta Solver · Multi-Agent Systems

0 likes · 9 min read

Laiye Technology Team

Aug 29, 2022 · Artificial Intelligence

Evolution of Dialogue Management: From Rule‑Based to Data‑Driven Systems and Industrial Deployments

This article reviews the historical development of dialogue management—from early rule‑based and finite‑state approaches to modern data‑driven and reinforcement‑learning methods—and examines how major industry platforms such as Amazon Alexa, Amazon Lex, and RASA implement these techniques in practice.

Amazon Alexa · Data-Driven · NLU

0 likes · 16 min read

IEG Growth Platform Technology Team

Aug 16, 2022 · Artificial Intelligence

Actor‑Critic Reinforcement Learning for Real‑Time Bidding in Mobile Game Advertising

The paper proposes an actor‑critic reinforcement‑learning model (ACRL) that leverages PPO and a deep structured semantic model to optimize real‑time bidding strategies for mobile game ads under CPM and budget constraints, addressing long user lifecycles and sparse conversion data while demonstrably improving ROI in both offline simulations and online A/B tests.

Mobile Advertising · Online Advertising · ROI

0 likes · 16 min read

IEG Growth Platform Technology Team

Aug 10, 2022 · Artificial Intelligence

Two Tencent IEG Papers Accepted at CIKM: Actor‑Critic Reinforcement Learning for Optimal Bidding and Adversarial Adaptation for Cross‑Domain Recommendation

Tencent's IEG Growth Middle Platform team announced that two of its research papers—one presenting an actor‑critic reinforcement learning model for real‑time bidding in online display advertising and the other proposing an adversarial adaptation framework for cross‑domain recommendation—were accepted at the top‑tier CIKM conference, highlighting novel algorithms that achieve state‑of‑the‑art performance and have been deployed to serve billions of daily impressions.

Advertising · adversarial adaptation · cross-domain recommendation

0 likes · 4 min read

Model Perspective

Aug 5, 2022 · Artificial Intelligence

What Are the Essential Steps and Types of Machine Learning?

Machine learning involves five core steps—from data collection and preparation to model training, evaluation, and improvement—while encompassing supervised, unsupervised, and reinforcement learning methods, each with distinct algorithms and real-world applications across finance, healthcare, and retail.

Applications · machine learning · reinforcement learning

0 likes · 7 min read

NetEase LeiHuo Testing Center

Jul 29, 2022 · Artificial Intelligence

AI‑Powered Compatibility Testing for Mobile Games: Platform Design, Scene Traversal, and Anomaly Detection

This article describes an AI‑driven mobile game compatibility testing framework that combines a cloud device farm, a Poco‑based scene‑traversal module with reinforcement‑learning click strategies, and a computer‑vision anomaly detection model enhanced by data‑augmentation techniques to identify UI defects across diverse devices and game scenarios.

AI · Scene Traversal · reinforcement learning

0 likes · 14 min read

GuanYuan Data Tech Team

Jul 28, 2022 · Artificial Intelligence

Unlocking Reinforcement Learning: Core Concepts, Algorithms, and Real‑World Applications

This article introduces reinforcement learning by defining agents, environments, rewards, and policies, explains key concepts such as Markov Decision Processes and Bellman equations, and surveys major algorithms—including dynamic programming, Monte‑Carlo, TD learning, policy gradients, Q‑learning, DQN, and evolution strategies—while highlighting practical challenges and notable case studies like AlphaGo Zero.

Evolution Strategies · MDP · Q-Learning

0 likes · 27 min read

Youku Technology

Jul 5, 2022 · Artificial Intelligence

Enlarging the Long-time Dependencies via RL-based Memory Network in Movie Affective Analysis

The paper introduces a reinforcement‑learning‑driven memory network that stores and updates historical video information via DDPG, overcoming LSTM/Transformer limitations on long‑duration movie sequences, and achieves state‑of‑the‑art affective prediction on LIRIS‑ACCEDE and related datasets, with real‑world deployments in AI content inspection and film‑element knowledge graphs.

long-term dependencies · memory network · movie affective analysis

0 likes · 5 min read

58 Tech

Jun 24, 2022 · Artificial Intelligence

Reinforcement Learning for Lead Generation in Task‑Oriented Dialogue Systems

This article presents a reinforcement‑learning‑based approach to improve lead‑capture efficiency of a task‑oriented chatbot used in local services, detailing the system architecture, RL algorithms (DQN/DDQN), data construction, model training, offline and online evaluation, and the resulting commercial gains.

DQN · Lead Generation · customer service

0 likes · 27 min read

AntTech

Jun 22, 2022 · Cloud Computing

Meta Reinforcement Learning Framework for Predictive Autoscaling in Cloud Environments

This article presents a cloud-native, end‑to‑end autoscaling solution that integrates traffic forecasting, CPU utilization meta‑prediction, and a reinforcement‑learning‑based scaling decision module into a fully differentiable system, achieving higher resource utilization and cost efficiency as demonstrated by ACM SIGKDD 2022 research.

Cloud Computing · Meta Learning · Predictive Modeling

0 likes · 10 min read

DataFunSummit

Jun 21, 2022 · Artificial Intelligence

JiuGe: An Automatic Chinese Classical Poetry Generation System – Algorithms and Research Overview

This article presents the JiuGe system developed by THUNLP for automatically generating Chinese classical poetry, detailing its research motivations, model architecture—including salient‑clue, working‑memory, topic‑memory, style‑transfer and reinforcement‑learning components—implementation, applications, and future directions.

Knowledge Graph · Poetry Generation · artificial-intelligence

0 likes · 18 min read

Huawei Cloud Developer Alliance

Jun 1, 2022 · Artificial Intelligence

How AI Beats Super Mario with PPO in 5 Minutes

This tutorial demonstrates how to use Huawei Cloud ModelArts and the Proximal Policy Optimization (PPO) reinforcement‑learning algorithm to train an AI agent that can clear most Super Mario levels within about 1500 episodes, even for users with no coding experience.

AI · ModelArts · PPO

0 likes · 6 min read

DataFunSummit

May 16, 2022 · Artificial Intelligence

Reinforcement Learning for E‑commerce Search Ranking: RNN User State Modeling and DDPG Long‑Term Value Optimization

This presentation details how JD applied reinforcement learning—using RNN‑based user state modeling and a DDPG framework—to improve e‑commerce search ranking by optimizing long‑term cumulative value, showing significant offline and online gains in conversion and GMV.

DDPG · RNN · User Modeling

0 likes · 20 min read

Meituan Technology Team

Apr 28, 2022 · Artificial Intelligence

Multi-Action Computation Allocation via Evolutionary Strategies in Meituan Takeaway Advertising

This article analyzes Meituan's delivery advertising system, detailing the shift from linear programming to an evolutionary‑strategy‑based multi‑action allocation (ES‑MACA), describing problem formalization, offline training, reward evaluation, online decision flow, extensive offline and online experiments, and future directions toward reinforcement learning.

Advertising · Meituan · Online Advertising

0 likes · 28 min read

Code DAO

Apr 28, 2022 · Artificial Intelligence

Model-Based Reinforcement Learning from Raw Video: A Detailed Walkthrough

The article explains how to train robots to learn tasks directly from raw video using model-based reinforcement learning, covering POMDP formulation, CNN auto‑encoders, latent‑space representations, iLQR optimization, and a step‑by‑step pipeline with concrete examples and references.

CNN autoencoder · POMDP · iLQR

0 likes · 11 min read

Code DAO

Apr 24, 2022 · Artificial Intelligence

How Transfer Learning Accelerates Deep Learning Across Vision, NLP, and Reinforcement Learning

The article explains how transfer learning reduces data and time requirements in deep learning by reusing pretrained models for vision, natural language processing, and reinforcement learning, while discussing challenges such as overfitting, the need for progressive networks, entropy regularization, domain adaptation, multi‑task learning, and model distillation.

Domain Adaptation · Multi-Task Learning · deep learning

0 likes · 10 min read

DaTaobao Tech

Apr 13, 2022 · Artificial Intelligence

Machine‑Learning Based Bandwidth Prediction and Adaptive Streaming for Taobao Live: Concerto, OnRL, and Loki

Alibaba’s Taobao Live team replaced rule‑based bandwidth estimators with three machine‑learning solutions—Concerto, OnRL, and Loki—trained on over a million hours of global live‑stream data, achieving up to 13% throughput gain, threefold stall reduction, and up to 44% lower 95th‑percentile stalls, now deployed commercially.

adaptive bitrate · bandwidth prediction · machine learning

0 likes · 14 min read

Python Programming Learning Circle

Apr 6, 2022 · Artificial Intelligence

Building a DQN‑based Autonomous Driving Agent with highway‑env in Python

This tutorial explains how to install the gym and highway‑env packages, configure the simulation environment, process state and action representations, implement a DQN network in PyTorch, and train the model while visualizing performance metrics for autonomous driving tasks.

DQN · Python · Simulation

0 likes · 11 min read

Alimama Tech

Mar 16, 2022 · Artificial Intelligence

Deep GSP: Multi‑Objective Deep Learning Based Advertising Auction Mechanism

Deep GSP is a multi‑objective, deep‑learning ad auction that jointly learns rank scores while enforcing game‑theoretic constraints—monotonicity, incentive compatibility, and Nash equilibrium—and a smooth‑transition penalty, using DDPG reinforcement learning to outperform traditional GSP across revenue, clicks, conversions, and add‑to‑cart metrics.

advertising auction · mechanism design · multi-objective optimization

0 likes · 18 min read

DataFunSummit

Mar 12, 2022 · Artificial Intelligence

Evolution of Re‑ranking Techniques in Kuaishou Short‑Video Recommendation System

This article details Kuaishou's short‑video recommendation pipeline, explaining the challenges of large‑scale sequencing, the development of sequence re‑ranking, multi‑content mixing, on‑device re‑ranking, and reinforcement‑learning‑based strategies, and demonstrates how these innovations improve user engagement and business metrics.

Kuaishou · Recommendation Systems · multi-content mixing

0 likes · 15 min read

DataFunSummit

Mar 3, 2022 · Artificial Intelligence

Sequence Optimization, Context-Aware CTR Re-Estimation, and Session-Level Auction for JD Advertising Ranking

The article presents JD's technical evolution for advertising ranking, covering technology selection for recommendation ad sorting, context‑aware CTR re‑estimation, reinforcement‑learning‑based sequence optimization, and a session‑level auction mechanism that together improve monetization efficiency and long‑term user value.

CTR · auction · reinforcement learning

0 likes · 18 min read

DataFunTalk

Feb 24, 2022 · Artificial Intelligence

Sequence Optimization and Context-Aware CTR Re-Estimation for JD Advertising Ranking

The article presents JD's technical evolution for advertising ranking, covering recommendation ad sorting, context‑aware CTR re‑estimation, reinforcement‑learning‑based sequence optimization, and session‑level auction mechanisms, and includes a Q&A that highlights practical gains and implementation challenges.

Advertising · CTR Prediction · Context-Aware

0 likes · 14 min read

DataFunTalk

Feb 20, 2022 · Artificial Intelligence

Distilled Reinforcement Learning Framework for Recommendation (DRL-Rec): Design, Modules, and Experimental Evaluation

This article presents DRL-Rec, a distilled reinforcement learning framework for recommendation that integrates an exploring‑filtering module and confidence‑guided distillation to compress RL‑based recommenders while improving accuracy, and reports significant offline and online performance gains on a large‑scale system.

knowledge distillation · online experiments · reinforcement learning

0 likes · 16 min read

DataFunTalk

Feb 10, 2022 · Artificial Intelligence

Evolution of Re‑ranking Techniques in Kuaishou Short‑Video Recommendation System

This article details the technical evolution of Kuaishou's short‑video recommendation pipeline, focusing on sequence re‑ranking, multi‑content mixing, and on‑device re‑ranking, and explains how transformer‑based models, generator‑evaluator frameworks, and reinforcement‑learning strategies are employed to maximize overall sequence value, user engagement, and revenue.

Kuaishou · Re‑ranking · multi-content mixing

0 likes · 15 min read

IEG Growth Platform Technology Team

Jan 10, 2022 · Artificial Intelligence

Applying Reinforcement Learning to Optimize Advertising Bidding ROI

This article presents a comprehensive overview of using reinforcement learning to solve advertising bidding ROI optimization, covering historical foundations, methodological reasoning, system architecture, practical implementation details, challenges, evaluation metrics, and recommended algorithms for real‑world ad placement scenarios.

Advertising · Online Advertising · ROI optimization

0 likes · 17 min read

DataFunTalk

Jan 3, 2022 · Artificial Intelligence

Intelligent Advertising Delivery System: Budget‑Constrained Bidding, Multi‑Constraint Bidding, Sequential Allocation, and Multi‑Channel Optimization

This article systematically introduces Alibaba's advertising intelligence platform, covering the evolution from simple CPM/CPC models to advanced budget‑constrained, multi‑constraint, and sequential bidding strategies, multi‑channel optimization, and reinforcement‑learning‑based solutions that jointly maximize advertiser ROI and platform revenue.

Multi-Channel · budget optimization · machine learning

0 likes · 34 min read

58 Tech

Dec 28, 2021 · Artificial Intelligence

Reinforcement Learning for Cold‑Start Job Recommendation in 58.com

This talk explains how 58.com tackles the cold‑start and interest‑divergence problems of its massive blue‑collar job recruitment platform by modeling the recommendation process as a reinforcement‑learning task, detailing the use of multi‑armed bandit, contextual bandit, and linear‑UCB algorithms, offline evaluation pipelines, online deployment, and observed performance gains.

Contextual Bandit · cold-start · job recommendation

0 likes · 25 min read

DataFunTalk

Dec 17, 2021 · Artificial Intelligence

Applying Reinforcement Learning to Solve Cold‑Start Problems in 58.com Job Recruitment

This talk explains how 58.com’s massive blue‑collar recruitment platform uses reinforcement‑learning techniques—including multi‑armed bandits, contextual MAB, and linear UCB—to address cold‑start and interest‑divergence challenges, describes the system architecture, offline evaluation, online deployment, and reports an 8% uplift in new‑user conversion.

cold-start · contextual MAB · job recruitment

0 likes · 26 min read

Code DAO

Dec 14, 2021 · Artificial Intelligence

Building a Chess AI from Scratch: Combining AlphaZero and Transformers (Part 2)

This article walks through constructing a learnable chess AI by integrating AlphaZero‑style Monte Carlo Tree Search with a decoder‑only Transformer, detailing the game tree logic, model architecture, input and output encodings, self‑play training loop, and code implementation in PyTorch.

AlphaZero · MonteCarloTreeSearch · PyTorch

0 likes · 23 min read

IEG Growth Platform Technology Team

Dec 6, 2021 · Artificial Intelligence

Model-Free Reinforcement Learning for ROI Optimization: Methods, Advertising Applications, and Tencent Game Advertising Practice

This article introduces model‑free reinforcement learning fundamentals, reviews mainstream solution methods such as Monte‑Carlo, Temporal‑Difference, and n‑step TD with eligibility traces, discusses their application in online advertising and presents Tencent's game advertising practice, including algorithm choices, reward design, and experimental results.

A3C · Advertising · PPO

0 likes · 17 min read

Code DAO

Dec 3, 2021 · Artificial Intelligence

Understanding Actor‑Critic and A2C: From Policy Gradients to REINFORCE in RL

This article derives the policy‑gradient objective for discrete actions, implements the Monte‑Carlo REINFORCE algorithm in PyTorch, explains the actor‑critic framework, introduces Advantage Actor‑Critic (A2C) versus A3C, and demonstrates their performance on the OpenAI Gym CartPole‑v0 environment.

A2C · OpenAI Gym · Python

0 likes · 13 min read

Code DAO

Nov 28, 2021 · Artificial Intelligence

Adapting Soft Actor‑Critic for Discrete Action Spaces in Deep Reinforcement Learning

This article explains how to modify the Soft Actor‑Critic (SAC) algorithm—originally designed for continuous actions—to work with discrete action environments, presents the required changes to the actor and critic loss functions, provides a full PyTorch implementation, and evaluates the method on the CartPole‑v1 benchmark.

CartPole · Discrete Actions · Entropy Regularization

0 likes · 20 min read

ByteDance Terminal Technology

Oct 26, 2021 · Mobile Development

Fastbot: Cross‑Platform Intelligent Automated Testing System for Android and iOS

This article details ByteDance’s Fastbot system, an AI‑driven cross‑platform automated testing framework for Android and iOS that leverages model‑based testing, reinforcement learning, and image‑based UI analysis to improve test coverage, fault injection, and scalability across mobile applications and games.

AI · Cross-Platform · mobile testing

0 likes · 36 min read

Alimama Tech

Sep 29, 2021 · Artificial Intelligence

Unified Solution to Constrained Bidding in Online Display Advertising (USCB)

The paper proposes a unified solution for real‑time bidding in online display ads that formulates advertiser budget and KPI limits as a constrained linear program, derives a closed‑form optimal bidding function with m+1 parameters, and uses model‑free reinforcement learning to dynamically adjust those parameters, achieving superior traffic‑value capture in large‑scale deployment on Alibaba’s Taobao platform.

constrained optimization · parameter tuning · real-time bidding

0 likes · 11 min read

Python Programming Learning Circle

Sep 27, 2021 · Artificial Intelligence

Training Reinforcement Learning Agents on Street Fighter III Using a MAME Wrapper Python Library

This tutorial explains how to install and use a Python library that wraps the MAME emulator to train reinforcement‑learning agents on arcade games such as Street Fighter III, covering system requirements, installation, environment configuration, debugging, step‑wise simulation, and a simple ConvNet agent example.

AI · MAME · Python

0 likes · 4 min read

ByteFE

Aug 2, 2021 · Artificial Intelligence

An Overview of Artificial Intelligence, Machine Learning, and Neural Networks

This article provides a beginner‑friendly overview of artificial intelligence, its relationship with machine learning, the four major learning paradigms—supervised, unsupervised, semi‑supervised and reinforcement learning—along with a historical sketch of neural networks, their training workflow, loss functions, back‑propagation, and parameter‑update mechanisms, while also containing a brief recruitment notice.

artificial-intelligence · deep learning · machine learning

0 likes · 18 min read

DataFunSummit

Aug 1, 2021 · Artificial Intelligence

A Comprehensive Overview of Sequence Recommendation Models and Techniques

This article provides an in‑depth review of user behavior sequence recommendation, covering problem definition, data preparation, and a range of neural models—including MLP, CNN, RNN, Temporal CNN, self‑attention, and reinforcement learning—along with practical implementation tips and references.

ML · neural networks · reinforcement learning

0 likes · 35 min read

DataFunSummit

Jul 25, 2021 · Artificial Intelligence

Advances in Query Understanding and Semantic Retrieval at Zhihu Search

This article details Zhihu Search's engineering solutions for long‑tail query challenges, covering historical development, term weighting, synonym expansion, query rewriting with reinforcement learning, and semantic recall using BERT‑based models, while also outlining future research directions such as GAN‑based rewriting and lightweight pre‑training.

BERT · Embedding Retrieval · Query Rewriting

0 likes · 14 min read

Java Architect Essentials

Jul 21, 2021 · Artificial Intelligence

DouZero: A Simple Monte‑Carlo Based AI that Achieves State‑of‑the‑Art Performance in Dou Dizhu

DouZero, a reinforcement‑learning AI for the Chinese card game Dou Dizhu, combines a Monte‑Carlo value network with a compact action encoding and trains on a four‑GPU server. It outperforms existing AI baselines, ranks first on Botzone, and even surpasses human play in several metrics.

AI · DouZero · Monte Carlo

0 likes · 15 min read

DataFunTalk

Jun 15, 2021 · Artificial Intelligence

Personalized Approximate Pareto-Efficient Recommendation (PAPERec): A Multi‑Objective Reinforcement Learning Framework for User‑Level Objective Personalization

The paper introduces PAPERec, a personalized multi‑objective recommendation framework that leverages Pareto‑oriented reinforcement learning to generate user‑specific objective weights, enabling the model to approximate Pareto‑optimal solutions and achieve superior click‑through rate and dwell‑time performance in both offline and online experiments.

CTR · Pareto efficiency · Recommendation Systems

0 likes · 12 min read

Alimama Tech

Jun 10, 2021 · Artificial Intelligence

Overview of Recent Alibaba Mama Research Papers Presented at KDD 2021 on Advertising and AI

At KDD 2021, Alibaba Mama presented six papers that introduced a unified constrained‑bidding solution, a deep‑learnable auction mechanism, real‑negative training for delayed‑feedback CVR, a contextual‑bandit advertising strategy recommender, a multi‑agent cooperative bidding game, and an uncertainty‑aware exploration model, all of which have been deployed to boost platform revenue and advertiser performance.

Alibaba · Auction Mechanisms · KDD

0 likes · 16 min read

Laiye Technology Team

Jun 8, 2021 · Artificial Intelligence

Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue Systems

This paper presents a hierarchical reinforcement learning approach that jointly trains dialogue policy and natural language generation modules for task-oriented dialogue systems, achieving state‑of‑the‑art performance on MultiWOZ 2.0 and 2.1 while preserving response fluency.

MultiWOZ · Natural Language Generation · dialogue policy

0 likes · 10 min read

DataFunTalk

Apr 24, 2021 · Artificial Intelligence

Intelligent Advertising Delivery System and Techniques: From Budget‑Constrained Bidding to Multi‑Channel Optimization

This article systematically introduces Alibaba's advertising intelligence platform, covering the evolution from basic CPM/CPC models to advanced OCPC/OCPM, budget‑constrained bidding, multi‑constraint bidding, sequence‑based long‑term value bidding, multi‑channel allocation, and the AI‑driven Smart Bidding product, highlighting algorithmic foundations, practical implementations, and performance gains.

Advertising · Multi-Channel · bidding

0 likes · 32 min read

DataFunSummit

Mar 25, 2021 · Artificial Intelligence

An Overview of Reinforcement Learning: Concepts, Applications, Challenges, and Future Prospects

Reinforcement learning, a branch of artificial intelligence, is explained through its core concepts, successful case studies such as AlphaGo and AlphaStar, practical application workflows, current challenges, resources, and future outlook, offering a comprehensive guide for researchers and practitioners.

Applications · Policy Optimization · artificial-intelligence

0 likes · 56 min read

DataFunTalk

Mar 9, 2021 · Artificial Intelligence

Introduction to Common Machine Learning Algorithms with Python Implementations

This article introduces the three main categories of machine learning—supervised, unsupervised, and reinforcement learning—detailing common algorithms such as Linear Regression, Logistic Regression, Naive Bayes, K‑Nearest Neighbors, Decision Trees, Random Forests, SVM, K‑Means, and PCA, and provides concise Python code examples using scikit‑learn for each.

Python · Scikit-learn · machine learning

0 likes · 18 min read

DataFunTalk

Feb 24, 2021 · Artificial Intelligence

Multi‑Objective Ranking in Kuaishou Short‑Video Recommendation: System Design and Online Results

This article details Kuaishou's multi‑objective ranking pipeline for short‑video recommendation, covering manual score fusion, GBDT ensemble, Learn‑to‑Rank, online auto‑tuning, ensemble sorting, reinforcement‑learning rerank, and on‑device rerank, and reports their impact on DAU, watch time and user interaction.

Kuaishou · machine learning · multi-objective ranking

0 likes · 21 min read

Architects' Tech Alliance

Jan 29, 2021 · Artificial Intelligence

Comprehensive Overview of Machine Learning: Types, Industry Chain, and Key Technologies

This article provides a detailed introduction to machine learning, covering its definition, learning modes such as supervised, unsupervised and reinforcement learning, shallow versus deep learning, the full industry chain from AI chips to cloud and big‑data services, and the major open‑source frameworks and platforms driving the field.

AI chips · Big Data · machine learning

0 likes · 11 min read

Programmer DD

Jan 3, 2021 · Artificial Intelligence

How Self‑Play and GAIL Powered the WeKick AI to Win the First Google Football Kaggle Championship

After a nostalgic gaming session, the author recounts how Tencent’s upgraded AI, WeKick, leveraged self‑play reinforcement learning, GAIL‑based adversarial simulation, and a multi‑style League framework to dominate the inaugural Google Football Kaggle competition, illustrating the escalating complexity of multi‑agent AI in real‑time strategy games.

GAIL · Kaggle competition · Multi-Agent Systems

0 likes · 8 min read

DataFunTalk

Dec 23, 2020 · Artificial Intelligence

Advances in Knowledge Graph Completion: Methods, Challenges, and Future Directions

This article reviews the rapid progress of knowledge graph completion, covering its background, formal problem definition, major technical approaches—including representation learning, path‑based search, reinforcement learning, logical reasoning, and meta‑learning—while discussing their challenges, recent improvements, and promising future research directions.

Completion · Knowledge Graph · Logical Reasoning

0 likes · 14 min read

JD Cloud Developers

Dec 21, 2020 · Artificial Intelligence

Weekly Tech Highlights: AI Chip, Cloud Forecasts, Docker M1 Preview & More

This week’s developer newsletter spotlights the Chinese Academy of Sciences’ pioneering GNN accelerator chip, IDC’s ten cloud computing predictions for China, the booming IoT market and 5G dominance, Docker’s M1‑compatible desktop preview, a carbon‑nanotube transistor breakthrough, IBM’s FHE initiative, and recent AI research on lifelong learning and reinforcement learning exploration.

Docker · IoT · artificial-intelligence

0 likes · 7 min read

DataFunTalk

Nov 12, 2020 · Artificial Intelligence

Reinforcement Learning for Recommendation System Mixing: Concepts, Practice, and Evaluation

This article explains how reinforcement learning, with its focus on maximizing long‑term reward, can improve recommendation system mixing by covering basic RL concepts, differences from supervised learning, multi‑armed bandit approaches, practical OpenAI Gym experiments, new AUC metrics, online gains, and advanced model optimizations.

OpenAI Gym · Q-Learning · Recommendation Systems

0 likes · 10 min read

Didi Tech

Oct 10, 2020 · Artificial Intelligence

Deep Reinforcement Learning for Route Planning in DiDi Ride‑Hailing

DiDi’s route engine, handling over 40 billion daily requests, replaces static graph algorithms with a deep‑reinforcement‑learning system that first learns intersection decisions via behavior‑cloning LSTM models and then refines them through self‑play Q‑learning, using beam‑search decoding to produce globally optimal, low‑deviation routes for ride‑hailing.

AI · Beam Search · Behavior Cloning

0 likes · 12 min read

DataFunTalk

Oct 4, 2020 · Artificial Intelligence

Reinforcement Learning for Product Ranking: Model Design, Experiments, and Online Deployment

This article presents a comprehensive study of using reinforcement learning to improve e‑commerce product ranking, covering the limitations of traditional scoring, the design of context‑aware models, a pointer‑network based sequence generator, various RL algorithms, extensive offline evaluations, and successful online deployment with future research directions.

PPO · deep learning · e-commerce

0 likes · 28 min read

Sohu Tech Products

Sep 16, 2020 · Artificial Intelligence

Open-Domain Dialogue Systems: Current State, Challenges, and Future Directions

This article reviews the latest advances in open-domain dialogue systems, covering classification, end‑to‑end generation challenges, knowledge‑controlled generation, automated evaluation, large‑scale latent‑space models such as PLATO, and outlines future research directions for building more coherent and controllable conversational AI.

Dialogue Systems · Evaluation · knowledge grounding

0 likes · 14 min read

MaGe Linux Operations

Sep 9, 2020 · Artificial Intelligence

Master Machine Learning Basics: Concepts, Types, Algorithms & K‑NN Walkthrough

This comprehensive tutorial introduces machine learning fundamentals, its history, differences from traditional programming, key characteristics, and why Python is the preferred language, then explores supervised, unsupervised, and reinforcement learning, popular algorithms, detailed K‑Nearest Neighbors examples for classification and regression, and the essential steps to build and evaluate models.

Python · kNN · machine learning

0 likes · 21 min read

Programmer DD

Aug 31, 2020 · Artificial Intelligence

AI Fighter Falco Beats Human Pilot in Simulated Dogfight – Implications for Military AI

DARPA’s ACE program showcased the AI‑driven fighter Falco, built with the open‑source AdeptRL reinforcement‑learning framework, which defeated an experienced US Air Force instructor in a 1‑v‑1 simulated F‑16 dogfight, highlighting both the promise and current limitations of autonomous combat systems.

AI · DARPA · Simulation

0 likes · 7 min read

DataFunTalk

Aug 15, 2020 · Artificial Intelligence

Dynamic Knapsack Optimization for Multi‑Channel Sequential Advertising Using Long‑Term Value

The article presents a novel multi‑channel sequential advertising framework that models budget‑constrained GMV optimization as a dynamic knapsack problem, introduces a long‑term value‑based RL solution (MSBCB), and validates its superiority through extensive offline and online experiments showing up to 10% ROI improvement.

Advertising · budget optimization · dynamic knapsack

0 likes · 16 min read

Aotu Lab

Jul 22, 2020 · Frontend Development

How Q‑Learning Can Power Smart UI Testing and Scalable Pop‑ups with Puppeteer

This article explains how reinforcement‑learning (Q‑learning) can generate mock interface data for regression testing, how Puppeteer automates UI interactions, and how a DSL‑plus‑runtime approach enables scalable pop‑up components, reducing testing costs in complex e‑commerce interactions.

Automation · Frontend Testing · Puppeteer

0 likes · 8 min read

DataFunTalk

Jul 21, 2020 · Artificial Intelligence

WeChat "Look" Recommendation System: Architecture, Modeling, and Engineering Challenges

This article details the end‑to‑end technical architecture of WeChat's "Look" personalized recommendation service, covering data collection, recall, multi‑stage ranking, various CTR and multi‑objective models, reinforcement‑learning based mixing, diversity optimization, and the engineering hurdles overcome to deploy these solutions at massive scale.

CTR Prediction · WeChat AI · deep learning

0 likes · 17 min read

58 Tech

Jul 8, 2020 · Artificial Intelligence

Budget Pacing Techniques and Their Application in 58.com Advertising Platform

This article introduces mainstream budget‑pacing methods for cost‑per‑click online ads, describes the 58.com business scenarios, details the pacing algorithm—including bid modification, probabilistic throttling, and reinforcement‑learning approaches—explains system design with PID control, and presents online experimental results and future directions.

Ad Tech · Budget Pacing · Online Advertising

0 likes · 14 min read

Taobao Frontend Technology

Jun 30, 2020 · Frontend Development

How Reinforcement Learning Powers Front‑End Testing for Alibaba’s 618 Interactive Game

This article explains how the Taobao front‑end team tackled the complexity of the 618 interactive game by using reinforcement‑learning‑driven intelligent testing, Puppeteer‑based automated regression, and a decoupled UI‑logic architecture for scalable popup production, dramatically improving development efficiency and stability.

Puppeteer · UI logic decoupling · automated testing

0 likes · 10 min read

HomeTech

Jun 10, 2020 · Artificial Intelligence

Exploitation & Exploration Algorithms in Recommender Systems: ε‑Greedy, UCB, and Thompson Sampling Applications

This article introduces recommender systems and the exploitation‑exploration dilemma, explains common E&E algorithms such as ε‑greedy, Upper‑Confidence‑Bound, and Thompson Sampling, and details their practical deployment for interest‑point eviction, selection, and adaptive recall count optimization in an automotive recommendation platform.

Bandit Algorithms · Epsilon-Greedy · Exploitation

0 likes · 10 min read

DataFunTalk

May 15, 2020 · Artificial Intelligence

Optimizing Sparse Feature Embedding for Large‑Scale Recommendation and CTR Prediction

The article reviews recent research on representing massive sparse features in click‑through‑rate (CTR) models, introducing Alibaba's Res‑embedding method and Google's Neural Input Search (NIS) approach, and discusses how these techniques improve embedding efficiency and model generalization in large‑scale recommendation systems.

CTR Prediction · Recommendation Systems · deep learning

0 likes · 10 min read

JD Retail Technology

May 13, 2020 · Artificial Intelligence

JD's Two Papers Accepted at IJCAI2020 and SIGIR2020: Hierarchical Reinforcement Learning for Multi‑Goal Recommendation and Attention‑Based pCVR Prediction

JD announced that two of its research papers—one on a hierarchical reinforcement‑learning framework for multi‑objective recommendation (MaHRL) and another on an attention‑based model for delayed‑feedback conversion‑rate prediction (pCVR)—were accepted as full papers at the prestigious IJCAI2020 and SIGIR2020 conferences, highlighting the company's strong AI capabilities.

Recommendation Systems · artificial-intelligence · conversion rate prediction

0 likes · 6 min read

Alibaba Cloud Developer

May 11, 2020 · Artificial Intelligence

How Reinforcement Learning Revolutionizes E‑commerce Product Ranking

This article details the evolution of AliExpress product ranking from simple DNN scoring to advanced reinforcement‑learning re‑ranking, comparing multiple models, exploring context effects, introducing pointer‑network generators, evaluating various RL algorithms, and reporting significant online gains in conversion and GMV.

e-commerce · online experiments · product ranking

0 likes · 28 min read

Alibaba Cloud Developer

Apr 24, 2020 · Artificial Intelligence

How Reinforcement Learning Can Supercharge New Media Marketing Strategies

This article examines the limitations of traditional new media marketing, explains reinforcement learning fundamentals, and presents a six‑step technical solution—including problem modeling, algorithm selection, action, state, reward design, and model training—that uses RL to optimize budget allocation and achieve over 35% improvement in campaign effectiveness while reducing costs.

AI · budget optimization · digital advertising

0 likes · 20 min read

360 Quality & Efficiency

Apr 17, 2020 · Artificial Intelligence

Extending APEX for Real Distributed Reinforcement Learning with tf2rl

The article examines the limitations of the single‑machine APEX framework in the tf2rl reinforcement‑learning library, proposes a cross‑machine distributed architecture using middleware such as Redis, compares alternative frameworks like EasyRL, and outlines expected performance gains and future development plans.

APEX · Off-Policy · TensorFlow

0 likes · 5 min read

DataFunTalk

Apr 12, 2020 · Artificial Intelligence

Wang Zhe’s Machine Learning Notes – Answers to Frequently Asked Questions on Recommendation Systems

In this article, Wang Zhe addresses fifteen common questions about recommendation systems, covering topics such as building cross‑domain knowledge, the role of deep reinforcement learning, handling sparse or low‑sample data, offline‑online evaluation, knowledge graphs, graph neural networks, model interpretability, large‑scale ID embedding, and career advice for engineers.

Graph Neural Network · Knowledge Graph · Recommendation Systems

0 likes · 14 min read

DataFunTalk

Mar 27, 2020 · Artificial Intelligence

Understanding Data Product Layers: Business Value, Data, Algorithms, and Applications

The article explains how data products create business value through application, data, and algorithm layers, using examples like 5G infrared temperature screening and ImageNet, and discusses the roles of experimental design, causal inference, and reinforcement learning in building effective AI‑driven strategies.

Data Product · Experimental Design · artificial-intelligence

0 likes · 8 min read

Alibaba Cloud Developer

Feb 14, 2020 · Artificial Intelligence

How Alibaba’s AI Voice Bots Revolutionized Customer Service During the Pandemic

This article explains how Alibaba leveraged AI‑powered voice robots to handle massive outbound call volumes during COVID‑19, detailing the technology stack, real‑world application scenarios across finance and retail, and the future potential of intelligent voice assistants in customer service.

AI · customer service · natural language processing

0 likes · 11 min read

Alibaba Cloud Developer

Feb 7, 2020 · Artificial Intelligence

Tackling Scalability, Data Scarcity, and Training Efficiency in Dialogue Management Models

This article reviews the evolution of dialogue management models from rule‑based systems to deep‑learning approaches, identifies three major challenges—poor scalability, limited annotated data, and low training efficiency—and surveys recent research solutions including semantic matching, knowledge distillation, hierarchical reinforcement learning, model‑based RL, and human‑in‑the‑loop methods.

Conversational AI · data annotation · dialogue management

0 likes · 44 min read

Qunar Tech Salon

Feb 5, 2020 · Operations

Understanding Didi's Ride‑Hailing Dispatch Algorithms: Challenges, Models, and Future Directions

The article explains why Didi needs advanced dispatch algorithms, describes the complexities of order‑driver matching from simple one‑to‑one cases to large‑scale bipartite matching, and introduces batch matching, supply‑demand prediction, chain dispatch, and AI‑driven optimizations that together improve global efficiency and user experience.

AI · Dispatch · Ride Hailing

0 likes · 16 min read

Top Architect

Jan 16, 2020 · Artificial Intelligence

A Survey of Neural Architecture Search: Search Spaces, Optimization Strategies, and Recent Results

This article surveys neural architecture search, classifying existing methods, describing common search spaces—including global and cell‑based designs—detailing optimization strategies such as reinforcement learning, evolutionary algorithms, surrogate models, one‑shot and differentiable approaches, and highlighting recent results and trends in the field.

Evolutionary Algorithms · NAS · Neural Architecture Search

0 likes · 13 min read

DataFunTalk

Jan 2, 2020 · Artificial Intelligence

Improving Zhihu Search: Query Understanding, Term Weighting, Synonym Expansion, Query Rewriting, and Semantic Retrieval

This article details Zhihu's search engineering advances over the past year, covering long‑tail query challenges, term‑weight calculation, synonym expansion, query rewriting with translation models and reinforcement learning, and semantic retrieval using BERT‑based embeddings, while outlining future research directions.

NLP · Query Rewriting · Search

0 likes · 14 min read

DataFunTalk

Dec 16, 2019 · Artificial Intelligence

A Comprehensive Overview of Sequential Recommendation Models and Techniques

This article provides an in-depth overview of sequential recommendation, defining the problem, discussing data preparation, and reviewing various neural architectures—including MLP, CNN, RNN, Temporal CNN, self‑attention, and reinforcement‑learning approaches—while offering practical guidance on model selection and implementation.

CNN · RNN · Sequential Modeling

0 likes · 36 min read

DataFunTalk

Dec 10, 2019 · Artificial Intelligence

Applying Deep Reinforcement Learning (DQN) to the 2048 Game: Experiments and Insights

This article details a series of reinforcement‑learning experiments on the 2048 game, from random baselines through DQN implementations, classical value‑iteration methods, network redesigns, and Monte‑Carlo tree search, highlighting challenges such as reward design, over‑estimation, and exploration while achieving scores up to 34 000 and tiles of 2048.

2048 · AI · DQN

0 likes · 8 min read

DataFunTalk

Nov 27, 2019 · Artificial Intelligence

Applying Reinforcement Learning and Graph Embedding for Intelligent User Operations in Didi Ride‑Sharing

This article describes how Didi Ride‑Sharing leverages reinforcement learning and graph‑embedding techniques to model and optimize user‑operation marketing, detailing system architecture, algorithm design, experimental ROI improvements, and personalized message delivery for enhanced conversion and cost efficiency.

Didi · ROI · graph embedding

0 likes · 11 min read

AntTech

Oct 30, 2019 · Artificial Intelligence

Financial Graph Machine Learning, AutoML, and Multi‑Agent Reinforcement Learning at Ant Financial

Professor Song Le presented at the Cloudwise Conference how Ant Financial leverages large‑scale graph neural networks, automated machine‑learning platforms, and multi‑agent reinforcement learning to model complex financial networks, improve risk control, and drive diverse fintech applications.

Ant Financial · Graph Neural Networks · Large-Scale Graph

0 likes · 12 min read

DataFunTalk

Oct 25, 2019 · Artificial Intelligence

Advances and Challenges in Human‑Machine Dialogue: Open‑Domain and Task‑Oriented Systems

This article reviews recent progress and open research problems in human‑machine dialogue, covering both open‑domain chat and task‑oriented systems, with focus on reply quality, decoding, retrieval‑augmented generation, controllable and personalized responses, multi‑turn modeling, reinforcement‑learning strategies, low‑resource NLU, and data augmentation techniques.

Dialogue Systems · Response Generation · natural language processing

0 likes · 16 min read

Tencent Cloud Developer

Oct 11, 2019 · Cloud Computing

Large-Scale Distributed Reinforcement Learning Solution Based on TKE

The project replaces cumbersome manual management of thousands of heterogeneous CPU and GPU nodes for large‑scale reinforcement learning with a TKE‑based, containerized actor‑learner architecture that automates batch start/stop, provides elastic autoscaling, fault‑tolerant processes, shared model storage, and CI‑driven image deployment, cutting costs by up to two‑thirds while dramatically speeding experiment cycles.

CI/CD · Cloud Native · Resource Management

0 likes · 14 min read

DataFunTalk

Sep 30, 2019 · Artificial Intelligence

Reinforcement Learning for Recommender Systems: Challenges, Solutions, and Key Papers

This article reviews recent advances in applying reinforcement learning to recommendation systems, explains the fundamental RL concepts, discusses the specific challenges such as large action spaces, bias, and long‑term reward modeling, and summarizes two influential YouTube papers along with practical insights and future directions.

Off-Policy · User Modeling · long-term reward

0 likes · 13 min read

DataFunTalk

Sep 19, 2019 · Artificial Intelligence

Alibaba Cloud Xiaomai Dialogue System: Architecture, NLU, Dialogue Management, and User Simulator

This article presents Alibaba's Xiaomai intelligent dialogue platform, detailing its general system architecture, three-tier NLU approaches for zero‑, few‑, and many‑shot scenarios, platform‑centric dialogue management with TaskFlow, robustness and continuous learning mechanisms, and a user simulator for large‑scale data generation and dialogue diagnosis.

Natural Language Understanding · dialogue system · meta-learning

0 likes · 13 min read

DataFunTalk

Sep 18, 2019 · Operations

Understanding Didi's Ride‑Hailing Dispatch Algorithm: Challenges, Models, and Strategies

This article explains why modern ride‑hailing platforms need advanced dispatch algorithms, describes the underlying order‑allocation problem, explores simple and complex matching scenarios, and introduces batch matching, supply‑demand prediction, chain dispatch, and AI‑driven techniques used by Didi to improve efficiency and fairness.

Dispatch · Ride Hailing · dynamic VRP

0 likes · 15 min read
