Tag

reinforcement learning


Tencent Technical Engineering
Feb 26, 2025 · Artificial Intelligence

Engineers' Perspectives on DeepSeek: Technical Innovations and Implications

Thirteen engineers assess DeepSeek's open-source, reinforcement-learning-driven approach, which uses FP8 storage and SFT-free training to deliver GPT-4-level reasoning at one-twentieth the cost. They argue that it enables single-GPU deployment, lowers barriers for academia and startups, and could democratize advanced AI, and they note the market reaction it prompted.

AI cost reduction · DeepSeek · FP8
9 min read
Tencent Technical Engineering
Feb 19, 2025 · Artificial Intelligence

Reproduction and Analysis of DeepSeek R1/R1‑zero Reinforcement Learning Experiments

This note surveys four open-source reproductions of the DeepSeek R1/R1-zero reinforcement-learning pipeline and re-implements their training on math and logic datasets with Qwen-based models. The experiments show that combined format and accuracy rewards improve long-chain reasoning, although training stability and scaling remain open challenges. It closes with future directions for large-scale RL and business deployment.

DeepSeek-R1 · large language model · long chain of thought
39 min read
Alimama Tech
Jan 8, 2025 · Artificial Intelligence

Model-Based Reinforcement Learning Auto‑Bidding Algorithms for Online Advertising

The paper introduces a model-based reinforcement-learning framework for auto-bidding. It learns a neural-network environment model from real logs, generates confidence-aware virtual data that is fused with the real data, and combines the COMBO+MICRO stabilizer with a Lagrange-dual method for ROI-constrained bidding. On Alibaba's platform it delivers up to 6.8% higher consumption, 5% GMV growth, and 3.7% ROI improvement.

auto-bidding · budget constrained bidding · model-based RL
22 min read
Alimama Tech
Dec 17, 2024 · Artificial Intelligence

AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games

AuctionNet is a new benchmark that recreates a large-scale, realistic online advertising auction environment using traffic data generated by a latent diffusion model. It provides an 80 GB dataset of 5 × 10⁸ logs from 48 bidding agents, along with baseline evaluations in which Online LP outperforms the other baselines. The benchmark supported fair evaluation of thousands of NeurIPS 2024 competition submissions and comes with open-source tools for research on decision-making in large-scale games.

Benchmark · auto-bidding · generative models
15 min read
Alimama Tech
Dec 4, 2024 · Artificial Intelligence

AIGB: Generative Auto‑Bidding via Diffusion Modeling

AIGB, introduced by Alimama in 2023, reframes auto-bidding in large-scale ad auctions as a generative sequence task solved with diffusion models. It achieves up to 5% GMV gains with improved stability and interpretability, and it has since been commercialized, open-sourced, and featured in a NeurIPS-endorsed competition.

AI · advertising · auto-bidding
12 min read
Didi Tech
May 23, 2023 · Artificial Intelligence

Driver‑Passenger Matching in Didi’s Ride‑Hailing Market: Algorithms and Techniques

The article surveys the challenges of driver-passenger matching at Didi and presents a suite of solutions: greedy nearest-driver assignment, Kuhn-Munkres bipartite matching, stable marriage, dynamic and one-to-many assignment, reinforcement learning, and routing and queueing models. It also validates the underlying assumptions statistically, integrates preference-aware machine learning, and outlines future research on multi-objective optimization and digital twins.

Algorithm · Optimization · Ride-hailing
23 min read