Tagged articles

pretraining

122 articles · Page 1 of 2

Jun 11, 2026 · Artificial Intelligence

Keye-VL-2.0 Brings DeepSeek Sparse Attention to Multimodal AI – Report Released

Keye‑VL‑2.0, an open‑source MoE multimodal foundation model, tackles hour‑level video understanding and agentic intelligence by embedding DeepSeek Sparse Attention into a GQA‑based architecture, enabling a near‑lossless 256K‑token context. The report also covers four‑stage pre‑training and diverse RL distillation techniques, and reports state‑of‑the‑art results on long‑video benchmarks; the weights are publicly released.

MoE · Multimodal · RL distillation

0 likes · 8 min read

Machine Learning Algorithms & Natural Language Processing

Jun 10, 2026 · Artificial Intelligence

OneReason: Enabling Recommendation Systems to Reason

OneReason introduces a systematic reasoning capability into industrial recommendation models through multi‑stage pre‑training, chain‑of‑thought fine‑tuning, and reinforcement learning, achieving significant gains in click‑through, revenue, and cross‑domain recommendation performance while preserving the underlying language abilities of the base model.

Chain-of-Thought · Recommendation Systems · industrial AI

0 likes · 29 min read

Machine Heart

Jun 9, 2026 · Artificial Intelligence

OneReason: When Recommendation Systems Learn to Reason

The OneReason report details how Kuaishou’s recommendation team injects reasoning into large‑scale recommender models through a four‑level pre‑training pipeline, chain‑of‑thought (CoT) fine‑tuning, and specialized reinforcement learning, achieving significant offline gains and a 10.33% exposure lift in a live A/B test.

CoT · Industry · LLM

0 likes · 31 min read

Machine Heart

May 28, 2026 · Artificial Intelligence

Can a Pre‑trained Embodied Model Work Out‑of‑the‑Box? New Chinese Open‑Source VLA Model Shows Yes

The newly open‑sourced Wall‑OSS‑0.5 VLA model demonstrates that a large‑scale pre‑trained embodied robot brain can achieve strong zero‑shot performance on 17 real‑world tasks, exhibit staircase emergence with longer pre‑training, and far surpass the industry baseline after fine‑tuning, while also revealing current precision limits.

Benchmark · Embodied AI · VLA

0 likes · 15 min read

Mike Chen's Internet Architecture

May 21, 2026 · Artificial Intelligence

Demystifying AI Large Models: Architecture, Principles, and Workflow

The article explains that large language models are massive probability engines built on the Transformer architecture with self‑attention, trained through costly pre‑training on trillions of tokens, then refined by instruction fine‑tuning and RLHF, ultimately predicting the next token to generate text.

RLHF · Self-Attention · Token Prediction

0 likes · 5 min read

SuanNi

May 20, 2026 · Industry Insights

Why Karpathy’s Sudden Move to Anthropic Could Shift the AI IPO Landscape

Andrej Karpathy announced his return to frontline AI research by joining Anthropic just as both companies prepare for IPOs, a move that leverages his extensive background, reflects shifting LLM scaling priorities, and signals a strategic talent and technology win for Anthropic in the competitive AI market.

AI industry · AI talent · Andrej Karpathy

0 likes · 12 min read

Weekly Large Model Application

May 5, 2026 · Artificial Intelligence

What Do End‑to‑End Speech Large Models Actually Learn? A Four‑Step Diagram

The article distinguishes two meanings of “end‑to‑end,” then outlines four sequential stages—defining data and scenario, massive pre‑training on audio‑text pairs, task alignment via instruction or supervised fine‑tuning, and optional preference tuning—to guide engineers in building usable speech assistants.

audio data · end-to-end models · instruction fine-tuning

0 likes · 6 min read

Weekly Large Model Application

May 5, 2026 · Artificial Intelligence

What Pretraining Actually Teaches: Listening to All Sounds

The article explains that pretraining for speech models functions like a broad liberal‑arts education, teaching universal acoustic and linguistic patterns through next‑token prediction, joint audio‑text training, and masked or contrastive objectives, while clarifying common misconceptions and highlighting data bias and the need for clean, task‑specific fine‑tuning.

audio-text alignment · data bias · fine-tuning

0 likes · 6 min read

AgentGuide

Apr 26, 2026 · Artificial Intelligence

Can You Explain Large Model Training Without Complex Formulas? A Simple, Clear Guide

This article breaks down the fundamentals of large model training—covering data, parameters, neural networks, loss functions, gradient descent, pre‑training, and fine‑tuning—in plain language so readers can grasp how massive models learn without needing to dive into complex mathematics.

Model Training · fine-tuning · gradient descent

0 likes · 12 min read

AgentGuide

Apr 19, 2026 · Artificial Intelligence

Understanding the Key Differences Between Large Model Pretraining and Fine‑Tuning

The article explains how pretraining on massive generic data creates a reusable base model, while fine‑tuning uses smaller, high‑quality task‑specific data to adapt the model, covering objectives, data scale, cost, methods, and why most projects prefer fine‑tuning.

LoRA · PEFT · fine-tuning

0 likes · 6 min read

Machine Learning Algorithms & Natural Language Processing

Apr 17, 2026 · Artificial Intelligence

Can Table Modeling Scale? Rethinking Tree Models in the Age of Massive Compute

The article examines how the dramatic increase in GPU compute power—illustrated by a single H100 GPU equaling about 200 Hadoop instances—challenges the dominance of tree‑based models for structured data, presents scaling‑law experiments with KMLP and FOUND, and argues that pre‑training can redefine the balance between compute, data, and algorithms.

FOUND · GPU · KMLP

0 likes · 10 min read

Machine Heart

Apr 17, 2026 · Artificial Intelligence

Can Table Modeling Scale? Rethinking the Tree Model Era Amid Compute Shifts

The article examines how a single NVIDIA H100 GPU delivers roughly 200‑fold more FP16 compute than a 96‑core CPU Hadoop node, explores the "Bitter Lesson" of scaling‑driven AI breakthroughs, and presents large‑scale pretraining experiments that show table and sequence models now exhibit clear scaling laws, challenging the dominance of traditional tree‑based approaches.

FOUND · KMLP · Scaling Law

0 likes · 10 min read

Lao Guo's Learning Space

Apr 2, 2026 · Artificial Intelligence

Large Model Pretraining and Fine‑Tuning: A 2026 Technical Guide from Scaling Laws to Post‑Training Revolution

This article explains the full lifecycle of large language models in 2026, covering pretraining fundamentals, the limits of classic Scaling Laws, data‑centric advances, fine‑tuning strategies, RLHF, DPO, and the emerging post‑training methods GRPO, DAPO and RLVR, with concrete benchmarks and cost analyses.

DAPO · DPO · GRPO

0 likes · 17 min read

AI Large-Model Wave and Transformation Guide

Mar 28, 2026 · Artificial Intelligence

How to Ace LLM Interview Questions: Deep Dive into Pre‑training, SFT, DPO & RLHF

This guide breaks down the four major large‑model training paradigms—pre‑training, supervised fine‑tuning, preference alignment, and RLHF—explaining which parameters are updated, how attention is reshaped, and what capabilities are gained, so you can deliver a structured, interview‑ready answer.

AI interview · LLM · Large Language Models

0 likes · 8 min read

Machine Learning Algorithms & Natural Language Processing

Mar 15, 2026 · Artificial Intelligence

630‑Line Autoresearch Generates 81 Agents, 2,300 Experiments and Ten Pre‑training Insights

A 630‑line Python Autoresearch project sparked a community‑run distributed system that created over 80 autonomous AI agents, executed more than 2,300 experiments in four days, self‑organized roles and peer‑review, and uncovered ten concrete pre‑training findings.

AI agents · autoresearch · distributed training

0 likes · 9 min read

Machine Learning Algorithms & Natural Language Processing

Mar 5, 2026 · Artificial Intelligence

Can AI Self‑Improve? Inside a Stanford PhD Defense on Continually Self‑Improving AI

Zitong Yang’s Stanford PhD defense introduced “continually self‑improving AI,” a system that autonomously refines its own parameters, generates synthetic training data, and even designs its own learning algorithms, with experiments on synthetic continual training, synthetic‑bootstrap pre‑training, and AI‑design‑AI demonstrating measurable gains over static baselines.

AI research · Continual Learning · pretraining

0 likes · 35 min read

Bighead's Algorithm Notes

Mar 3, 2026 · Artificial Intelligence

How HORAI Uses Large‑Scale Multimodal Pretraining to Boost Time‑Series Forecasting and Anomaly Detection

The article reviews the HORAI model, which introduces a frequency‑enhanced multimodal pretraining paradigm and the massive MM‑TS dataset, showing that integrating derived images, endogenous text, and real‑world news dramatically improves zero‑shot forecasting and anomaly detection across six domains.

Anomaly Detection · HORAI · Multimodal Learning

0 likes · 23 min read

Data Party THU

Jan 22, 2026 · Artificial Intelligence

Unlocking Large Model Training: Pretraining, Fine‑Tuning, and Alignment Explained

This article breaks down the three core stages of large language model training—pretraining, supervised fine‑tuning, and alignment—detailing their objectives, typical data formats, scale requirements, and the latest techniques such as RLHF and DPO.

AI training · alignment · pretraining

0 likes · 11 min read

PaperAgent

Jan 19, 2026 · Artificial Intelligence

How Reinforcement Learning Can Boost LLM Reasoning by Shaping Token Distributions

Recent research shows that applying reinforcement learning to large language models can dramatically improve inference performance, but its effectiveness depends on the token distribution produced during pre‑training, prompting a novel rewrite of cross‑entropy as a single‑step policy gradient with controllable entropy parameters.

LLM · Model Optimization · RL

0 likes · 6 min read

JD Tech

Jan 13, 2026 · Artificial Intelligence

Mastering Large Language Models: Transformers, Scaling Laws, and MoE Explained

This extensive guide walks readers through the fundamentals of large language models, covering transformer architecture, pre‑training and fine‑tuning techniques, scaling laws, emergent abilities, mixture‑of‑experts designs, and practical comparisons, providing clear explanations, code snippets, and visual illustrations for deep learning practitioners.

Mixture of Experts · emergent abilities · fine-tuning

0 likes · 47 min read

Frontend AI Walk

Dec 2, 2025 · Artificial Intelligence

Understanding LLMs: A Frontend Developer’s Primer on Large Language Models

The article demystifies large language models for frontend developers by likening token prediction to autocomplete, explaining tokens, context windows, temperature, the two-stage training process, and the critical role of prompts, using concrete code examples and analogies to familiar frontend concepts.

Frontend Analogy · LLM · Prompt engineering

0 likes · 10 min read

Data Party THU

Nov 16, 2025 · Artificial Intelligence

How X‑VLA Enables 120‑Minute Unassisted Robot Clothing Folding with a 0.9B Model

The X‑VLA paper introduces a 0.9‑billion‑parameter, fully open‑source embodied model that uses a learnable soft‑prompt and divide‑and‑conquer encoding to handle heterogeneous robot vision inputs, achieving a record‑breaking 120‑minute autonomous clothing‑folding task while surpassing benchmarks across five simulation environments.

Embodied AI · Multimodal Learning · X-VLA

0 likes · 7 min read

HyperAI Super Neural

Nov 15, 2025 · Artificial Intelligence

AI Paper Weekly: Scale Pretraining, Game Agents, Attention, Context Engineering

This weekly roundup highlights five recent AI research papers—including CoCa’s contrastive captioning model, the Game‑TARS framework for scalable game agents, Kimi Linear’s efficient attention architecture, the Continuous Autoregressive Language Model (CALM), and a comprehensive survey of Context Engineering—summarizing their core contributions and providing direct links.

AI · Language Models · attention architecture

0 likes · 6 min read

Ctrip Technology

Nov 6, 2025 · Artificial Intelligence

How TripCast Uses Masked 2D Transformers to Revolutionize Travel Time-Series Forecasting

TripCast introduces a masked 2D transformer pre‑training framework that treats travel demand as a two‑dimensional time‑series problem, leveraging time‑patch tokenization, dual masking and RevIN normalization to achieve state‑of‑the‑art forecasting performance on massive real‑world travel data.

2D transformer · Artificial Intelligence · Time Series Forecasting

0 likes · 7 min read

Ele.me Technology

Oct 27, 2025 · Artificial Intelligence

How IAK Transforms Multi‑Domain Recommendation with Pre‑Training and Fine‑Tuning

This paper introduces IAK, a unified multi‑domain recommendation paradigm that treats the system as a large model, leveraging pre‑training and fine‑tuning with an information‑aware adaptive kernel to capture rapid user interest shifts while reducing training costs and improving online performance.

Large Language Models · Recommendation Systems · fine‑tuning

0 likes · 18 min read

Wu Shixiong's Large Model Academy

Oct 22, 2025 · Artificial Intelligence

Mastering LLM Training: A Step‑by‑Step Blueprint from Data to Alignment

This guide walks through the complete end‑to‑end process of training a large language model from scratch, covering data collection, cleaning, tokenization, pre‑training objectives and engineering, post‑training alignment methods, scaling laws, over‑fitting mitigation, and gradient‑stability techniques.

LLM · alignment · gradient stability

0 likes · 9 min read

Data Party THU

Sep 26, 2025 · Artificial Intelligence

How Keye‑VL‑1.5 Redefines Video Understanding with Slow‑Fast Encoding

Keye‑VL‑1.5, an 8‑billion‑parameter multimodal large language model, introduces a Slow‑Fast video encoding strategy, a four‑stage progressive pre‑training pipeline with 128K context, and a sophisticated post‑training regime that together achieve state‑of‑the‑art performance on video and vision‑language benchmarks while maintaining strong general capabilities.

Benchmark · large language model · multimodal LLM

0 likes · 21 min read

Bighead's Algorithm Notes

Aug 28, 2025 · Artificial Intelligence

Key AI-Driven Quantitative Finance Papers from KDD2025

This article summarizes recent AI research on quantitative finance, covering AlphaAgent's LLM-driven alpha mining, UMI's multi‑level irrationality factors, PDU's progressive dependency learning for stock ranking, SSPT's stock‑specific pretraining transformer, and Enhancer's distribution‑aware meta‑learning framework, all of which demonstrate improved stock prediction and resistance to alpha decay.

Alpha Mining · LLM · Meta Learning

0 likes · 9 min read

Bighead's Algorithm Notes

Aug 26, 2025 · Artificial Intelligence

SSPT: Custom Pre‑training Tasks for Stock Data Boost Stock Selection Performance

This article reviews the SSPT paper, which introduces three stock‑specific pre‑training tasks—stock code classification, sector classification, and moving‑average prediction—built on a two‑layer Transformer, and demonstrates through extensive experiments across five market datasets that these tasks consistently improve cumulative return and Sharpe ratio over baselines.

Transformer · financial AI · multitask learning

0 likes · 11 min read

Data Party THU

Aug 20, 2025 · Artificial Intelligence

How Large-Scale Corpus Rewriting is Shaping LLM Training: A Deep Dive into K2, WRAP, and Beyond

This article surveys recent large‑scale corpus rewriting techniques for LLM pre‑training, covering K2’s token‑utilization strategies, domain‑specific methods like SwallowMath/Code, reStructured pretraining, the WRAP pipeline, Nemotron‑CC filtering, Pro‑X noise removal, and the MAGA multi‑style expansion, while highlighting challenges, experimental findings, and open research questions.

Data Synthesis · LLM · corpus rewriting

0 likes · 20 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

The USP framework introduces masked latent modeling within a VAE space to pre‑train ViT encoders, enabling seamless weight transfer to image classification, segmentation, and diffusion‑based generation tasks, dramatically speeding up DiT and SiT models while preserving strong visual representations.

Diffusion Models · VAE · ViT

0 likes · 13 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Boosts Image Generation and Understanding

The USP framework introduces masked latent modeling within a VAE space to pretrain ViT encoders, enabling seamless weight transfer to both image classification and diffusion‑based generation tasks, dramatically accelerating training while preserving strong performance across multiple benchmarks.

Diffusion Models · Vision Transformer · image generation

0 likes · 10 min read

Data Thinking Notes

Jun 2, 2025 · Artificial Intelligence

Why Pre‑Training Powers Modern AI: From Theory to Real‑World Applications

Pre‑training enables AI models to first acquire a universal knowledge map from massive unlabelled text, then quickly adapt to specific tasks with minimal labelled data, offering superior generalization, reduced annotation costs, and versatile applications across chatbots, content creation, retrieval, coding assistance, and more.

AI Applications · Large Language Models · Transformer

0 likes · 14 min read

Baobao Algorithm Notes

May 13, 2025 · Artificial Intelligence

How Qwen3 Achieves Multi-Stage Pretraining, Long-Context, and Thought-Controlled RL

The article details Qwen3's three‑phase pretraining pipeline, long‑context extensions, a cold‑start long‑chain‑of‑thought dataset, reinforcement‑learning fine‑tuning with custom rewards, and a two‑stage distillation process that yields versatile, thought‑controlled language models.

Distillation · Qwen3 · long-context

0 likes · 15 min read

Tencent Technical Engineering

May 12, 2025 · Artificial Intelligence

Comprehensive Summary and Expansion of Andrej Karpathy’s 7‑Hour LLM Lecture

This article provides a detailed Chinese‑to‑English summary of Andrej Karpathy’s 7‑hour LLM tutorial, covering chat process analysis, tokenization, pre‑training data pipelines, model architecture, training strategies, post‑training fine‑tuning, reinforcement learning, chain‑of‑thought reasoning, and current industry applications.

AI · LLM · Tokenization

0 likes · 25 min read

AIWalker

May 6, 2025 · Artificial Intelligence

SimpleAR: High‑Quality 1024×1024 Images with Just 0.5B Parameters via Pretraining, SFT, and RL

SimpleAR demonstrates that a vanilla autoregressive model with only 0.5 B parameters can generate high‑fidelity 1024×1024 images, covering pretraining, supervised fine‑tuning, and reinforcement learning, achieving competitive GenEval (0.59) and DPG‑Bench (79.66) scores while reducing inference time to about 14 seconds with vLLM and KV‑cache optimizations.

Benchmark · Supervised Fine‑Tuning · autoregressive

0 likes · 14 min read

AIWalker

Apr 28, 2025 · Artificial Intelligence

SimpleAR: Autoregressive Visual Generation at 1024×1024 Using Only 0.5B Parameters

SimpleAR is a minimalist autoregressive visual generation framework that, with only 0.5 B parameters, achieves competitive 1024×1024 image synthesis through a three‑stage pipeline of large‑scale pretraining, supervised fine‑tuning, and GRPO‑based reinforcement learning, and demonstrates significant inference speedups using KV‑cache, vLLM, and speculative decoding.

Benchmark · autoregressive generation · inference acceleration

0 likes · 14 min read

Alimama Tech

Apr 23, 2025 · Artificial Intelligence

Distribution-aware Graph Prompt Tuning (DAGPrompT) for Heterophilic Graphs

Distribution‑aware Graph Prompt Tuning (DAGPrompT) tackles the pre‑training/downstream mismatch on heterophilic graphs by jointly applying low‑rank GLoRA adaptation and hop‑specific prompts that recast tasks as link‑prediction, yielding up to 4.79% accuracy gains and an average 2.43% improvement in few‑shot node classification.

Graph Neural Networks · Prompt Tuning · distribution-aware

0 likes · 9 min read

Cognitive Technology Team

Mar 22, 2025 · Artificial Intelligence

Three Stages of Developing Large Language Models and Practical Guidance

The article outlines the three development phases of large language models—building, pre‑training, and fine‑tuning—describes usage options, highlights key factors such as data scale, architecture, training processes, and evaluation, and offers practical advice for cost‑effective development.

LLM · Model Development · fine-tuning

0 likes · 3 min read

JD Tech Talk

Mar 5, 2025 · Artificial Intelligence

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

GLM introduces a unified pretraining framework that combines autoregressive blank‑filling with 2D positional encoding and span‑shuffle, achieving superior performance over BERT, T5 and GPT on a range of NLU and generation tasks such as SuperGLUE, text‑filling, and language modeling.

2D positional encoding · GLM · Language Model

0 likes · 27 min read

JD Cloud Developers

Mar 5, 2025 · Artificial Intelligence

How GLM’s Autoregressive Blank‑Filling Beats BERT, T5, and GPT

GLM introduces a universal language model that combines autoregressive blank‑filling with 2D positional encoding and span‑shuffle training, achieving superior performance over BERT, T5, and GPT across NLU, conditional and unconditional generation tasks, as demonstrated on SuperGLUE and other benchmarks.

Language Model · NLU · Transformer

0 likes · 29 min read

Architect

Feb 11, 2025 · Artificial Intelligence

DeepSeek: Training Process, Working Principles, and Recent Innovations

The article explains DeepSeek's two‑stage training pipeline—including massive pre‑training on trillions of tokens and post‑training via instruction tuning and reinforcement learning from human feedback—describes the differences between its V3 instruction model and R1 reasoning model, and highlights performance optimizations and emerging research directions.

AI · DeepSeek · Instruction Tuning

0 likes · 8 min read

DataFunSummit

Feb 5, 2025 · Artificial Intelligence

Exploration and Practice of Large‑Model Data Construction

This presentation details engineering‑focused approaches to building, mixing, and filtering data for large language models, covering data preparation, pre‑training mix strategies such as DoReMi, DoGE and online sampling, post‑training data quality selection methods, and practical Q&A on scaling laws and PDF processing.

AI · Data Engineering · Data Mixing

0 likes · 15 min read

Baidu Geek Talk

Dec 25, 2024 · Industry Insights

How to Build a Multimodal Web Page Model for the LLM Era

This article examines the unique multimodal and multi‑granular nature of web pages, compares fusion strategies, proposes a cross‑modal attention approach, outlines fine‑ and coarse‑grained pre‑training tasks, and explores low‑cost adaptor methods for adapting large multimodal models to web‑page modeling in the LLM era.

AI · HTML · LLM adaptation

0 likes · 10 min read

NewBeeNLP

Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72 B parameters, expanded pre‑training data to 18 trillion tokens, advanced supervised fine‑tuning and reinforcement learning pipelines, and demonstrates strong performance across comprehension, reasoning, coding, and long‑context tasks.

LLM · Qwen2.5 · fine-tuning

0 likes · 5 min read

ZhongAn Tech Team

Dec 22, 2024 · Industry Insights

What’s Driving the AI Boom? New Models, Data Limits, and the Rise of Forgetting

This issue reviews the latest AI breakthroughs, including OpenAI’s o3 and o1 models, pricing cuts, new features in ChatGPT, product launches like Pika 2.0 and Gemini 2.0, a heated debate on pre‑training data bottlenecks sparked by Ilya Sutskever, a novel black‑box forgetting method, and DeepMind’s Genie 2 3D world generator, and highlights how industry dynamics and research directions are reshaping the field.

3D generation · AI · Industry Trends

0 likes · 12 min read

DevOps

Dec 8, 2024 · Artificial Intelligence

Understanding Fine-Tuning in Machine Learning: Concepts, Importance, Steps, and Applications

This article explains fine‑tuning in machine learning, covering its definition, why it matters, the role of pre‑trained models, detailed step‑by‑step procedures, advantages, and diverse applications such as NLP, computer vision, speech and finance, with practical examples like face recognition and object detection.

AI Applications · Model Optimization · fine-tuning

0 likes · 16 min read

Baobao Algorithm Notes

Nov 14, 2024 · Artificial Intelligence

How I Built a 1B‑Parameter Chinese LLM on a Single A100: Lessons Learned

This article details the end‑to‑end process of pre‑training, fine‑tuning, and evaluating a 1‑billion‑parameter Chinese LLM named Steel‑LLM on limited hardware, covering data collection, pipeline design, training framework choices, architectural tweaks, performance results, and practical lessons for resource‑constrained developers.

LLM · Training Optimization · data pipeline

0 likes · 18 min read

Bilibili Tech

Nov 5, 2024 · Artificial Intelligence

Bilibili's In-House Role-Playing Large Language Model: Architecture, Training Stages, Evaluation, and Demonstrations

Bilibili’s in‑house role‑playing large language model, built on the Index architecture and refined through pre‑training, supervised fine‑tuning, and preference optimization (PPO and DPO), achieved top scores on the Chinese CharacterEval benchmark, surpassing rivals while incorporating safety alignment and showcasing consistent, personality‑driven dialogue examples.

Content Safety · Preference Optimization · Supervised Fine‑Tuning

0 likes · 13 min read

Infra Learning Club

Oct 30, 2024 · Artificial Intelligence

How GPT-3 Evolved: From Transformer Roots to Massive Language Models

The article traces the development of the GPT series, from the 2017 Transformer breakthrough, through GPT‑1, GPT‑2, and GPT‑3’s 175 billion parameters, to later models like Codex and ChatGPT, highlighting key papers, architectural choices, and the surprising role of OpenAI’s decoder‑only approach.

GPT-3 · Google · Language Model

0 likes · 4 min read

NewBeeNLP

Oct 11, 2024 · Artificial Intelligence

Inside Llama 3: Training, Architecture, and Performance Secrets

An extensive review of Meta’s Llama 3 model breaks down its pre‑training data pipeline, scaling laws, architectural tweaks like GQA and RoPE, post‑training methods such as SFT, DPO, and reward modeling, and evaluates benchmark results, offering practical insights for researchers and engineers building large language models.

Benchmarking · Large Language Models · Llama 3

0 likes · 32 min read

Bilibili Tech

Sep 18, 2024 · Artificial Intelligence

Index-1.9B-32K: A 2% GPT-Size Model with Powerful Long-Context Capabilities

Index-1.9B-32K is a 1.9B-parameter model with a 32K token context window, achieving strong long‑text performance comparable to larger models while using only about 2% of GPT‑4’s compute, trained via long pre‑training and supervised fine‑tuning, with a trade‑off of reduced short‑context ability.

AI · Evaluation · Long Context

0 likes · 12 min read

NewBeeNLP

Sep 3, 2024 · Industry Insights

Why Pre‑training Teams Boost New Engineers’ Skills Faster Than SFT Teams

The answer explains that joining a pre‑training team accelerates a newcomer’s engineering abilities through hands‑on work with large‑scale data pipelines, distributed training code, and debugging, while SFT teams focus mainly on data labeling, making pre‑training the more effective path for rapid skill growth.

AI · Career Advice · Engineering Skills

0 likes · 6 min read

DataFunSummit

Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Data Quality · instruction fine-tuning · pretraining

0 likes · 18 min read

Baobao Algorithm Notes

Aug 29, 2024 · Industry Insights

Why Pretraining Boosts New Engineers More Than SFT: A Practical Guide

The answer argues that fresh graduates should join pre‑training teams because the engineering work involved (large‑scale data crawling, Hadoop/Spark pipelines, PyTorch and CUDA setup, Megatron code debugging, and scaling‑law experiments) rapidly sharpens coding skills, while SFT work focuses mainly on data labeling and offers slower technical growth.

AI Engineering · Career Advice · SFT

0 likes · 7 min read

DataFunTalk

Aug 5, 2024 · Artificial Intelligence

Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches, and Insights

This article presents a comprehensive study on integrating multimodal image‑text representations into large‑scale e‑commerce advertising CTR models, introducing a semantic‑aware contrastive pre‑training (SCL) method and two application algorithms (SimTier and MAKE) that together achieve a GAUC improvement of over 1% and significant online gains.

CTR Prediction · Recommendation Systems · contrastive learning

0 likes · 21 min read

Baobao Algorithm Notes

Jul 25, 2024 · Artificial Intelligence

Why LLaMA 3 405B Matches GPT‑4o: Architecture, Training, and Industry Impact

The article provides an in‑depth analysis of LLaMA 3 405B, covering its dense Transformer architecture, three‑stage pre‑training (initial, long‑context, annealing), iterative post‑training with RM‑guided rejection sampling, the decision against an MoE design, and the broader implications for both large and small model development.

405B · model architecture · model distillation

0 likes · 17 min read

Architect's Alchemy Furnace

Jul 6, 2024 · Artificial Intelligence

ChatGLM Evolution: Deep Dive into GLM Architecture, Pretraining, and ChatGLM‑4

This article provides a comprehensive technical overview of the ChatGLM series—from the original ChatGLM‑6B model and its GLM‑based pre‑training framework to the enhancements in ChatGLM‑2, the architectural parity of ChatGLM‑3, and the advanced capabilities of the latest ChatGLM‑4, covering model structure, position encoding, attention mechanisms, multi‑task pretraining, and tool integration.

AI · ChatGLM · GLM

0 likes · 25 min read

Bilibili Tech

Jun 14, 2024 · Artificial Intelligence

Technical Report on the Index-1.9B Series: Model Variants, Pre‑training Optimizations, and Alignment Experiments

The report presents the open‑source Index‑1.9B family—base, pure, chat, and character variants—detailing benchmark results, pre‑training optimizations such as a normalized LM‑Head and deeper‑slim architectures, the importance of modest instruction data, alignment via SFT/DPO, role‑play enhancements with RAG, and acknowledges remaining safety and factual limitations.

Evaluation · Instruction Tuning · LLM

0 likes · 15 min read

NewBeeNLP

May 31, 2024 · Artificial Intelligence

Can Cleaned Web Data Rival Proprietary Corpora for LLM Training?

This article analyzes whether large‑scale web crawls, when meticulously filtered and deduplicated, can match or surpass the performance of high‑quality curated datasets in training large language models, covering dataset composition, processing pipelines, experimental results, scaling‑law implications, and future data‑efficiency strategies.

Artificial Intelligence · Dataset Cleaning · LLM

0 likes · 23 min read

Baobao Algorithm Notes

May 21, 2024 · Artificial Intelligence

How to Pre‑train a 20M‑Parameter LLaMA‑3 Mini Model with Hugging Face Trainer

This step‑by‑step guide shows how to use Hugging Face's Trainer API to pre‑train an ultra‑small LLaMA‑3 model (under 20 M parameters) on the TinyStories dataset, covering model configuration, tokenizer setup, data preprocessing, collators, training arguments, and inference results.

Hugging Face · LLaMA · Language Model

0 likes · 27 min read

DataFunTalk

May 15, 2024 · Artificial Intelligence

Advances in Video Multimodal Retrieval: Video‑Text Semantic Search and Video‑Video Same‑Source Search

This article presents Ant Group's multimodal research on video retrieval, detailing video‑text semantic search and video‑video same‑source search, introducing a large Chinese pre‑training dataset, novel pre‑training, hard‑sample mining, fine‑grained modeling techniques, and an efficient end‑to‑end copyright detection framework.

Multimodal AI · copyright detection · fine-grained modeling

0 likes · 38 min read

DataFunSummit

Apr 24, 2024 · Artificial Intelligence

Multimodal Content Understanding in Baidu Commercial Systems: The ViCAN Model and Its Applications

This article presents Baidu's exploration of multimodal content understanding for commercial advertising, detailing the ViCAN pre‑training model, its contrastive and mask‑language learning tasks, integration across recall, ranking and risk‑control pipelines, quantization with MMDict, and future AIGC‑driven generation, all backed by extensive experiments and Q&A.

AI · AIGC · Advertising

0 likes · 27 min read

DataFunSummit

Mar 27, 2024 · Artificial Intelligence

Generative Multimodal Pretraining (OFA) and Representational Multimodal Pretraining (ONE-PEACE): Research Overview and Findings

This article reviews Tongyi Lab's work on the OFA framework for generative multimodal pretraining and the ONE-PEACE model for unified multimodal representation learning, detailing their architectures, training strategies, experimental results across vision‑language and audio tasks, and future research directions.

Multimodal · OFA · ONE-PEACE

0 likes · 15 min read

NewBeeNLP

Mar 27, 2024 · Artificial Intelligence

Deep Dive into Llama 2: Architecture, Pre‑training, SFT, and Safety Insights

This article provides a comprehensive technical overview of Meta's Llama 2 series, covering its architectural upgrades such as Grouped‑Query Attention, the pre‑training dataset and hyper‑parameters, loss behavior, benchmark comparisons, and the supervised fine‑tuning pipeline with safety considerations.

AI · Llama 2 · RLHF

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Mar 18, 2024 · Artificial Intelligence

How MuLTI Achieves Memory‑Efficient Video‑Language Understanding with Text‑Guided MultiWay Sampling

The paper presents MuLTI, a multimodal video‑language model that tackles the memory and efficiency challenges of long video‑text sequences by introducing a Text‑Guided MultiWay Sampler and a Multiple Choice Modeling pre‑training task, achieving state‑of‑the‑art results on video QA and retrieval while drastically reducing GPU memory consumption.

Multimodal · efficient-ai · feature fusion

0 likes · 19 min read

Rare Earth Juejin Tech Community

Jan 21, 2024 · Artificial Intelligence

Understanding Pretraining and Fine‑Tuning of Large Language Models: Methods, Resources, and Practical Applications

This article explains the concepts of pretraining and fine‑tuning for large language models, compares full‑parameter, LoRA and QLoRA approaches, discusses resource consumption, introduces the ModelScope SWIFT framework with code examples, and shows how fine‑tuning can improve data‑visualisation tasks while reducing token usage.

Data Visualization · LLM · LoRA

0 likes · 24 min read

Rare Earth Juejin Tech Community

Dec 4, 2023 · Artificial Intelligence

An Overview of BERT: Architecture, Pre‑training Tasks, Comparisons, and Applications

This article provides a comprehensive English overview of BERT, covering its original paper, model architecture, pre‑training objectives (Masked Language Model and Next Sentence Prediction), differences from ELMo, GPT and vanilla Transformers, parameter counts, main contributions, and a range of NLP application scenarios such as text classification, sentiment analysis, NER, and machine translation.

BERT · NLP · Next Sentence Prediction

0 likes · 16 min read

Rare Earth Juejin Tech Community

Nov 26, 2023 · Artificial Intelligence

Overview of T5 (Text-to-Text Transfer Transformer): Architecture, Variants, Experiments, and Applications

This article provides a comprehensive overview of Google's T5 model, detailing its unified text‑to‑text formulation, encoder‑decoder architecture, three model variants, attention mask designs, training strategies, model sizes, experimental results, and key contributions to natural language processing.

Artificial Intelligence · NLP · T5

0 likes · 14 min read

AntTech

Nov 7, 2023 · Artificial Intelligence

Multi‑Scale Stochastic Distribution Prediction for User Behavior Representation Learning

The paper proposes a multi‑scale stochastic distribution prediction (MSDP) framework that learns robust user behavior representations by predicting behavior distributions over random time windows, incorporates contrastive regularization, and demonstrates superior performance on both proprietary financial risk data and a public e‑commerce dataset compared with existing masked and next‑behavior pre‑training methods.

AI · Multi-Scale · distribution prediction

0 likes · 13 min read

DataFunSummit

Oct 8, 2023 · Artificial Intelligence

NLP Techniques for Financial Risk Control: Text Modeling, Non‑Text Modeling, Long‑Text Handling, Multi‑Modal Fusion and Sample Optimization

This article presents a comprehensive overview of how natural language processing is applied to financial risk control, covering text and non‑text sequence modeling, tokenization strategies, transformer‑based long‑text architectures, multi‑modal fusion methods, pre‑training techniques and practical sample‑optimization approaches.

AI · NLP · Text Modeling

0 likes · 22 min read

Rare Earth Juejin Tech Community

Jul 30, 2023 · Artificial Intelligence

ChatGPT Technical Analysis Series – Part 2: GPT1, GPT2, and GPT3 (Encoder vs Decoder, Zero‑Shot, and Scaling)

This article reviews the evolution of the GPT family from GPT‑1 to GPT‑3, comparing encoder and decoder architectures, explaining the shift from supervised fine‑tuning to zero‑shot and few‑shot learning, and highlighting the architectural and training innovations that enabled large‑scale language models.

GPT · LLM · Transformer

0 likes · 13 min read

Sohu Tech Products

Jul 26, 2023 · Artificial Intelligence

Attention Mechanism, Transformer Architecture, and BERT: An In-Depth Overview

This article provides a comprehensive overview of the attention mechanism, its mathematical foundations, the transformer model architecture—including encoder and decoder components—and the BERT pre‑training model, detailing their principles, implementations, and applications in natural language processing.

Attention Mechanism · BERT · Encoder-Decoder

0 likes · 13 min read

DataFunSummit

Jun 28, 2023 · Artificial Intelligence

OPPO's CHAOS Pretrained Large Model and GammaE Knowledge‑Graph Multi‑hop Reasoning: Techniques and Insights

This article presents OPPO Research Institute's recent advances in large‑model AI, detailing the CHAOS pretrained model that topped the CLUE leaderboard, the knowledge‑enhanced training pipeline, and the GammaE model for multi‑hop reasoning over knowledge graphs, together with experimental results and practical training tips.

AI research · GammaE · Large Language Models

0 likes · 20 min read

DataFunTalk

Jun 21, 2023 · Artificial Intelligence

Low‑Resource NLP Pretraining: Methodology, Experiments, and Zero‑Shot Applications

This article presents a low‑resource NLP pretraining approach that combines transformer‑based language modeling with contrastive vector learning, details the unsupervised sample‑pair construction, introduces a camel‑shaped masking distribution, and demonstrates through extensive experiments that the resulting model achieves strong zero‑shot NLU, NLG, and retrieval performance while requiring minimal compute and data.

Language Modeling · contrastive learning · low-resource

0 likes · 10 min read

Baidu Geek Talk

Mar 13, 2023 · Artificial Intelligence

Recent Advances in Sparse and Dense Retrieval for Search Engines

The article surveys recent academic advances in both sparse inverted‑index and dense semantic retrieval for large‑scale search, highlighting key papers on decision frameworks, benchmarks, sparse lexical models, dual encoders, and hybrid techniques, while discussing challenges such as single‑vector limits and proposing multi‑view and hybrid solutions.

Information Retrieval · Ranking · dense retrieval

0 likes · 12 min read

DataFunTalk

Mar 4, 2023 · Artificial Intelligence

Advances in AIGC: AliceMind Text Generation Models and Multimodal mPLUG from Alibaba DAMO Academy

This article reviews recent AIGC progress, introducing the AliceMind series of text generation models—including PALM, PLUG, and a Chinese GPT‑3—alongside the multimodal mPLUG architecture, and discusses their training strategies, performance results, and practical deployment insights.

AIGC · AliceMind · Multimodal Generation

0 likes · 16 min read

Tencent Advertising Technology

Mar 2, 2023 · Artificial Intelligence

Tencent's HunYuan‑NLP 1T Large‑Scale AI Model: Training Techniques, Optimization, and Real‑World Applications

This article details Tencent's development of the 1‑trillion‑parameter HunYuan‑NLP model, covering its MoE architecture, cost‑effective pre‑training strategies, distributed training framework, model compression toolkit, and successful deployment across advertising, gaming, and other Tencent services.

AI Infrastructure · Mixture of Experts · large language model

0 likes · 17 min read

NetEase Cloud Music Tech Team

Jan 10, 2023 · Artificial Intelligence

Sentiment Classification and Topic Clustering for NetEase Cloud Music Comments

To improve comment handling at NetEase Cloud Music, the authors combine active‑learning‑driven relabeling, domain‑specific MLM pretraining, contrastive‑learning‑based sample expansion, and multi‑task BERT sharing, raising sentiment‑classification precision and recall above 90% and doubling the moderation clean‑rate. For short‑text topic clustering, they use prompt‑generated story themes, IP‑focused classifiers, and hot‑word aggregation to support scalable, theme‑aware distribution.

Active Learning · Multi-Task Learning · NLP

0 likes · 10 min read

DataFunTalk

Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Deep Learning · Large-Scale Data · Multimodal

0 likes · 16 min read

Xiaohongshu Tech REDtech

Nov 11, 2022 · Artificial Intelligence

Language Model as a Service and Black‑Box Optimization: Insights from Prof. Qiu Xipeng’s Talk

Prof. Qiu Xipeng’s talk highlighted how large language models can be offered as a service and efficiently adapted via in‑context learning, lightweight label‑tuning, and gradient‑free black‑box optimization, showcasing a unified asymmetric Transformer (CPT) that handles understanding, generation, ABSA and NER tasks while reducing resource demands.

Black-Box Optimization · LLM · Language Model

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Oct 19, 2022 · Artificial Intelligence

How CKBERT Boosts Chinese NLP with Knowledge‑Enhanced Pretraining

CKBERT, a Chinese knowledge‑enhanced BERT developed by Alibaba’s EasyNLP team, integrates external knowledge graphs and internal linguistic cues through novel pre‑training tasks, offers three model sizes compatible with Hugging Face and PAI, and demonstrates superior performance on CLUE and NER benchmarks while providing easy deployment on cloud platforms.

CKBERT · Chinese NLP · EasyNLP

0 likes · 40 min read

DataFunTalk

Sep 24, 2022 · Artificial Intelligence

Cross‑Modal Image‑Text Representation: The Zero Dataset and R2D2 Pre‑training Framework

This article introduces the importance of image‑text cross‑modal representation, presents the Chinese Zero dataset with two pre‑training subsets and five downstream tasks, describes the R2D2 dual‑tower‑plus‑single‑tower pre‑training framework with multiple loss functions, and reports extensive experiments and real‑world deployment insights.

Multimodal AI · R2D2 framework · Zero dataset

0 likes · 19 min read

Alibaba Cloud Big Data AI Platform

Sep 21, 2022 · Artificial Intelligence

Unlocking PEGASUS: How EasyNLP Simplifies Text Summarization with Pre‑Training

This article explains the importance of text generation, introduces the PEGASUS model’s gap‑sentence pre‑training for abstractive summarization, and shows how the EasyNLP framework integrates PEGASUS and other Chinese and English summarization models with step‑by‑step installation, data preparation, and training commands.

EasyNLP · NLP · PEGASUS

0 likes · 22 min read

DataFunSummit

Jul 18, 2022 · Artificial Intelligence

Advances in Natural Language Generation: ProphetNet, Knowledge‑Enhanced Generation, Non‑Autoregressive Pre‑training, Long‑Text Modeling, and Efficient Attention

This talk presents research from the past year on natural language generation, covering the ProphetNet pre‑trained generation model, external‑knowledge integration for generation, non‑autoregressive pre‑training (BANG), the Poolingformer long‑text architecture, EL‑attention for faster decoding, and a new multi‑task generation benchmark.

Efficient Attention · Knowledge Integration · long‑text modeling

0 likes · 22 min read

DataFunTalk

Jun 30, 2022 · Artificial Intelligence

OBERT: A Billion‑Parameter Pretrained Language Model for Large‑Scale NLP Applications

The OPPO XiaoBu team introduced OBERT, a series of 100M‑, 300M‑, and 1B‑parameter pretrained language models that leverage massive TB‑scale corpora, multi‑granular masking, retrieval‑augmented training, and distributed acceleration to achieve state‑of‑the‑art results on CLUE and KgCLUE benchmarks while enabling efficient industrial deployment.

Knowledge augmentation · NLP · fine-tuning

0 likes · 12 min read

DataFunSummit

Jun 25, 2022 · Artificial Intelligence

Image and Text Pretraining: Methods, Practices, and Business Applications in Information Flow

This article reviews large‑scale image and multimodal pre‑training techniques—including contrastive learning, self‑supervised reconstruction, and multimodal alignment—explains data acquisition, model construction, evaluation metrics, and demonstrates how these methods are applied and optimized for real‑world information‑flow services.

AI · Information Flow · contrastive learning

0 likes · 17 min read

Alimama Tech

Jun 15, 2022 · Artificial Intelligence

Multi-modal Multi-query Search Session Modeling with Heterogeneous Graph Neural Networks

The paper introduces MUVCOG, a heterogeneous graph neural network that models multi‑modal, multi‑query search sessions on Mobile Taobao by jointly learning attention‑based global and hierarchical local views through contrastive pre‑training, yielding universal session embeddings that markedly improve CTR prediction, query recommendation, and intent classification.

Graph Neural Network · Multi-modal · contrastive learning

0 likes · 15 min read

DataFunTalk

Jun 8, 2022 · Artificial Intelligence

Integrating Knowledge Graphs with Neural Networks: Generative Pre‑Training, Differentiable Reasoning, and Fuzzy Logic Query Embedding

This article reviews recent advances in combining knowledge graphs with neural networks, covering generative pre‑training of graph neural networks, wiki‑graph based open‑domain question answering, differentiable logical reasoning, and a fuzzy‑logic query‑embedding model that improves performance on sparse‑relation queries.

Artificial Intelligence · Open Domain QA · fuzzy logic

0 likes · 23 min read

DaTaobao Tech

May 27, 2022 · Artificial Intelligence

Multimodal Pretraining for Search Recall in E-commerce

The paper proposes a multimodal pre‑training framework that jointly encodes query text and item titles with images via shared and single‑stream towers, using MLM, MPM, QIC, and matching tasks, and demonstrates substantial Recall@K gains on a billion‑item e‑commerce catalog by leveraging visual cues to bridge the semantic gap.

Multimodal · Vector Retrieval · e-commerce

0 likes · 17 min read

DataFunSummit

Feb 22, 2022 · Artificial Intelligence

Graph Pretraining Techniques for Molecular Representation and Their Applications in Drug Discovery

This article reviews the motivation, methods, and results of graph-based self‑supervised pretraining for molecular data, introduces the ChemRL‑GEM model that incorporates 3‑D structural information, and demonstrates its superior performance on ADMET, affinity prediction, and benchmark competitions using the PaddleHelix platform.

AI · Chemistry · Graph Neural Networks

0 likes · 18 min read

Baobao Algorithm Notes

Jan 28, 2022 · Artificial Intelligence

How Pre‑Training Evolved: From word2vec to MAE Across NLP and CV

This article traces the history of deep‑learning pre‑training techniques, comparing the parallel developments in natural‑language processing and computer vision—from early word2vec and bag‑of‑words models through ELMo and BERT to recent transformer‑based vision models like iGPT, ViT, BEiT and MAE—highlighting key innovations, challenges, and the convergence of the two fields.

Deep Learning · MAE · NLP

0 likes · 20 min read

Baobao Algorithm Notes

Dec 23, 2021 · Artificial Intelligence

How Pre‑Training Evolved: From word2vec to MAE Across NLP & Vision

This article traces the evolution of deep‑learning pre‑training techniques, starting with word2vec in NLP, moving through ELMo and BERT, then shifting to computer‑vision models such as iGPT, ViT, BEiT, and MAE, highlighting key innovations, challenges, and the convergence of NLP and CV paradigms.

BERT · MAE · NLP

0 likes · 21 min read

DataFunSummit

Dec 11, 2021 · Artificial Intelligence

Survey of User Representation Learning and Transfer Learning in Recommendation Systems

This article reviews recent advances in user representation learning for recommender systems, covering self‑supervised pre‑training, lifelong learning, multi‑task modeling, and large‑scale contrastive methods, and provides code and dataset links for key papers such as PeterRec, Conure, DUPN, ShopperBERT, PTUM, UPRec, and LURM.

Recommendation Systems · pretraining · self-supervised learning

0 likes · 11 min read

Meituan Technology Team

Dec 2, 2021 · Artificial Intelligence

Pretraining Techniques for Search Advertising Relevance at Meituan

Meituan improves search‑ad relevance by applying pre‑trained BERT models enhanced with data‑augmented samples, multi‑task learning, keyword extraction and two‑stage knowledge distillation, producing a lightweight distilled model that, when fused with traditional relevance signals, boosts CTR, lowers Badcase@5 and raises NDCG while preserving revenue.

BERT · Knowledge Distillation · Search

0 likes · 30 min read

JD Retail Technology

Nov 16, 2021 · Artificial Intelligence

Automatic Product Copywriting for E-Commerce: The APCG System and Its AI Innovations

The APCG system, awarded the AAAI 2022 Innovation Application Prize, automatically generates e‑commerce product copy using a Transformer‑Pointer network and a pretrained sequence‑to‑sequence model, incorporates quality control, employs novel pretraining tasks, and has produced millions of descriptions that boost CTR, CVR, and GMV.

AI · Natural Language Generation · Transformer

0 likes · 6 min read

DataFunSummit

Nov 14, 2021 · Artificial Intelligence

Overview of Pre‑training Models and the UER‑py Framework for Natural Language Processing

This article introduces the importance of pre‑training in natural language processing, reviews classic pre‑training models such as Skip‑thoughts, BERT, GPT‑2 and T5, presents the modular UER‑py framework and its Chinese resources, compares it with Hugging Face Transformers, and outlines practical deployment steps in industry.

Language Models · NLP · UER-py

0 likes · 22 min read

DataFunTalk

Oct 22, 2021 · Artificial Intelligence

Applying AI Techniques to Credit Reporting and Risk Modeling: Model Structure, Pre‑training, Ranking and Interpretability

This article presents a comprehensive overview of how AI technologies are applied to credit reporting and loan risk modeling, detailing data characteristics, end‑to‑end model architectures, pre‑training strategies, risk‑ranking methods, and interpretability techniques for financial risk assessment.

AI · Model Optimization · Risk Ranking

0 likes · 17 min read

Alimama Tech

May 27, 2021 · Artificial Intelligence

Explicit Semantic Cross Feature Learning via Pre-trained Graph Neural Networks for CTR Prediction (PCF‑GNN)

PCF‑GNN builds a heterogeneous graph of feature nodes and learns edge statistics via pre‑training, enabling it to infer unseen cross‑features, reduce storage by over 50%, and consistently improve CTR prediction AUC compared to implicit and explicit baselines, with proven online gains.

Graph Neural Network · Recommendation Systems · cross feature

0 likes · 12 min read

DataFunTalk

May 8, 2021 · Artificial Intelligence

Attribute‑Level Sentiment Analysis for E‑commerce: Tasks, Challenges, and System Design

This article presents a comprehensive overview of sentiment analysis in user‑generated content, detailing document‑, sentence‑, and aspect‑level tasks and defining the Aspect Sentiment Triplet Extraction problem for e‑commerce reviews. It describes a three‑stage pipeline with pre‑training, multi‑domain modeling, and attribute normalization, reports business improvements such as a 400% CTR lift, and discusses data imbalance, annotation scarcity, and future research directions.

Sentiment Analysis · aspect based sentiment · e-commerce

0 likes · 15 min read
