Tagged articles
116 articles
SuanNi
May 20, 2026 · Industry Insights

Why Karpathy’s Sudden Move to Anthropic Could Shift the AI IPO Landscape

Andrej Karpathy announced his return to frontline AI research by joining Anthropic just as both companies prepare for IPOs, a move that leverages his extensive background, reflects shifting LLM scaling priorities, and signals a strategic talent and technology win for Anthropic in the competitive AI market.

AI industry · AI talent · Andrej Karpathy
0 likes · 12 min read
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

What Do End‑to‑End Speech Large Models Actually Learn? A Four‑Step Diagram

The article distinguishes two meanings of “end‑to‑end,” then outlines four sequential stages—defining data and scenario, massive pre‑training on audio‑text pairs, task alignment via instruction or supervised fine‑tuning, and optional preference tuning—to guide engineers in building usable speech assistants.

Speech AI · audio data · end-to-end models
0 likes · 6 min read
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

What Pretraining Actually Teaches: Listening to All Sounds

The article explains that pretraining for speech models functions like a broad liberal‑arts education, teaching universal acoustic and linguistic patterns through next‑token prediction, joint audio‑text training, and masked or contrastive objectives, while clarifying common misconceptions and highlighting data bias and the need for clean, task‑specific fine‑tuning.

Fine-tuning · audio-text alignment · data bias
0 likes · 6 min read
Machine Learning Algorithms & Natural Language Processing
Apr 17, 2026 · Artificial Intelligence

Can Table Modeling Scale? Rethinking Tree Models in the Age of Massive Compute

The article examines how the dramatic increase in GPU compute power—illustrated by a single H100 GPU equaling about 200 Hadoop instances—challenges the dominance of tree‑based models for structured data, presents scaling‑law experiments with KMLP and FOUND, and argues that pre‑training can redefine the balance between compute, data, and algorithms.

FOUND · GPU · KMLP
0 likes · 10 min read
Machine Heart
Apr 17, 2026 · Artificial Intelligence

Can Table Modeling Scale? Rethinking the Tree Model Era Amid Compute Shifts

The article examines how a single NVIDIA H100 GPU delivers roughly 200‑fold more FP16 compute than a 96‑core CPU Hadoop node, explores the "Bitter Lesson" of scaling‑driven AI breakthroughs, and presents large‑scale pretraining experiments that show table and sequence models now exhibit clear scaling laws, challenging the dominance of traditional tree‑based approaches.

FOUND · KMLP · Structured Data
0 likes · 10 min read
Lao Guo's Learning Space
Apr 2, 2026 · Artificial Intelligence

Large Model Pretraining and Fine‑Tuning: A 2026 Technical Guide from Scaling Laws to Post‑Training Revolution

This article explains the full lifecycle of large language models in 2026, covering pretraining fundamentals, the limits of classic Scaling Laws, data‑centric advances, fine‑tuning strategies, RLHF, DPO, and the emerging post‑training methods GRPO, DAPO and RLVR, with concrete benchmarks and cost analyses.

DAPO · DPO · Fine-tuning
0 likes · 17 min read
AI Large-Model Wave and Transformation Guide
Mar 28, 2026 · Artificial Intelligence

How to Ace LLM Interview Questions: Deep Dive into Pre‑training, SFT, DPO & RLHF

This guide breaks down the four major large‑model training paradigms—pre‑training, supervised fine‑tuning, preference alignment, and RLHF—explaining which parameters are updated, how attention is reshaped, and what capabilities are gained, so you can deliver a structured, interview‑ready answer.

AI Interview · Fine-tuning · LLM
0 likes · 8 min read
Machine Learning Algorithms & Natural Language Processing
Mar 15, 2026 · Artificial Intelligence

630‑Line Autoresearch Generates 81 Agents, 2,300 Experiments and Ten Pre‑training Insights

A 630‑line Python Autoresearch project sparked a community‑run distributed system that created over 80 autonomous AI agents, executed more than 2,300 experiments in four days, self‑organized roles and peer‑review, and uncovered ten concrete pre‑training findings.

AI agents · AutoResearch · Distributed Training
0 likes · 9 min read
Machine Learning Algorithms & Natural Language Processing
Mar 5, 2026 · Artificial Intelligence

Can AI Self‑Improve? Inside a Stanford PhD Defense on Continually Self‑Improving AI

Zitong Yang’s Stanford PhD defense introduced “continually self‑improving AI,” a system that autonomously refines its own parameters, generates synthetic training data, and even designs its own learning algorithms, with experiments on synthetic continual training, synthetic‑bootstrap pre‑training, and AI‑design‑AI demonstrating measurable gains over static baselines.

AI research · continual learning · pretraining
0 likes · 35 min read
Bighead's Algorithm Notes
Mar 3, 2026 · Artificial Intelligence

How HORAI Uses Large‑Scale Multimodal Pretraining to Boost Time‑Series Forecasting and Anomaly Detection

The article reviews the HORAI model, which introduces a frequency‑enhanced multimodal pretraining paradigm and the massive MM‑TS dataset, showing that integrating derived images, endogenous text, and real‑world news dramatically improves zero‑shot forecasting and anomaly detection across six domains.

HORAI · Multimodal Learning · Time Series
0 likes · 23 min read
PaperAgent
Jan 19, 2026 · Artificial Intelligence

How Reinforcement Learning Can Boost LLM Reasoning by Shaping Token Distributions

Recent research shows that applying reinforcement learning to large language models can dramatically improve reasoning performance, but its effectiveness depends on the token distribution produced during pre‑training, prompting a novel rewrite of cross‑entropy as a single‑step policy gradient with controllable entropy parameters.

LLM · Model Optimization · RL
0 likes · 6 min read
JD Tech
Jan 13, 2026 · Artificial Intelligence

Mastering Large Language Models: Transformers, Scaling Laws, and MoE Explained

This extensive guide walks readers through the fundamentals of large language models, covering transformer architecture, pre‑training and fine‑tuning techniques, scaling laws, emergent abilities, mixture‑of‑experts designs, and practical comparisons, providing clear explanations, code snippets, and visual illustrations for deep learning practitioners.

Fine-tuning · Mixture of Experts · emergent abilities
0 likes · 47 min read
Frontend AI Walk
Dec 2, 2025 · Artificial Intelligence

Understanding LLMs: A Frontend Developer’s Primer on Large Language Models

The article demystifies large language models for frontend developers by likening token prediction to autocomplete, explaining tokens, context windows, temperature, the two-stage training process, and the critical role of prompts, using concrete code examples and analogies to familiar frontend concepts.

Fine-tuning · Frontend Analogy · LLM
0 likes · 10 min read
Data Party THU
Nov 16, 2025 · Artificial Intelligence

How X‑VLA Enables 120‑Minute Unassisted Robot Clothing Folding with a 0.9B Model

The X‑VLA paper introduces a 0.9‑billion‑parameter, fully open‑source embodied model that uses a learnable soft‑prompt and divide‑and‑conquer encoding to handle heterogeneous robot vision inputs, achieving a record‑breaking 120‑minute autonomous clothing‑folding task while surpassing benchmarks across five simulation environments.

Embodied AI · Multimodal Learning · Robotics
0 likes · 7 min read
HyperAI Super Neural
Nov 15, 2025 · Artificial Intelligence

AI Paper Weekly: Scale Pretraining, Game Agents, Attention, Context Engineering

This weekly roundup highlights five recent AI research papers—including CoCa’s contrastive captioning model, the Game‑TARS framework for scalable game agents, Kimi Linear’s efficient attention architecture, the Continuous Autoregressive Language Model (CALM), and a comprehensive survey of Context Engineering—summarizing their core contributions and providing direct links.

AI · Context Engineering · attention architecture
0 likes · 6 min read
Ctrip Technology
Nov 6, 2025 · Artificial Intelligence

How TripCast Uses Masked 2D Transformers to Revolutionize Travel Time-Series Forecasting

TripCast introduces a masked 2D transformer pre‑training framework that treats travel demand as a two‑dimensional time‑series problem, leveraging time‑patch tokenization, dual masking and RevIN normalization to achieve state‑of‑the‑art forecasting performance on massive real‑world travel data.

2D transformer · Artificial Intelligence · masked transformer
0 likes · 7 min read
Ele.me Technology
Oct 27, 2025 · Artificial Intelligence

How IAK Transforms Multi‑Domain Recommendation with Pre‑Training and Fine‑Tuning

This paper introduces IAK, a unified multi‑domain recommendation paradigm that treats the system as a large model, leveraging pre‑training and fine‑tuning with an information‑aware adaptive kernel to capture rapid user interest shifts while reducing training costs and improving online performance.

Large Language Models · Recommendation Systems · fine‑tuning
0 likes · 18 min read
Wu Shixiong's Large Model Academy
Oct 22, 2025 · Artificial Intelligence

Mastering LLM Training: A Step‑by‑Step Blueprint from Data to Alignment

This guide walks through the complete end‑to‑end process of training a large language model from scratch, covering data collection, cleaning, tokenization, pre‑training objectives and engineering, post‑training alignment methods, scaling laws, over‑fitting mitigation, and gradient‑stability techniques.

Alignment · LLM · gradient stability
0 likes · 9 min read
Data Party THU
Sep 26, 2025 · Artificial Intelligence

How Keye‑VL‑1.5 Redefines Video Understanding with Slow‑Fast Encoding

Keye‑VL‑1.5, an 8‑billion‑parameter multimodal large language model, introduces a Slow‑Fast video encoding strategy, a four‑stage progressive pre‑training pipeline with 128K context, and a sophisticated post‑training regime that together achieve state‑of‑the‑art performance on video and vision‑language benchmarks while maintaining strong general capabilities.

Multimodal LLM · benchmark · large language model
0 likes · 21 min read
Bighead's Algorithm Notes
Aug 28, 2025 · Artificial Intelligence

Key AI-Driven Quantitative Finance Papers from KDD2025

This article summarizes recent AI research on quantitative finance, covering AlphaAgent's LLM-driven alpha mining, UMI's multi‑level irrationality factors, PDU's progressive dependency learning for stock ranking, SSPT's stock‑specific pretraining transformer, and Enhancer's distribution‑aware meta‑learning framework, all of which demonstrate improved stock prediction and resistance to alpha decay.

Alpha Mining · Financial AI · LLM
0 likes · 9 min read
Bighead's Algorithm Notes
Aug 26, 2025 · Artificial Intelligence

SSPT: Custom Pre‑training Tasks for Stock Data Boost Stock Selection Performance

This article reviews the SSPT paper, which introduces three stock‑specific pre‑training tasks—stock code classification, sector classification, and moving‑average prediction—built on a two‑layer Transformer, and demonstrates through extensive experiments across five market datasets that these tasks consistently improve cumulative return and Sharpe ratio over baselines.

Financial AI · Time Series · Transformer
0 likes · 11 min read
Data Party THU
Aug 20, 2025 · Artificial Intelligence

How Large-Scale Corpus Rewriting is Shaping LLM Training: A Deep Dive into K2, WRAP, and Beyond

This article surveys recent large‑scale corpus rewriting techniques for LLM pre‑training, covering K2’s token‑utilization strategies, domain‑specific methods like SwallowMath/Code, reStructured pretraining, the WRAP pipeline, Nemotron‑CC filtering, Pro‑X noise removal, and the MAGA multi‑style expansion, while highlighting challenges, experimental findings, and open research questions.

LLM · corpus rewriting · data synthesis
0 likes · 20 min read
Amap Tech
Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

The USP framework introduces masked latent modeling within a VAE space to pre‑train ViT encoders, enabling seamless weight transfer to image classification, segmentation, and diffusion‑based generation tasks, dramatically speeding up DiT and SiT models while preserving strong visual representations.

Diffusion Models · Image Generation · VAE
0 likes · 13 min read
Amap Tech
Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Boosts Image Generation and Understanding

The USP framework introduces masked latent modeling within a VAE space to pretrain ViT encoders, enabling seamless weight transfer to both image classification and diffusion‑based generation tasks, dramatically accelerating training while preserving strong performance across multiple benchmarks.

Diffusion Models · Image Generation · pretraining
0 likes · 10 min read
Data Thinking Notes
Jun 2, 2025 · Artificial Intelligence

Why Pre‑Training Powers Modern AI: From Theory to Real‑World Applications

Pre‑training enables AI models to first acquire a universal knowledge map from massive unlabelled text, then quickly adapt to specific tasks with minimal labelled data, offering superior generalization, reduced annotation costs, and versatile applications across chatbots, content creation, retrieval, coding assistance, and more.

AI applications · Large Language Models · Transformer
0 likes · 14 min read
Tencent Technical Engineering
May 12, 2025 · Artificial Intelligence

Comprehensive Summary and Expansion of Andrej Karpathy’s 7‑Hour LLM Lecture

This article provides a detailed Chinese‑to‑English summary of Andrej Karpathy’s 7‑hour LLM tutorial, covering chat process analysis, tokenization, pre‑training data pipelines, model architecture, training strategies, post‑training fine‑tuning, reinforcement learning, chain‑of‑thought reasoning, and current industry applications.

AI · LLM · Model architecture
0 likes · 25 min read
AIWalker
May 6, 2025 · Artificial Intelligence

SimpleAR: High‑Quality 1024×1024 Images with Just 0.5B Parameters via Pretraining, SFT, and RL

SimpleAR demonstrates that a vanilla autoregressive model with only 0.5 B parameters can generate high‑fidelity 1024×1024 images, covering pretraining, supervised fine‑tuning, and reinforcement learning, achieving competitive GenEval (0.59) and DPG‑Bench (79.66) scores while reducing inference time to about 14 seconds with vLLM and KV‑cache optimizations.

Reinforcement Learning · Supervised Fine‑Tuning · autoregressive
0 likes · 14 min read
AIWalker
Apr 28, 2025 · Artificial Intelligence

SimpleAR: Autoregressive Visual Generation at 1024×1024 Using Only 0.5B Parameters

SimpleAR is a minimalist autoregressive visual generation framework that, with only 0.5 B parameters, achieves competitive 1024×1024 image synthesis through a three‑stage pipeline of large‑scale pretraining, supervised fine‑tuning, and GRPO‑based reinforcement learning, and demonstrates significant inference speedups using KV‑cache, vLLM, and speculative decoding.

Inference Acceleration · Reinforcement Learning · autoregressive generation
0 likes · 14 min read
Alimama Tech
Apr 23, 2025 · Artificial Intelligence

Distribution-aware Graph Prompt Tuning (DAGPrompT) for Heterophilic Graphs

Distribution‑aware Graph Prompt Tuning (DAGPrompT) tackles the pre‑training/downstream mismatch on heterophilic graphs by jointly applying low‑rank GLoRA adaptation and hop‑specific prompts that recast tasks as link‑prediction, yielding up to 4.79% accuracy gains and an average 2.43% improvement in few‑shot node classification.

Few‑Shot Learning · Prompt Tuning · distribution-aware
0 likes · 9 min read
Cognitive Technology Team
Mar 22, 2025 · Artificial Intelligence

Three Stages of Developing Large Language Models and Practical Guidance

The article outlines the three development phases of large language models—building, pre‑training, and fine‑tuning—describes usage options, highlights key factors such as data scale, architecture, training processes, and evaluation, and offers practical advice for cost‑effective development.

Fine-tuning · LLM · Model Development
0 likes · 3 min read
JD Tech Talk
Mar 5, 2025 · Artificial Intelligence

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

GLM introduces a unified pretraining framework that combines autoregressive blank‑filling with 2D positional encoding and span‑shuffle, achieving superior performance over BERT, T5 and GPT on a range of NLU and generation tasks such as SuperGLUE, text‑filling, and language modeling.

2D positional encoding · GLM · Language Model
0 likes · 27 min read
JD Cloud Developers
Mar 5, 2025 · Artificial Intelligence

How GLM’s Autoregressive Blank‑Filling Beats BERT, T5, and GPT

GLM introduces a universal language model that combines autoregressive blank‑filling with 2D positional encoding and span‑shuffle training, achieving superior performance over BERT, T5, and GPT across NLU, conditional and unconditional generation tasks, as demonstrated on SuperGLUE and other benchmarks.

Language Model · NLU · Transformer
0 likes · 29 min read
Architect
Feb 11, 2025 · Artificial Intelligence

DeepSeek: Training Process, Working Principles, and Recent Innovations

The article explains DeepSeek's two‑stage training pipeline—including massive pre‑training on trillions of tokens and post‑training via instruction tuning and reinforcement learning from human feedback—describes the differences between its V3 instruction model and R1 reasoning model, and highlights performance optimizations and emerging research directions.

AI · DeepSeek · Instruction Tuning
0 likes · 8 min read
DataFunSummit
Feb 5, 2025 · Artificial Intelligence

Exploration and Practice of Large‑Model Data Construction

This presentation details engineering‑focused approaches to building, mixing, and filtering data for large language models, covering data preparation, pre‑training mix strategies such as DoReMi, DoGE and online sampling, post‑training data quality selection methods, and practical Q&A on scaling laws and PDF processing.

AI · Data Mixing · Model Scaling
0 likes · 15 min read
Baidu Geek Talk
Dec 25, 2024 · Industry Insights

How to Build a Multimodal Web Page Model for the LLM Era

This article examines the unique multimodal and multi‑granular nature of web pages, compares fusion strategies, proposes a cross‑modal attention approach, outlines fine‑ and coarse‑grained pre‑training tasks, and explores low‑cost adapter methods for adapting large multimodal models to web‑page modeling in the LLM era.

AI · HTML · LLM adaptation
0 likes · 10 min read
NewBeeNLP
Dec 23, 2024 · Artificial Intelligence

What’s New in Qwen2.5? A Deep Dive into the Latest LLM Advances

The Qwen2.5 Technical Report introduces a new series of large language models with up to 72 B parameters, expanded pre‑training data to 18 trillion tokens, advanced supervised fine‑tuning and reinforcement learning pipelines, and demonstrates strong performance across comprehension, reasoning, coding, and long‑context tasks.

Fine-tuning · LLM · Qwen2.5
0 likes · 5 min read
ZhongAn Tech Team
Dec 22, 2024 · Industry Insights

What’s Driving the AI Boom? New Models, Data Limits, and the Rise of Forgetting

This issue reviews the latest AI breakthroughs—including OpenAI’s o3 and o1 models, pricing cuts, new features in ChatGPT, product launches like Pika 2.0 and Gemini 2.0, a heated debate on pre‑training data bottlenecks sparked by Ilya Sutskever, a novel black‑box forgetting method, and DeepMind’s Genie 2 3D world generator—highlighting how industry dynamics and research directions are reshaping the field.

3D generation · AI · Model Forgetting
0 likes · 12 min read
DevOps
Dec 8, 2024 · Artificial Intelligence

Understanding Fine-Tuning in Machine Learning: Concepts, Importance, Steps, and Applications

This article explains fine‑tuning in machine learning, covering its definition, why it matters, the role of pre‑trained models, detailed step‑by‑step procedures, advantages, and diverse applications such as NLP, computer vision, speech and finance, with practical examples like face recognition and object detection.

AI applications · Fine-tuning · Model Optimization
0 likes · 16 min read
Baobao Algorithm Notes
Nov 14, 2024 · Artificial Intelligence

How I Built a 1B‑Parameter Chinese LLM on a Single A100: Lessons Learned

This article details the end‑to‑end process of pre‑training, fine‑tuning, and evaluating a 1‑billion‑parameter Chinese LLM named Steel‑LLM on limited hardware, covering data collection, pipeline design, training framework choices, architectural tweaks, performance results, and practical lessons for resource‑constrained developers.

LLM · Model architecture · Training Optimization
0 likes · 18 min read
Bilibili Tech
Nov 5, 2024 · Artificial Intelligence

Bilibili's In-House Role-Playing Large Language Model: Architecture, Training Stages, Evaluation, and Demonstrations

Bilibili’s in‑house role‑playing large language model, built on the Index architecture and refined through pre‑training, supervised fine‑tuning, and preference optimization (PPO and DPO), achieved top scores on the Chinese CharacterEval benchmark, surpassing rivals while incorporating safety alignment and showcasing consistent, personality‑driven dialogue examples.

Content Safety · Preference Optimization · Supervised Fine‑Tuning
0 likes · 13 min read
Infra Learning Club
Oct 30, 2024 · Artificial Intelligence

How GPT-3 Evolved: From Transformer Roots to Massive Language Models

The article traces the development of the GPT series—from the 2017 Transformer breakthrough, through GPT‑1, GPT‑2, and GPT‑3’s 175 billion parameters, to later models like Codex and ChatGPT—highlighting key papers, architectural choices, and the surprising role of OpenAI’s decoder‑only approach.

GPT-3 · Google · Language Model
0 likes · 4 min read
NewBeeNLP
Oct 11, 2024 · Artificial Intelligence

Inside Llama 3: Training, Architecture, and Performance Secrets

An extensive review of Meta’s Llama 3 model breaks down its pre‑training data pipeline, scaling laws, architectural tweaks like GQA and RoPE, post‑training methods such as SFT, DPO, and reward modeling, and evaluates benchmark results, offering practical insights for researchers and engineers building large language models.

Benchmarking · Large Language Models · Llama 3
0 likes · 32 min read
Bilibili Tech
Sep 18, 2024 · Artificial Intelligence

Index-1.9B-32K: A 2% GPT-Size Model with Powerful Long-Context Capabilities

Index-1.9B-32K is a 1.9B-parameter model with a 32K token context window, achieving strong long‑text performance comparable to larger models while using only about 2% of GPT‑4’s compute, trained via long pre‑training and supervised fine‑tuning, with a trade‑off of reduced short‑context ability.

AI · Fine-tuning · evaluation
0 likes · 12 min read
NewBeeNLP
Sep 3, 2024 · Industry Insights

Why Pre‑training Teams Boost New Engineers’ Skills Faster Than SFT Teams

The answer explains that joining a pre‑training team accelerates a newcomer’s engineering abilities through hands‑on work with large‑scale data pipelines, distributed training code, and debugging, while SFT teams focus mainly on data labeling, making pre‑training the more effective path for rapid skill growth.

AI · Engineering Skills · SFT
0 likes · 6 min read
DataFunSummit
Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Data Quality · instruction fine-tuning · pretraining
0 likes · 18 min read
Baobao Algorithm Notes
Aug 29, 2024 · Industry Insights

Why Pretraining Boosts New Engineers More Than SFT: A Practical Guide

The answer argues that fresh graduates should join pre‑training teams because the required engineering tasks—large‑scale data crawling, Hadoop/Spark pipelines, PyTorch and CUDA setup, Megatron code debugging, and scaling‑law experiments—rapidly sharpen coding skills, while SFT work focuses mainly on data labeling and offers slower technical growth.

AI Engineering · SFT · Skill development
0 likes · 7 min read
DataFunTalk
Aug 5, 2024 · Artificial Intelligence

Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches, and Insights

This article presents a comprehensive study on integrating multimodal image‑text representations into large‑scale e‑commerce advertising CTR models, introducing a semantic‑aware contrastive pre‑training (SCL) method and two application algorithms (SimTier and MAKE) that together achieve over 1% GAUC improvement and significant online gains.

CTR prediction · Recommendation Systems · contrastive learning
0 likes · 21 min read
Baobao Algorithm Notes
Jul 25, 2024 · Artificial Intelligence

Why LLaMA 3 405B Matches GPT‑4o: Architecture, Training, and Industry Impact

The article provides an in‑depth analysis of LLaMA 3 405B, covering its dense Transformer architecture, three‑stage pre‑training (initial, long‑context, annealing), iterative post‑training with RM‑guided rejection sampling, the decision against MoE, and the broader implications for both large and small model development.

405B · Model architecture · model distillation
0 likes · 17 min read
Architect's Alchemy Furnace
Jul 6, 2024 · Artificial Intelligence

ChatGLM Evolution: Deep Dive into GLM Architecture, Pretraining, and ChatGLM‑4

This article provides a comprehensive technical overview of the ChatGLM series—from the original ChatGLM‑6B model and its GLM‑based pre‑training framework to the enhancements in ChatGLM‑2, the architectural parity of ChatGLM‑3, and the advanced capabilities of the latest ChatGLM‑4, covering model structure, position encoding, attention mechanisms, multi‑task pretraining, and tool integration.

AI · ChatGLM · GLM
0 likes · 25 min read
Bilibili Tech
Jun 14, 2024 · Artificial Intelligence

Technical Report on the Index-1.9B Series: Model Variants, Pre‑training Optimizations, and Alignment Experiments

The report presents the open‑source Index‑1.9B family—base, pure, chat, and character variants—detailing benchmark results, pre‑training optimizations such as a normalized LM‑Head and deeper‑slim architectures, the importance of modest instruction data, alignment via SFT/DPO, role‑play enhancements with RAG, and acknowledges remaining safety and factual limitations.

Alignment · Instruction Tuning · LLM
0 likes · 15 min read
NewBeeNLP
May 31, 2024 · Artificial Intelligence

Can Cleaned Web Data Rival Proprietary Corpora for LLM Training?

This article analyzes whether large‑scale web crawls, when meticulously filtered and deduplicated, can match or surpass the performance of high‑quality curated datasets in training large language models, covering dataset composition, processing pipelines, experimental results, scaling‑law implications, and future data‑efficiency strategies.

Artificial Intelligence · Dataset Cleaning · LLM
0 likes · 23 min read
DataFunTalk
May 15, 2024 · Artificial Intelligence

Advances in Video Multimodal Retrieval: Video‑Text Semantic Search and Video‑Video Same‑Source Search

This article presents Ant Group's multimodal research on video retrieval, detailing video‑text semantic search and video‑video same‑source search, introducing a large Chinese pre‑training dataset, novel pre‑training, hard‑sample mining, fine‑grained modeling techniques, and an efficient end‑to‑end copyright detection framework.

Multimodal AI · copyright detection · fine-grained modeling
0 likes · 38 min read
DataFunSummit
Apr 24, 2024 · Artificial Intelligence

Multimodal Content Understanding in Baidu Commercial Systems: The ViCAN Model and Its Applications

This article presents Baidu's exploration of multimodal content understanding for commercial advertising, detailing the ViCAN pre‑training model, its contrastive and mask‑language learning tasks, integration across recall, ranking and risk‑control pipelines, quantization with MMDict, and future AIGC‑driven generation, all backed by extensive experiments and Q&A.

AI · AIGC · Advertising
0 likes · 27 min read
DataFunSummit
Mar 27, 2024 · Artificial Intelligence

Generative Multimodal Pretraining (OFA) and Representational Multimodal Pretraining (ONE-PEACE): Research Overview and Findings

This article reviews Tongyi Lab's work on the OFA framework for generative multimodal pretraining and the ONE-PEACE model for unified multimodal representation learning, detailing their architectures, training strategies, experimental results across vision‑language and audio tasks, and future research directions.

Multimodal · OFA · ONE-PEACE
0 likes · 15 min read
NewBeeNLP
Mar 27, 2024 · Artificial Intelligence

Deep Dive into Llama 2: Architecture, Pre‑training, SFT, and Safety Insights

This article provides a comprehensive technical overview of Meta's Llama 2 series, covering its architectural upgrades such as Grouped‑Query Attention (GQA), the pre‑training dataset and hyper‑parameters, loss behavior, benchmark comparisons, and the supervised fine‑tuning pipeline with safety considerations.

AI · Llama-2 · Model architecture
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Mar 18, 2024 · Artificial Intelligence

How MuLTI Achieves Memory‑Efficient Video‑Language Understanding with Text‑Guided MultiWay Sampling

The paper presents MuLTI, a multimodal video‑language model that tackles the memory and efficiency challenges of long video‑text sequences by introducing a Text‑Guided MultiWay Sampler and a Multiple Choice Modeling pre‑training task, achieving state‑of‑the‑art results on video QA and retrieval while drastically reducing GPU memory consumption.

Multimodal · efficient-ai · feature fusion
0 likes · 19 min read
Rare Earth Juejin Tech Community
Jan 21, 2024 · Artificial Intelligence

Understanding Pretraining and Fine‑Tuning of Large Language Models: Methods, Resources, and Practical Applications

This article explains the concepts of pretraining and fine‑tuning for large language models, compares full‑parameter, LoRA and QLoRA approaches, discusses resource consumption, introduces the ModelScope SWIFT framework with code examples, and shows how fine‑tuning can improve data‑visualisation tasks while reducing token usage.

Data visualization · LLM · LoRA
0 likes · 24 min read
Rare Earth Juejin Tech Community
Dec 4, 2023 · Artificial Intelligence

An Overview of BERT: Architecture, Pre‑training Tasks, Comparisons, and Applications

This article provides a comprehensive English overview of BERT, covering its original paper, model architecture, pre‑training objectives (Masked Language Model and Next Sentence Prediction), differences from ELMo, GPT and vanilla Transformers, parameter counts, main contributions, and a range of NLP application scenarios such as text classification, sentiment analysis, NER, and machine translation.

BERT · NLP · Next Sentence Prediction
0 likes · 16 min read
Rare Earth Juejin Tech Community
Nov 26, 2023 · Artificial Intelligence

Overview of T5 (Text-to-Text Transfer Transformer): Architecture, Variants, Experiments, and Applications

This article provides a comprehensive overview of Google's T5 model, detailing its unified text‑to‑text formulation, encoder‑decoder architecture, three model variants, attention mask designs, training strategies, model sizes, experimental results, and key contributions to natural language processing.

Artificial Intelligence · NLP · T5
0 likes · 14 min read
AntTech
Nov 7, 2023 · Artificial Intelligence

Multi‑Scale Stochastic Distribution Prediction for User Behavior Representation Learning

The paper proposes a multi‑scale stochastic distribution prediction (MSDP) framework that learns robust user behavior representations by predicting behavior distributions over random time windows, incorporates contrastive regularization, and demonstrates superior performance on both proprietary financial risk data and a public e‑commerce dataset compared with existing masked and next‑behavior pre‑training methods.

AI · Multi-Scale · distribution prediction
0 likes · 13 min read
DataFunSummit
Oct 8, 2023 · Artificial Intelligence

NLP Techniques for Financial Risk Control: Text Modeling, Non‑Text Modeling, Long‑Text Handling, Multi‑Modal Fusion and Sample Optimization

This article presents a comprehensive overview of how natural language processing is applied to financial risk control, covering text and non‑text sequence modeling, tokenization strategies, transformer‑based long‑text architectures, multi‑modal fusion methods, pre‑training techniques and practical sample‑optimization approaches.

AI · NLP · Text Modeling
0 likes · 22 min read
Rare Earth Juejin Tech Community
Jul 30, 2023 · Artificial Intelligence

ChatGPT Technical Analysis Series – Part 2: GPT1, GPT2, and GPT3 (Encoder vs Decoder, Zero‑Shot, and Scaling)

This article reviews the evolution of the GPT family from GPT‑1 to GPT‑3, comparing encoder and decoder architectures, explaining the shift from supervised fine‑tuning to zero‑shot and few‑shot learning, and highlighting the architectural and training innovations that enabled large‑scale language models.

Fine-tuning · GPT · LLM
0 likes · 13 min read
Sohu Tech Products
Jul 26, 2023 · Artificial Intelligence

Attention Mechanism, Transformer Architecture, and BERT: An In-Depth Overview

This article provides a comprehensive overview of the attention mechanism, its mathematical foundations, the transformer model architecture—including encoder and decoder components—and the BERT pre‑training model, detailing their principles, implementations, and applications in natural language processing.

Attention Mechanism · BERT · Encoder-Decoder
0 likes · 13 min read
DataFunSummit
Jun 28, 2023 · Artificial Intelligence

OPPO's CHAOS Pretrained Large Model and GammaE Knowledge‑Graph Multi‑hop Reasoning: Techniques and Insights

This article presents OPPO Research Institute's recent advances in large‑model AI, detailing the CHAOS pretrained model that topped the CLUE leaderboard, the knowledge‑enhanced training pipeline, and the GammaE model for multi‑hop reasoning over knowledge graphs, together with experimental results and practical training tips.

AI research · GammaE · Large Language Models
0 likes · 20 min read
DataFunTalk
Jun 21, 2023 · Artificial Intelligence

Low‑Resource NLP Pretraining: Methodology, Experiments, and Zero‑Shot Applications

This article presents a low‑resource NLP pretraining approach that combines transformer‑based language modeling with contrastive vector learning, details the unsupervised sample‑pair construction, introduces a camel‑shaped masking distribution, and demonstrates through extensive experiments that the resulting model achieves strong zero‑shot NLU, NLG, and retrieval performance while requiring minimal compute and data.

Language Modeling · Low-Resource · contrastive learning
0 likes · 10 min read
Baidu Geek Talk
Mar 13, 2023 · Artificial Intelligence

Recent Advances in Sparse and Dense Retrieval for Search Engines

The article surveys recent academic advances in both sparse inverted‑index and dense semantic retrieval for large‑scale search, highlighting key papers on decision frameworks, benchmarks, sparse lexical models, dual encoders, and hybrid techniques, while discussing challenges such as single‑vector limits and proposing multi‑view and hybrid solutions.

dense retrieval · information retrieval · pretraining
0 likes · 12 min read
Tencent Advertising Technology
Mar 2, 2023 · Artificial Intelligence

Tencent's HunYuan‑NLP 1T Large‑Scale AI Model: Training Techniques, Optimization, and Real‑World Applications

This article details Tencent's development of the 1‑trillion‑parameter HunYuan‑NLP model, covering its MoE architecture, cost‑effective pre‑training strategies, distributed training framework, model compression toolkit, and successful deployment across advertising, gaming, and other Tencent services.

AI Infrastructure · Mixture of Experts · large language model
0 likes · 17 min read
NetEase Cloud Music Tech Team
Jan 10, 2023 · Artificial Intelligence

Sentiment Classification and Topic Clustering for NetEase Cloud Music Comments

To boost NetEase Cloud Music’s comment handling, the authors combine active‑learning‑driven relabeling, domain‑specific MLM pretraining, contrastive‑learning‑based sample expansion, and multi‑task BERT sharing to raise sentiment‑classification precision and recall above 90% and double the moderation clean rate, while employing prompt‑generated story themes, IP‑focused classifiers, and hot‑word aggregation for effective short‑text topic clustering and scalable, theme‑aware distribution.

NLP · Sentiment Analysis · active learning
0 likes · 10 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Computer Vision · Deep Learning · Model architecture
0 likes · 16 min read
Xiaohongshu Tech REDtech
Nov 11, 2022 · Artificial Intelligence

Language Model as a Service and Black‑Box Optimization: Insights from Prof. Qiu Xipeng’s Talk

Prof. Qiu Xipeng’s talk highlighted how large language models can be offered as a service and efficiently adapted via in‑context learning, lightweight label‑tuning, and gradient‑free black‑box optimization, showcasing a unified asymmetric Transformer (CPT) that handles understanding, generation, ABSA and NER tasks while reducing resource demands.

Black-Box Optimization · LLM · Language Model
0 likes · 15 min read
Alibaba Cloud Big Data AI Platform
Oct 19, 2022 · Artificial Intelligence

How CKBERT Boosts Chinese NLP with Knowledge‑Enhanced Pretraining

CKBERT, a Chinese knowledge‑enhanced BERT developed by Alibaba’s EasyNLP team, integrates external knowledge graphs and internal linguistic cues through novel pre‑training tasks, offers three model sizes compatible with HuggingFace and PAI, and demonstrates superior performance on CLUE and NER benchmarks while providing easy deployment on cloud platforms.

CKBERT · Chinese NLP · EasyNLP
0 likes · 40 min read
DataFunTalk
Sep 24, 2022 · Artificial Intelligence

Cross‑Modal Image‑Text Representation: The Zero Dataset and R2D2 Pre‑training Framework

This article introduces the importance of image‑text cross‑modal representation, presents the Chinese Zero dataset with two pre‑training subsets and five downstream tasks, describes the R2D2 dual‑tower‑plus‑single‑tower pre‑training framework with multiple loss functions, and reports extensive experiments and real‑world deployment insights.

Cross-modal · Multimodal AI · R2D2 framework
0 likes · 19 min read
Alibaba Cloud Big Data AI Platform
Sep 21, 2022 · Artificial Intelligence

Unlocking PEGASUS: How EasyNLP Simplifies Text Summarization with Pre‑Training

This article explains the importance of text generation, introduces the PEGASUS model’s gap‑sentence pre‑training for abstractive summarization, and shows how the EasyNLP framework integrates PEGASUS and other Chinese and English summarization models with step‑by‑step installation, data preparation, and training commands.

EasyNLP · NLP · PEGASUS
0 likes · 22 min read
DataFunSummit
Jul 18, 2022 · Artificial Intelligence

Advances in Natural Language Generation: ProphetNet, Knowledge‑Enhanced Generation, Non‑Autoregressive Pre‑training, Long‑Text Modeling, and Efficient Attention

This talk presents the past year’s research on natural language generation, covering the ProphetNet pre‑trained generation model, external‑knowledge integration for generation, non‑autoregressive pre‑training (BANG), the Poolingformer long‑text architecture, EL‑attention for faster decoding, and a new multi‑task generation benchmark.

efficient attention · knowledge integration · long‑text modeling
0 likes · 22 min read
DataFunTalk
Jun 30, 2022 · Artificial Intelligence

OBERT: A Billion‑Parameter Pretrained Language Model for Large‑Scale NLP Applications

The OPPO XiaoBu team introduced OBERT, a series of 100M‑, 300M‑, and 1B‑parameter pretrained language models that leverage massive TB‑scale corpora, multi‑granular masking, retrieval‑augmented training, and distributed acceleration to achieve state‑of‑the‑art results on CLUE and KgCLUE benchmarks while enabling efficient industrial deployment.

Fine-tuning · Knowledge augmentation · NLP
0 likes · 12 min read
DataFunSummit
Jun 25, 2022 · Artificial Intelligence

Image and Text Pretraining: Methods, Practices, and Business Applications in Information Flow

This article reviews large‑scale image and multimodal pre‑training techniques—including contrastive learning, self‑supervised reconstruction, and multimodal alignment—explains data acquisition, model construction, evaluation metrics, and demonstrates how these methods are applied and optimized for real‑world information‑flow services.

AI · Information Flow · contrastive learning
0 likes · 17 min read
Alimama Tech
Jun 15, 2022 · Artificial Intelligence

Multi-modal Multi-query Search Session Modeling with Heterogeneous Graph Neural Networks

The paper introduces MUVCOG, a heterogeneous graph neural network that models multi‑modal, multi‑query search sessions on Mobile Taobao by jointly learning attention‑based global and hierarchical local views through contrastive pre‑training, yielding universal session embeddings that markedly improve CTR prediction, query recommendation, and intent classification.

Graph Neural Network · contrastive learning · multi-modal
0 likes · 15 min read
DataFunTalk
Jun 8, 2022 · Artificial Intelligence

Integrating Knowledge Graphs with Neural Networks: Generative Pre‑Training, Differentiable Reasoning, and Fuzzy Logic Query Embedding

This article reviews recent advances in combining knowledge graphs with neural networks, covering generative pre‑training of graph neural networks, wiki‑graph based open‑domain question answering, differentiable logical reasoning, and a fuzzy‑logic query‑embedding model that improves performance on sparse‑relation queries.

Artificial Intelligence · Open Domain QA · fuzzy logic
0 likes · 23 min read
DaTaobao Tech
May 27, 2022 · Artificial Intelligence

Multimodal Pretraining for Search Recall in E-commerce

The paper proposes a multimodal pre‑training framework that jointly encodes query text and item titles with images via shared and single‑stream towers, using MLM, MPM, QIC, and matching tasks, and demonstrates substantial Recall@K gains on a billion‑item e‑commerce catalog by leveraging visual cues to bridge the semantic gap.

Multimodal · Vector Retrieval · e‑commerce
0 likes · 17 min read
DataFunSummit
Feb 22, 2022 · Artificial Intelligence

Graph Pretraining Techniques for Molecular Representation and Their Applications in Drug Discovery

This article reviews the motivation, methods, and results of graph-based self‑supervised pretraining for molecular data, introduces the ChemRL‑GEM model that incorporates 3‑D structural information, and demonstrates its superior performance on ADMET, affinity prediction, and benchmark competitions using the PaddleHelix platform.

AI · Chemistry · Molecular Representation
0 likes · 18 min read
Baobao Algorithm Notes
Jan 28, 2022 · Artificial Intelligence

How Pre‑Training Evolved: From word2vec to MAE Across NLP and CV

This article traces the history of deep‑learning pre‑training techniques, comparing the parallel developments in natural‑language processing and computer vision—from early word2vec and bag‑of‑words models through ELMo and BERT to recent transformer‑based vision models like iGPT, ViT, BEiT and MAE—highlighting key innovations, challenges, and the convergence of the two fields.

Deep Learning · MAE · NLP
0 likes · 20 min read
Baobao Algorithm Notes
Dec 23, 2021 · Artificial Intelligence

How Pre‑Training Evolved: From word2vec to MAE Across NLP & Vision

This article traces the evolution of deep‑learning pre‑training techniques, starting with word2vec in NLP, moving through ELMo and BERT, then shifting to computer‑vision models such as iGPT, ViT, BEiT, and MAE, highlighting key innovations, challenges, and the convergence of NLP and CV paradigms.

BERT · MAE · NLP
0 likes · 21 min read
DataFunSummit
Dec 11, 2021 · Artificial Intelligence

Survey of User Representation Learning and Transfer Learning in Recommendation Systems

This article reviews recent advances in user representation learning for recommender systems, covering self‑supervised pre‑training, lifelong learning, multi‑task modeling, and large‑scale contrastive methods, and provides code and dataset links for key papers such as PeterRec, Conure, DUPN, ShopperBERT, PTUM, UPRec, and LURM.

Recommendation Systems · pretraining · self-supervised learning
0 likes · 11 min read
Meituan Technology Team
Dec 2, 2021 · Artificial Intelligence

Pretraining Techniques for Search Advertising Relevance at Meituan

Meituan improves search‑ad relevance by applying pre‑trained BERT models enhanced with data‑augmented samples, multi‑task learning, keyword extraction and two‑stage knowledge distillation, producing a lightweight distilled model that, when fused with traditional relevance signals, boosts CTR, lowers Badcase@5 and raises NDCG while preserving revenue.

BERT · Search · advertising relevance
0 likes · 30 min read
JD Retail Technology
Nov 16, 2021 · Artificial Intelligence

Automatic Product Copywriting for E-Commerce: The APCG System and Its AI Innovations

The APCG system, awarded the AAAI 2022 Innovation Application Prize, automatically generates e‑commerce product copy using a Transformer‑Pointer network and a pretrained sequence‑to‑sequence model, incorporates quality control, employs novel pretraining tasks, and has produced millions of descriptions that boost CTR, CVR, and GMV.

AI · Transformer · copywriting
0 likes · 6 min read
DataFunSummit
Nov 14, 2021 · Artificial Intelligence

Overview of Pre‑training Models and the UER‑py Framework for Natural Language Processing

This article introduces the importance of pre‑training in natural language processing, reviews classic pre‑training models such as Skip‑thoughts, BERT, GPT‑2 and T5, presents the modular UER‑py framework and its Chinese resources, compares it with Huggingface Transformers, and outlines practical deployment steps in industry.

NLP · UER-py · language models
0 likes · 22 min read
DataFunTalk
Oct 22, 2021 · Artificial Intelligence

Applying AI Techniques to Credit Reporting and Risk Modeling: Model Structure, Pre‑training, Ranking and Interpretability

This article presents a comprehensive overview of how AI technologies are applied to credit reporting and loan risk modeling, detailing data characteristics, end‑to‑end model architectures, pre‑training strategies, risk‑ranking methods, and interpretability techniques for financial risk assessment.

AI · Interpretability · Model Optimization
0 likes · 17 min read
Alimama Tech
May 27, 2021 · Artificial Intelligence

Explicit Semantic Cross Feature Learning via Pre-trained Graph Neural Networks for CTR Prediction (PCF‑GNN)

PCF‑GNN builds a heterogeneous graph of feature nodes and learns edge statistics via pre‑training, enabling it to infer unseen cross‑features, reduce storage by over 50%, and consistently improve CTR prediction AUC compared to implicit and explicit baselines, with proven online gains.

Graph Neural Network · Recommendation Systems · cross feature
0 likes · 12 min read
DataFunTalk
May 8, 2021 · Artificial Intelligence

Attribute‑Level Sentiment Analysis for E‑commerce: Tasks, Challenges, and System Design

This article presents a comprehensive overview of sentiment analysis in user‑generated content, detailing document‑, sentence‑, and aspect‑level tasks, defining the Aspect Sentiment Triplet Extraction problem for e‑commerce reviews, describing a three‑stage pipeline with pre‑training, multi‑domain modeling and attribute normalization, and reporting significant business improvements such as a 400% CTR lift, while also discussing data imbalance, annotation scarcity, and future research directions.

Sentiment Analysis · aspect based sentiment · e‑commerce
0 likes · 15 min read
Cyber Elephant Tech Team
Apr 28, 2021 · Artificial Intelligence

Understanding BERT: From Encoder-Decoder to Transformer and Attention

This article explains the BERT model by first reviewing the Encoder-Decoder framework, then detailing the attention mechanism—including self-attention and multi-head attention—before describing the full Transformer architecture and finally outlining BERT’s encoder-only design, training stages, and fine-tuning applications.

BERT · Encoder-Decoder · NLP
0 likes · 15 min read
DataFunTalk
Apr 7, 2021 · Artificial Intelligence

Alibaba's Advances in Multilingual Neural Machine Translation: Research and Practice

This article presents Alibaba's comprehensive research on multilingual neural machine translation, covering motivations, model architectures, intermediate language modules, data‑augmentation strategies such as repair translation, integration of pre‑trained models with adapters, and engineering optimizations that enable a production‑ready system supporting over 200 languages.

Adapter · Alibaba · Neural Machine Translation
0 likes · 21 min read
DataFunTalk
Apr 5, 2021 · Artificial Intelligence

Summary of Methods and Findings from the NLP Chinese Pre‑training Model Generalization Challenge

The article reviews the Chinese NLP pre‑training model generalization competition, detailing data preprocessing, augmentation, external data usage, model scaling and architecture tweaks, loss functions, learning‑rate and adversarial training strategies, regularization techniques, post‑processing optimizations, and ineffective methods, highlighting their impact on performance metrics.

Loss Functions · Model Optimization · NLP
0 likes · 15 min read
DataFunTalk
Feb 20, 2021 · Artificial Intelligence

Industrial-Scale Machine Translation at Bytedance: Applications, Demos, and Research Advances

This article presents Bytedance's industrial machine‑translation platform, describing its global deployment, diverse product demos, underlying sequence‑to‑sequence models, BERT‑enhanced training strategies, prune‑tune sparsity techniques, multilingual pre‑training, document translation, and a high‑performance inference engine.

BERT · machine translation · multilingual NLP
0 likes · 19 min read
Sohu Tech Products
Feb 17, 2021 · Artificial Intelligence

Improving BERT Pre‑training with RealFormer: Principles, Implementation, and Empirical Evaluation

This article analyzes the RealFormer modification to the Transformer architecture, details its implementation in BERT, and presents extensive experiments showing that while RealFormer can boost performance on low‑label‑count classification tasks, its benefits diminish or disappear as the number of classes grows.

BERT · RealFormer · Residual
0 likes · 12 min read
DataFunTalk
Dec 25, 2020 · Artificial Intelligence

Exploring Pretraining Model Optimization and Deployment Challenges in NLP

This article reviews the evolution of pretraining models in NLP, discusses the practical challenges of deploying large models such as inference latency, knowledge integration, and task adaptation, and presents Xiaomi’s optimization techniques including knowledge distillation, low‑precision inference, operator fusion, and multi‑granularity segmentation for dialogue systems.

BERT · Dialogue Systems · Inference Optimization
0 likes · 15 min read