Tagged articles

pretraining

122 articles · Page 2 of 2

Apr 28, 2021 · Artificial Intelligence

Understanding BERT: From Encoder-Decoder to Transformer and Attention

This article explains the BERT model by first reviewing the Encoder-Decoder framework, then detailing the attention mechanism—including self-attention and multi-head attention—before describing the full Transformer architecture and finally outlining BERT’s encoder-only design, training stages, and fine-tuning applications.

BERT · Encoder-Decoder · NLP

0 likes · 15 min read

DataFunTalk

Apr 7, 2021 · Artificial Intelligence

Alibaba's Advances in Multilingual Neural Machine Translation: Research and Practice

This article presents Alibaba's comprehensive research on multilingual neural machine translation, covering motivations, model architectures, intermediate language modules, data‑augmentation strategies such as repair translation, integration of pre‑trained models with adapters, and engineering optimizations that enable a production‑ready system supporting over 200 languages.

Adapter · Alibaba · Neural Machine Translation

0 likes · 21 min read

DataFunTalk

Apr 5, 2021 · Artificial Intelligence

Summary of Methods and Findings from the NLP Chinese Pre‑training Model Generalization Challenge

The article reviews the Chinese NLP pre‑training model generalization competition, detailing data preprocessing, augmentation, external data usage, model scaling and architecture tweaks, loss functions, learning‑rate and adversarial training strategies, regularization techniques, post‑processing optimizations, and ineffective methods, highlighting their impact on performance metrics.

Loss Functions · Model Optimization · NLP

0 likes · 15 min read

DataFunTalk

Feb 20, 2021 · Artificial Intelligence

Industrial-Scale Machine Translation at Bytedance: Applications, Demos, and Research Advances

This article presents Bytedance's industrial machine‑translation platform, describing its global deployment, diverse product demos, underlying sequence‑to‑sequence models, BERT‑enhanced training strategies, prune‑tune sparsity techniques, multilingual pre‑training, document translation, and a high‑performance inference engine.

BERT · Machine Translation · multilingual NLP

0 likes · 19 min read

Sohu Tech Products

Feb 17, 2021 · Artificial Intelligence

Improving BERT Pre‑training with RealFormer: Principles, Implementation, and Empirical Evaluation

This article analyzes the RealFormer modification to the Transformer architecture, details its implementation in BERT, and presents extensive experiments showing that RealFormer can boost performance on classification tasks with few label classes, but that its benefits diminish or disappear as the number of classes grows.

BERT · RealFormer · Residual

0 likes · 12 min read

DataFunTalk

Dec 25, 2020 · Artificial Intelligence

Exploring Pretraining Model Optimization and Deployment Challenges in NLP

This article reviews the evolution of pretraining models in NLP, discusses the practical challenges of deploying large models, such as inference latency, knowledge integration, and task adaptation, and presents Xiaomi’s optimization techniques for dialogue systems, including knowledge distillation, low‑precision inference, operator fusion, and multi‑granularity segmentation.

BERT · Dialogue Systems · Inference Optimization

0 likes · 15 min read

Sohu Tech Products

Nov 4, 2020 · Artificial Intelligence

Understanding BERT: Architecture, Pre‑training, Fine‑tuning and Applications in Modern NLP

This article provides a comprehensive overview of BERT and related NLP advances, covering its historical context, model architecture, input‑output mechanisms, comparisons with CNNs, word‑embedding evolution, pre‑training strategies like MLM and next‑sentence prediction, and practical guidance for fine‑tuning and feature extraction.

BERT · NLP · Transformer

0 likes · 17 min read

JD Cloud Developers

Nov 4, 2020 · Artificial Intelligence

Multimodal AI Breakthroughs Unveiled at NLPCC 2020 Workshop

The article recaps the inaugural Multimodal Natural Language Processing workshop at NLPCC 2020, highlighting breakthroughs in multimodal summarization, pre‑training models, AI‑driven art, visual‑language interaction, and multimodal dialogue systems, and showcases research from leading institutions and industry partners.

.ai · Multimodal · NLP

0 likes · 9 min read

DataFunTalk

Sep 23, 2020 · Artificial Intelligence

From Word Embedding to BERT: A Comprehensive Overview of Pre‑training Model Development in NLP

This article surveys the evolution of pre‑training models for natural language processing, detailing model architectures such as Encoder‑AE, Decoder‑AR, Encoder‑Decoder, Prefix LM, and PLM, analyzing why models like RoBERTa, T5, and GPT‑3 excel, and offering practical guidance for building strong pre‑training systems.

BERT · Language Models · NLP

0 likes · 47 min read

58 Tech

Aug 14, 2020 · Artificial Intelligence

Using SPTM in qa_match for the 58 City AI Competition: Data Preparation, Model Training, and Prediction

This article provides a step‑by‑step guide on preparing data, pre‑training the SPTM lightweight model, fine‑tuning a text‑classification model with qa_match, and generating competition‑ready predictions for the 58 City AI Algorithm Contest, including all required shell commands and parameter explanations.

.ai · SPTM · Text Classification

0 likes · 9 min read

NetEase Media Technology Team

Jul 24, 2020 · Artificial Intelligence

Survey of Video Action Recognition Algorithms: 3D and 2D Convolutional Networks and Pre‑training

This survey reviews video action recognition, comparing 3D convolutional networks, which jointly model spatial‑temporal cues but are computationally heavy, with 2D‑based approaches such as TSM and TIN, which embed temporal shifts efficiently. It also emphasizes how large‑scale pre‑training markedly improves performance when labeled data is limited.

2D convolutional networks · 3D convolutional networks · computer vision

0 likes · 13 min read

Alibaba Cloud Developer

Jun 2, 2020 · Artificial Intelligence

How FashionBERT Boosts E‑Commerce Image‑Text Matching with Patch Embeddings

This article introduces FashionBERT, a multimodal BERT‑based model that replaces ROI‑based image tokens with uniform image patches to overcome e‑commerce‑specific challenges. It details the model's architecture, adaptive loss balancing, and deployment in Alibaba search, and reports significant performance gains on public and internal datasets.

BERT · Deep Learning · Image-Text Matching

0 likes · 13 min read

DataFunTalk

Dec 27, 2019 · Artificial Intelligence

NLP Challenges and Tagging Solutions in Sina Weibo Feed

This article reviews the specific NLP difficulties encountered in Sina Weibo's feed—such as short text, informal language, and ambiguous user behavior—and details the multi‑stage tagging system, material library, multimodal modeling, multi‑task learning, and large‑scale pre‑training techniques used to address them.

BERT · NLP · Weibo

0 likes · 15 min read

Yanxuan Tech Team

Dec 9, 2019 · Artificial Intelligence

How NetEase Yanxuan Leverages BERT, GPT, and ELMo for Real-World NLP Tasks

This article reviews the evolution of language models from bag‑of‑words to BERT, compares ELMo, GPT, and BERT architectures, and details how NetEase Yanxuan applies pre‑trained models to classification, text matching, sequence labeling, and generative tasks in production.

BERT · ELMo · GPT

0 likes · 19 min read

Meituan Technology Team

Nov 14, 2019 · Artificial Intelligence

MT-BERT: Pre‑training and Fine‑tuning Practices at Meituan‑Dianping

MT‑BERT at Meituan‑Dianping combines mixed‑precision, domain‑adapted continual pre‑training, knowledge‑graph‑aware masking, and extensive compression techniques to produce fast, accurate BERT models that power fine‑grained sentiment analysis, intent classification, recommendation reasoning, and other NLP tasks across the platform.

BERT · MT-BERT · NLP

0 likes · 33 min read

JD Tech Talk

Nov 5, 2019 · Artificial Intelligence

GeoBERT: A Multi‑Task Pre‑trained Language Model for Chinese Address Text

This article introduces GeoBERT, a novel pre‑training method for Chinese address strings that leverages seven jointly constrained tasks to capture spatial semantics, administrative hierarchy, and similarity relationships, enabling downstream address classification, segmentation, POI extraction, similarity comparison, and authenticity verification with reduced annotation dependence.

Chinese Language · GeoBERT · Geocoding

0 likes · 15 min read

Alibaba Cloud Developer

Aug 15, 2019 · Artificial Intelligence

How Auto Risk Transforms Behavior Sequence Data with Unsupervised Pre‑Training

This article introduces Auto Risk, a deep‑learning risk model for behavior‑sequence data that leverages unsupervised pre‑training with proxy tasks, details its convolution‑attention encoder, demonstrates significant gains across multiple business scenarios, and highlights its strong small‑sample and analogy capabilities.

Deep Learning · behavior sequence · pretraining

0 likes · 20 min read

Alibaba Cloud Developer

Jul 30, 2019 · Artificial Intelligence

Auto Risk: Pretraining Deep Models on Unlabeled Behavior Sequences

This article introduces Auto Risk, a behavior‑sequence deep‑learning framework that uses unsupervised pre‑training with proxy tasks to learn universal feature representations from massive unlabeled data. It achieves significant gains in risk‑control scenarios, improves AUC, and supports multi‑scene generalization and small‑sample learning.

Deep Learning · behavior sequence · pretraining

0 likes · 20 min read

DataFunTalk

Jun 23, 2019 · Artificial Intelligence

Understanding XLNet: Differences from BERT, Innovations, and Experimental Analysis

This article examines XLNet, contrasting it with BERT by detailing its permutation language modeling objective, two‑stream self‑attention, and larger pre‑training corpus, and analyzes experimental results that show XLNet’s superior performance on reading comprehension, GLUE, and other NLP tasks, especially for long documents.

BERT · Language Models · NLP

0 likes · 27 min read

Hulu Beijing

Apr 4, 2019 · Artificial Intelligence

How BERT, GPT, and ELMo Revolutionize Language Feature Representation

Natural language processing, a cornerstone of AI, relies on language models to capture linguistic features; this article reviews classic pre‑training models—ELMo, GPT, and BERT—explaining their architectures, training objectives, and how they boost downstream NLP tasks despite data‑scarcity challenges.

BERT · Deep Learning · ELMo

0 likes · 10 min read

Meituan Technology Team

Jan 25, 2019 · Artificial Intelligence

Fine-grained User Review Sentiment Classification: AI Challenger 2018 Champion's Approach

Cheng Huige’s winning AI Challenger 2018 solution treated fine‑grained Chinese review sentiment as a 20‑aspect multi‑class task, combining a high‑capacity LSTM encoder with self‑attention, word‑and‑character embeddings, simplified ELMo pre‑training, diverse tokenizations and a weighted seven‑model ensemble (including BERT), which together delivered the competition’s top F1 performance.

BERT · Deep Learning · ELMo

0 likes · 14 min read

Tencent TDS Service

Jan 24, 2019 · Artificial Intelligence

Unlocking BERT: How Its Transformer Engine Powers State-of-the-Art Text Classification

This article explains BERT’s architecture, from its bidirectional Transformer encoder and attention mechanisms to its pre‑training tasks, and presents extensive experiments showing its superior performance on multiple Chinese and English text‑classification datasets.

BERT · NLP · Text Classification

0 likes · 22 min read
