Tagged articles
383 articles
Alimama Tech
Feb 1, 2023 · Artificial Intelligence

Video Object of Interest Segmentation (VOIS): Task, Dataset, and Dual-Path Transformer Approach

The paper presents Video Object of Interest Segmentation (VOIS), a new e‑commerce task that locates and segments video instances matching a given product image, introduces the LiveVideos dataset of 2,418 Taobao live‑stream clips, and proposes a dual‑path Swin‑Transformer with cross‑fusion modules that outperforms existing VOS/VIS baselines.

Dataset, Transformer, instance segmentation
11 min read
DataFunSummit
Jan 15, 2023 · Artificial Intelligence

Intelligent Writing: AIGC Technologies, Models, Evaluation Metrics, and Real‑World Applications

This article surveys the evolution of AI‑generated content for intelligent writing, covering its definition, key technologies from RNN Seq2Seq to Transformer‑based models such as UniLM, T5, BART and GPT series, evaluation datasets and metrics, product deployments by Datagrand, and the remaining challenges and future directions.

AI writing, AIGC, GPT
25 min read
DataFunSummit
Jan 14, 2023 · Artificial Intelligence

Key Transformer Model Papers Across Language, Vision, Speech, and Time‑Series Domains

This article surveys the most influential Transformer‑based research papers—from the original Attention Is All You Need work to recent models such as Autoformer and FEDformer—covering breakthroughs in natural language processing, computer vision, speech recognition, and long‑term series forecasting, and provides download links for each.

AI, Time-Series Forecasting, Transformer
17 min read
21CTO
Jan 13, 2023 · Artificial Intelligence

How Google’s Muse Is Redefining Text‑to‑Image Generation with Parallel Decoding

Google’s new Muse model, a Transformer‑based text‑to‑image system running on TPUv4, claims to generate 256×256 images in 0.5 seconds—far faster than Imagen—while delivering unprecedented photorealism and deep language understanding through parallel decoding and large‑scale LLM‑conditioned training.

AI research, Google Muse, LLM conditioning
4 min read
AntTech
Dec 19, 2022 · Artificial Intelligence

TransVCL: Attention‑Enhanced Video Copy Localization Network with Flexible Supervision

TransVCL introduces an end‑to‑end attention‑enhanced video copy localization network that leverages a custom Transformer, correlation‑Softmax similarity matrix, and temporal alignment module, combined with a semi‑supervised learning framework, achieving state‑of‑the‑art performance on VCSL and VCDB benchmarks.

AI, Semi-supervised Learning, Transformer
13 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Efficient Spatiotemporal Self‑Attention Transformer (Patch Shift Transformer) for Video Action Recognition

This article introduces a lightweight spatiotemporal self‑attention transformer, called Patch Shift Transformer, which achieves competitive video action recognition performance on datasets such as Kinetics‑400, Sth‑v1/v2, and Diving48 without increasing computational cost or parameters, and details its design, experiments, and speed advantages.

ECCV 2022, Transformer, patch shift
5 min read
HelloTech
Oct 19, 2022 · Artificial Intelligence

Intelligent Creative System: Types, Quality Evaluation, Generation Models, and Optimization

The Intelligent Creative System defines advertising creatives across formats, evaluates image and text quality using reference‑based metrics and models like DeepBIQ, generates multimodal ads via GANs and Transformers, and selects optimal variants through bandit‑based CTR prediction and multimodal fusion, enabling scalable, data‑driven creative production.

AI, Bandit Model, GAN
10 min read
Rare Earth Juejin Tech Community
Oct 10, 2022 · Artificial Intelligence

A Beginner’s Journey into Vision Transformers (ViT) for Computer Vision Engineers

This article introduces the fundamentals of Vision Transformers (ViT) for computer‑vision developers, starting with an overview of the transformer architecture, detailed explanation of self‑attention and multi‑head attention, and step‑by‑step PyTorch code examples that illustrate query, key, value computation and attention scoring.

PyTorch, Self-Attention, Transformer
12 min read
HaoDF Tech Team
Oct 8, 2022 · Artificial Intelligence

Exploring Transformer Technology and Its Applications in NLP, Computer Vision, and OCR at Haodf.com

This article introduces the Transformer architecture, explains its attention mechanism, details its adaptations for natural language processing, computer vision, and OCR tasks, and presents experimental results of various models such as BERT, ELECTRA, Swin Transformer, and CRNN-BCN on large-scale medical data from Haodf.com.

Model Evaluation, NLP, OCR
39 min read
Kuaishou Audio & Video Technology
Sep 29, 2022 · Artificial Intelligence

How DeViT Revolutionizes Video Inpainting with Deformed Vision Transformers

The article introduces DeViT, a novel Deformed Vision Transformer framework for video inpainting that leverages a deformable patch homography estimator, mask‑pruned attention, and spatio‑temporal weight adaptation, achieving state‑of‑the‑art results on benchmark datasets and highlighting its potential for advanced video editing tools.

DeViT, Multimedia, Transformer
10 min read
ELab Team
Sep 23, 2022 · Artificial Intelligence

Fine‑Tune a Chinese BERT Model for Cloze Tasks in 30 Minutes

This tutorial walks you through NLP fundamentals, the evolution of BERT, the concept of pre‑trained models, and a step‑by‑step guide to fine‑tune a Chinese BERT on a cloze‑style task, complete with code snippets and verification results.

BERT, Chinese, Cloze Task
13 min read
Alimama Tech
Sep 21, 2022 · Artificial Intelligence

EXTR: Click-Through Rate Prediction with Externalities in E-Commerce Sponsored Search

The paper introduces EXTR, a Transformer‑based CTR prediction model that jointly encodes diverse externalities from surrounding organic results and ads and infers missing ad placements via a Potential Allocation Generator, achieving superior AUC, COPC and LogLoss on Taobao data and deployment in Alibaba’s advertising system.

Advertising, Externalities, Transformer
11 min read
Architects' Tech Alliance
Aug 31, 2022 · Artificial Intelligence

Performance Evaluation of Transformer Models on the Inspur NF5488A5 GPU Server

This article presents a detailed benchmark of four Transformer models of varying sizes trained on the high‑end Inspur NF5488A5 GPU server, compares its NVSwitch‑based interconnect with a PCIe‑based system, and analyzes the impact of model scale, tensor parallelism, and hardware bandwidth on training efficiency.

DeepSpeed, GPU server, Megatron-LM
12 min read
Alibaba Cloud Big Data AI Platform
Jul 11, 2022 · Artificial Intelligence

How Structure-Aware Sparse Attention Boosts Long-Code Transformers

The SASA model, a structure‑aware sparse‑attention Transformer developed by Alibaba Cloud PAI and Prof. Gao Ming’s team, improves long‑code sequence processing by sparsifying self‑attention using top‑k frequency and AST pattern matrices, achieving higher performance and lower memory/computation costs on CodeXGLUE benchmarks.

AST, Code Understanding, Long Sequences
8 min read
DataFunTalk
Jul 9, 2022 · Artificial Intelligence

Graph Neural Networks Enter the Transformer Era – Seminar by Dr. Zheng Shuxin

The LOGS seminar on July 9, 2022 featured Dr. Zheng Shuxin from Microsoft Research presenting an overview of Transformer models, their success in NLP and CV, recent breakthroughs in applying Transformers to graph data, and future directions for graph processing.

AI Seminar, Microsoft research, Transformer
4 min read
Xiaohongshu Tech REDtech
Jun 20, 2022 · Artificial Intelligence

Action Sequence Verification in Videos with CosAlignment Transformer (CAT)

The paper introduces Action Sequence Verification (ASV), a task that determines whether two videos follow the same ordered actions, provides the Chemical Sequence Verification dataset and re‑annotated COIN‑SV and Diving48‑SV sets, and proposes the CosAlignment Transformer (CAT) with intra‑step feature extraction, a Transformer‑based inter‑step encoder, and a sequence‑alignment loss that outperforms prior baselines and serves as a pre‑training model for video retrieval and classification.

Action Verification, Computer Vision, Dataset
7 min read
DaTaobao Tech
May 24, 2022 · Artificial Intelligence

GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection

GEN‑VLKT introduces a Guided‑Embedding Network with position‑ and instance‑guided embeddings to remove costly post‑processing and leverages CLIP‑based visual‑linguistic knowledge transfer for interaction understanding, achieving state‑of‑the‑art HOI detection performance and zero‑shot capability, now deployed in Alibaba’s Taobao services.

CLIP, HOI detection, Transformer
7 min read
DataFunTalk
May 7, 2022 · Artificial Intelligence

Intelligent Recommendation Selling Point Generation: Architecture, Core AI Techniques, Model Development, and Product Impact

This article explains how JD's intelligent recommendation selling point system leverages NLP, BERT, Transformer and pointer‑generator models to automatically create short, personalized product highlights, describing the technical background, system architecture, model training pipeline, online/offline monitoring, and the resulting business benefits.

BERT, NLP, Recommendation Systems
13 min read
AntTech
Apr 27, 2022 · Artificial Intelligence

Pyraformer: Low-Complexity Pyramidal Attention for Long-Range Time Series Modeling and Forecasting

The paper introduces Pyraformer, a low‑complexity pyramidal‑attention Transformer that captures multi‑scale temporal dependencies with linear time‑space complexity, achieving superior single‑step and long‑range forecasting performance on real‑world datasets while supporting green‑computing capacity management.

Pyraformer, Transformer, long-range dependencies
14 min read
DataFunSummit
Apr 25, 2022 · Artificial Intelligence

Token‑Level Pipeline Parallelism for Transformer‑based Language Models (TeraPipe)

The article introduces a token‑level pipeline parallelism strategy that splits the sequence‑length dimension of Transformer‑based language models, explains why this approach is feasible, presents a dynamic‑programming formulation for optimal slicing, discusses engineering challenges, and evaluates its performance on large GPT models.

Performance Optimization, Pipeline Parallelism, Token-level
13 min read
Kuaishou Large Model
Apr 6, 2022 · Artificial Intelligence

How Transformers Revolutionize Image Style Transfer: Introducing StyTr²

This article reviews the limitations of traditional CNN‑based image stylization, explains how Transformer architectures overcome these issues with global context and self‑attention, and presents the novel StyTr² method with content‑aware positional encoding that achieves superior, detail‑preserving style transfer results.

Computer Vision, Deep Learning, Transformer
8 min read
IEG Growth Platform Technology Team
Feb 14, 2022 · Artificial Intelligence

Multimodal Evolution and Application in Tencent Game Advertising System

This article describes the end‑to‑end multimodal modeling pipeline—covering text, image, and video understanding, model evolution from shallow to deep networks, key‑frame extraction, fine‑tuning, and multimodal fusion—used in Tencent's game ad exchange platform, along with practical deployment challenges and solutions.

Advertising, CNN, Multimodal Learning
22 min read
Baobao Algorithm Notes
Jan 14, 2022 · Artificial Intelligence

BERT Interview Q&A: Decoding CLS, Masks, Complexity, and More

An in‑depth Q&A breaks down core BERT concepts—from the purpose of the [CLS] token and masking strategies to self‑attention complexity, sparse attention tricks, subword handling of OOV words, warm‑up learning rates, GPT’s unidirectional nature, and ALBERT’s parameter sharing—providing concise explanations for each.

BERT, Masking, Self-Attention
7 min read
Baobao Algorithm Notes
Jan 7, 2022 · Interview Experience

Essential Transformer Interview Cheat Sheet: 11 Must‑Know Q&A

This concise guide presents eleven frequently asked Transformer interview questions with clear English explanations covering self‑attention formulas, scaling, alternative designs, LayerNorm vs. BatchNorm, positional embeddings, multi‑head mechanisms, and BPE tokenization, helping candidates deliver solid, theory‑backed answers.

BERT, Deep Learning, LayerNorm
6 min read
Kuaishou Tech
Jan 5, 2022 · Artificial Intelligence

How a New Bilingual Video Text Dataset and Transformer Spotter Advance Video OCR

This article reviews the NeurIPS 2021 paper introducing BOVText, a large‑scale bilingual video‑text dataset with over 2,000 videos and 1.75 million frames, and describes its transformer‑based end‑to‑end video text spotter that integrates EAST encoding into DETR, covering dataset collection, annotation, architecture, and experimental results.

BOVText, DETR, Transformer
12 min read
Code DAO
Dec 30, 2021 · Artificial Intelligence

Exemplar Transformers Enable 8× Faster CPU‑Compatible Visual Tracking

Researchers at ETH Zurich introduce Exemplar Transformers, a novel Transformer layer that accelerates visual object tracking by eight times, runs in real‑time on CPUs, and improves robustness when integrated into a Siamese‑based tracker, achieving state‑of‑the‑art performance on six benchmark datasets.

Benchmark, CPU, Siamese tracker
5 min read
Code DAO
Dec 17, 2021 · Artificial Intelligence

Applying UNETR Transformer for 3D Medical Image Segmentation

This article walks through using the UNETR transformer architecture to segment 3D brain MRI scans from the BRATS dataset, detailing environment setup, data preprocessing with MONAI, model construction, training with DiceCE loss, validation metrics, and visualizing the best‑performing model outputs.

3D segmentation, BRATS, MONAI
16 min read
Baobao Algorithm Notes
Dec 15, 2021 · Artificial Intelligence

Why Can BERT’s Token, Segment, and Position Embeddings Be Added? A Deep Dive into Positional Encoding

This article revisits the long‑standing question of why BERT’s token, segment, and position embeddings are summed, critiques earlier explanations, and presents findings from the ICLR‑2021 paper “Rethinking Positional Encoding in Language Pre‑training” that show removing the token‑position cross term speeds convergence and improves downstream GLUE scores.

BERT, Embedding, Language Pretraining
6 min read
Code DAO
Dec 14, 2021 · Artificial Intelligence

Building a Chess AI from Scratch: Combining AlphaZero and Transformers (Part 2)

This article walks through constructing a learnable chess AI by integrating AlphaZero‑style Monte Carlo Tree Search with a decoder‑only Transformer, detailing the game tree logic, model architecture, input and output encodings, self‑play training loop, and code implementation in PyTorch.

AlphaZero, MonteCarloTreeSearch, PyTorch
23 min read
DataFunSummit
Nov 21, 2021 · Artificial Intelligence

Sequential Recommendation Algorithms: Overview and Techniques

This article surveys sequential recommendation methods, covering standard models such as pooling, RNN, CNN, attention, and Transformer, as well as long‑short term, multi‑interest, multi‑behavior approaches, and recent advances like contrastive learning, highlighting their impact on recommendation performance.

RNN, Transformer, attention
8 min read
JD Retail Technology
Nov 16, 2021 · Artificial Intelligence

Intelligent Online Selling Point Extraction for E‑Commerce Recommendation (IOSPE) Wins AAAI 2022 Innovation Award

The IOSPE system, which uses BERT‑based scoring, transformer‑pointer generation, and personalized distribution to automatically extract and generate selling points for millions of e‑commerce products, earned the AAAI 2022 Artificial Intelligence Innovation Application Award and has boosted click‑through rates and user dwell time across JD.com platforms.

AI, BERT, Innovation Award
6 min read
JD Retail Technology
Nov 16, 2021 · Artificial Intelligence

Automatic Product Copywriting for E-Commerce: The APCG System and Its AI Innovations

The APCG system, awarded the AAAI 2022 Innovation Application Prize, automatically generates e‑commerce product copy using a Transformer‑Pointer network and a pretrained sequence‑to‑sequence model, incorporates quality control, employs novel pretraining tasks, and has produced millions of descriptions that boost CTR, CVR, and GMV.

AI, Transformer, copywriting
6 min read
58 Tech
Oct 12, 2021 · Artificial Intelligence

Seq2Seq Approaches for Phone Number Extraction from Two‑Speaker Voice Dialogues

This article presents a practical study of extracting phone numbers from two‑speaker voice dialogues using Seq2Seq models—including LSTM, GRU with attention and feature fusion, and Transformer—detailing data characteristics, model architectures, training strategies, experimental results, and comparative analysis showing the GRU‑Attention approach achieving the best performance.

GRU, LSTM, NLP
13 min read
Python Programming Learning Circle
Aug 30, 2021 · Artificial Intelligence

DeepDebug: Transformer‑Based Automatic Debugging Using Large Pretrained Models

The paper presents DeepDebug, a transformer‑based system that leverages large pretrained models and extensive synthetic and real‑world data to automatically localize and fix bugs in Python code, achieving significant improvements in patch generation success rates and reduction of false positives on benchmarks such as QuixBugs.

Software Engineering, Transformer, automatic debugging
12 min read
TiPaiPai Technical Team
Jun 18, 2021 · Artificial Intelligence

Mastering Text Recognition: Encoder & Decoder Strategies Explained

This article reviews modern text‑recognition systems, detailing how encoders such as CNN, CNN‑BiLSTM, and Transformer‑based models extract visual features, and how decoders like Position Attention, Transformer decoders, and RNN Seq2Seq align variable‑length text, while also discussing CTC loss and practical design choices.

CNN, CTC, Decoder
9 min read
Meituan Technology Team
Jun 3, 2021 · Artificial Intelligence

VisTR: End-to-End Video Instance Segmentation with Transformers

VisTR redefines video instance segmentation as an end‑to‑end sequence‑to‑sequence task, using a CNN backbone, Transformer encoder‑decoder with instance queries, and Hungarian matching to jointly predict masks, classes, and tracks across frames, achieving state‑of‑the‑art accuracy (40.1 AP) and 57.7 FPS on YouTube‑VIS.

Transformer, Video Instance Segmentation, VisTR
21 min read
DataFunTalk
May 15, 2021 · Artificial Intelligence

Multi‑Interest Recall Techniques in iQIYI Short‑Video Recommendation

The article reviews the evolution of iQIYI's short‑video recommendation recall pipeline, detailing multi‑interest recall methods such as clustering‑based recall, MOE‑based recall, single‑activation multi‑interest networks, regularization strategies, dynamic capacity handling, and multimodal extensions, and discusses their impact on recommendation performance.

Transformer, iQIYI, machine learning
15 min read
Cyber Elephant Tech Team
Apr 28, 2021 · Artificial Intelligence

Understanding BERT: From Encoder-Decoder to Transformer and Attention

This article explains the BERT model by first reviewing the Encoder-Decoder framework, then detailing the attention mechanism—including self-attention and multi-head attention—before describing the full Transformer architecture and finally outlining BERT’s encoder-only design, training stages, and fine-tuning applications.

BERT, Encoder-Decoder, NLP
15 min read
DataFunTalk
Apr 17, 2021 · Artificial Intelligence

Personalized Re-ranking for Recommendation (RecSys'19)

This article introduces a personalized re‑ranking model for recommendation systems, explaining the limitations of traditional point‑wise ranking, describing the PRM architecture with input, encoding, and output layers using multi‑head attention and pre‑trained personalization features, and presenting experimental results and future extensions.

CTR, Transformer, attention
7 min read
DataFunTalk
Apr 16, 2021 · Artificial Intelligence

Live Streaming Recommendation Ranking Model Evolution and Multi‑Objective Learning at Alibaba 1688

This article presents a comprehensive overview of Alibaba's 1688 live‑streaming recommendation system, detailing core challenges such as heterogeneous behavior modeling, multi‑objective optimization, and bias mitigation, and describing four successive model iterations—from feature‑engineered GBDT to attention‑based heterogeneous networks and transformer architectures—along with experimental results and practical insights.

Recommendation Systems, Transformer, bias mitigation
22 min read
DataFunTalk
Apr 10, 2021 · Artificial Intelligence

2020 Computer Vision Breakthroughs: Self‑Supervised Learning, Transformer Attention Modeling, and Neural Radiance Fields

The talk reviews three major 2020 advances in computer vision—self‑supervised learning surpassing supervised pre‑training, the successful adoption of Transformer‑based attention models for detection and classification, and the emergence of Neural Radiance Fields for view synthesis—while highlighting related research from Microsoft Research Asia and the broader community.

2020, AI breakthroughs, Computer Vision
19 min read
DataFunTalk
Apr 3, 2021 · Artificial Intelligence

A Survey of User Behavior Sequence Modeling for Search and Recommendation Advertising

User behavior sequence modeling, crucial for search and recommendation advertising ranking, has evolved from simple pooling to attention, RNN, capsule, and Transformer architectures, with industrial applications across e‑commerce, social, video, and music platforms, and future directions include time‑aware, multi‑dimensional, and self‑supervised approaches.

Deep Learning, Recommendation Systems, Sequence Modeling
24 min read
Sohu Tech Products
Feb 17, 2021 · Artificial Intelligence

Improving BERT Pre‑training with RealFormer: Principles, Implementation, and Empirical Evaluation

This article analyzes the RealFormer modification to the Transformer architecture, details its implementation in BERT, and presents extensive experiments showing that while RealFormer can boost performance on low‑label‑count classification tasks, its benefits diminish or disappear as the number of classes grows.

BERT, RealFormer, Residual
12 min read
Liangxu Linux
Feb 3, 2021 · Artificial Intelligence

Build a DIY AI Bot for Honor of Kings with Transformers, scrcpy & minitouch

Learn how to create a low‑cost AI bot for the mobile game Honor of Kings by capturing the phone screen with scrcpy, generating action commands from game images using a Transformer model, and executing those commands via minitouch, complete with setup steps, required tools, and code links.

Game Automation, Mobile Development, Transformer
6 min read
58 Tech
Dec 30, 2020 · Artificial Intelligence

qa_match V1.3: Lightweight Deep Learning QA Matching Tool with Semi‑Automatic Knowledge‑Base Mining and Transformer‑Enhanced Pre‑training

The qa_match open‑source tool from 58 Tongcheng, now at version 1.3, introduces semi‑automatic knowledge‑base mining for cold‑start and online scenarios and upgrades its Simple Pre‑trained Model (SPTM) with Transformer‑based feature representation to improve question‑answer matching performance.

DEC clustering, Transformer, knowledge base mining
10 min read
DataFunSummit
Dec 14, 2020 · Artificial Intelligence

LightSeq: High‑Performance Open‑Source Inference Engine for Transformers, GPT and Other NLP Models

This article introduces LightSeq, an open‑source, GPU‑accelerated inference engine that dramatically speeds up Transformer‑based models such as BERT and GPT by up to 14× over TensorFlow, supports multiple decoding strategies, integrates seamlessly with major deep‑learning frameworks, and provides detailed performance benchmarks and technical optimizations.

Deep Learning, GPU, Inference
15 min read
Sohu Tech Products
Nov 25, 2020 · Artificial Intelligence

Illustrated Guide to GPT-2: Detailed Explanation of the Decoder‑Only Transformer Model

This article provides a comprehensive, illustrated walkthrough of OpenAI's GPT‑2 language model, covering its decoder‑only Transformer architecture, self‑attention mechanisms, token processing, training data, differences from BERT, and applications beyond language modeling, enriched with visual diagrams and code snippets for deeper understanding.

AI, GPT-2, Language Model
24 min read
Sohu Tech Products
Nov 11, 2020 · Artificial Intelligence

Illustrated Transformer: Comprehensive Explanation and Code Implementation

This article provides a step‑by‑step illustrated guide to the Transformer architecture, covering its macro structure, detailed self‑attention mechanisms, multi‑head attention, positional encoding, residual connections, decoder operation, training process, loss functions, and includes complete PyTorch and custom Python code examples.

NLP, PyTorch, Self-Attention
33 min read
Sohu Tech Products
Nov 4, 2020 · Artificial Intelligence

Understanding BERT: Architecture, Pre‑training, Fine‑tuning and Applications in Modern NLP

This article provides a comprehensive overview of BERT and related NLP advances, covering its historical context, model architecture, input‑output mechanisms, comparisons with CNNs, word‑embedding evolution, pre‑training strategies like MLM and next‑sentence prediction, and practical guidance for fine‑tuning and feature extraction.

BERT, Fine-tuning, NLP
17 min read
Didi Tech
Oct 27, 2020 · Artificial Intelligence

Didi's Machine Translation System: Architecture, Techniques, and WMT2020 Competition Experience

Didi's machine translation system combines a Transformer‑big architecture with relative position representations, enlarged feed‑forward networks, iterative back‑translation, knowledge‑distillation and domain fine‑tuning, optimized via TensorRT for speed, achieving a BLEU 36.6 and third place in the WMT2020 Chinese‑to‑English news task.

BLEU, Neural Networks, TensorRT
15 min read
Meituan Technology Team
Sep 24, 2020 · Artificial Intelligence

Multimodal Recall Solution for KDD Cup 2020: ImageBERT and LXMERT Based Approach

The second‑place team tackled KDD Cup 2020’s Multimodal Recall challenge by fine‑tuning ImageBERT and LXMERT on query‑image pairs, generating negatives, applying AMSoftmax and multi‑similarity losses, ensembling weighted predictions, and using score‑based post‑processing, boosting NDCG@5 to 0.8352 and powering Meituan’s multimodal search pipeline.

ImageBERT · KDD Cup 2020 · LXMERT
0 likes · 23 min read
DataFunTalk
Sep 23, 2020 · Artificial Intelligence

From Word Embedding to BERT: A Comprehensive Overview of Pre‑training Model Development in NLP

This article surveys the evolution of pre‑training models for natural language processing, detailing model architectures such as Encoder‑AE, Decoder‑AR, Encoder‑Decoder, Prefix LM, and PLM, analyzing why models like RoBERTa, T5, and GPT‑3 excel, and offering practical guidance for building strong pre‑training systems.

BERT · NLP · Transformer
0 likes · 47 min read
Didi Tech
Aug 23, 2020 · Artificial Intelligence

DiDi AI Labs Achieves Third Place in WMT2020 News Translation Task

DiDi AI Labs’ NLP team earned third place in the WMT2020 Chinese‑to‑English news translation task with a 36.6 BLEU score, using an enhanced Transformer‑2 model that incorporates self‑attention, relative positional attention, iterative back‑translation, knowledge distillation, data cleaning, ensembling, and other techniques, now deployed across DiDi’s international services.

BLEU · DiDi AI Labs · NLP
0 likes · 5 min read
Didi Tech
May 25, 2020 · Artificial Intelligence

How Didi Harnesses Cutting‑Edge Speech Recognition: From ASR Basics to Transformer Models

This article provides a comprehensive technical overview of modern speech recognition, covering Didi’s driver‑assistant and smart‑customer‑service applications, fundamental ASR concepts, classic GMM‑HMM methods, deep‑learning breakthroughs such as DNN‑HMM, CTC, attention‑based and transformer models, practical training tricks, signal‑processing steps, and multimodal fusion techniques.

ASR · CTC · Deep Learning
0 likes · 16 min read
Meituan Technology Team
Apr 16, 2020 · Artificial Intelligence

Transformer Applications in Meituan Search Ranking: Practice and Experience

Meituan’s search ranking system integrates Transformer‑based models across feature engineering, behavior sequence modeling, and re‑ranking, adapting AutoInt‑style embeddings and multi‑stage attention mechanisms to boost QV_CTR and NDCG, while outlining future enhancements with BERT, graph neural networks, and reinforcement learning.

Meituan · Transformer · behavior modeling
0 likes · 16 min read
Qunar Tech Salon
Mar 5, 2020 · Artificial Intelligence

Content Tagging Technology for Short Videos at iQIYI: Challenges and Model Evolution

This article describes iQIYI's short‑video content tagging system, outlining the challenges of extracting type and abstract tags from multimodal data, detailing the evolution from text‑only models to image‑fusion, BERT‑enhanced, and video‑frame models, and discussing their applications and future directions.

BERT · Multimodal Learning · Transformer
0 likes · 11 min read
58 Tech
Mar 2, 2020 · Artificial Intelligence

Low-Quality Text Detection Using Unsupervised Language Model Perplexity

This article proposes a method to identify low-quality text in business data by training a large-scale unsupervised language model to compute sentence perplexity, converting the detection problem into a threshold decision, and details model design, challenges, optimizations, and online performance results.

BERT · Language Model · NLP
0 likes · 13 min read
iQIYI Technical Product Team
Feb 14, 2020 · Artificial Intelligence

Content Tagging Technology for Short Videos: Challenges and Multi‑Modal Model Evolution at iQIYI

iQIYI’s short‑video tagging system tackles multimodal fusion, open‑set and abstract tags by evolving from a text‑only model through cover‑image, BERT‑vector, and video‑frame fusion architectures, enabling automated labeling, personalized recommendation, and semantic search while planning to add OCR, audio, and knowledge‑graph enhancements.

BERT · Multimodal Learning · Transformer
0 likes · 13 min read
Qunar Tech Salon
Sep 12, 2019 · Artificial Intelligence

A Comprehensive Overview of Attention Mechanisms in Deep Learning

This article systematically reviews the history, core concepts, variants, and practical implementations of attention mechanisms—from early additive and multiplicative forms to self‑attention, multi‑head attention, and recent transformer‑based models—highlighting why attention has become fundamental in modern AI research.

Deep Learning · NLP · Self-Attention
0 likes · 16 min read
Alibaba Cloud Developer
Aug 27, 2019 · Artificial Intelligence

How Transformers Enable Personalized Outfit Generation for Fashion Recommendation

This article presents a Transformer‑based framework that simultaneously generates visually compatible outfits and personalizes recommendations by leveraging multimodal item embeddings and user behavior, achieving significant gains in compatibility prediction, fill‑in‑the‑blank accuracy, and click‑through rate on Alibaba's iFashion platform.

Deep Learning · Multimodal Learning · Transformer
0 likes · 15 min read
Alibaba Cloud Developer
Aug 7, 2019 · Artificial Intelligence

How KOBE Transforms Personalized Recommendation Reason Generation with Transformers

This article introduces KOBE, a knowledge‑based personalized text generation system that leverages Transformer architecture, attribute fusion, and external knowledge graphs to produce fluent, domain‑aware recommendation reasons for e‑commerce products, with a case study on the Spring Festival cloud theme.

Knowledge Graph · Text Generation · Transformer
0 likes · 13 min read
Alibaba Cloud Developer
Jul 9, 2019 · Artificial Intelligence

Demystifying Attention: A Clear Guide to Its History, Types, and Why It Works

This article systematically reviews the evolution of attention mechanisms—from early additive and multiplicative forms to self‑attention and multi‑head variants—explaining their core three‑step framework, key differences, and why they have become essential across NLP, vision, and broader AI applications.

Deep Learning · NLP · Self-Attention
0 likes · 19 min read
Alibaba Cloud Developer
Jun 5, 2019 · Artificial Intelligence

Tracing the Evolution of Language Models: From N‑grams to GPT‑2

This article reviews the historical development of natural language processing language models, covering expert rule‑based systems, statistical n‑grams, smoothing techniques, neural network models such as NNLM, RNN, word2vec, GloVe, ELMo, and the transformer‑based breakthroughs of GPT, BERT and GPT‑2, and summarizes their impact on modern NLP tasks.

BERT · Deep Learning · GPT
0 likes · 25 min read
Ctrip Technology
May 21, 2019 · Artificial Intelligence

A Brief Overview of Machine Translation: History, Neural Models, and Practical Insights

This article surveys the evolution of machine translation from early rule‑based systems to modern neural architectures, explains how translation engines are trained, highlights recent advances such as attention and Transformers, and shares practical experience and current challenges in the field.

Attention Mechanism · Neural Networks · Transformer
0 likes · 11 min read
Sohu Tech Products
Apr 11, 2019 · Artificial Intelligence

Media Domain Named Entity Recognition: Techniques, Evolution, and Sohu’s Practical Implementation

This article reviews the challenges of media‑domain named entity recognition, outlines the evolution from rule‑based methods through traditional machine‑learning and deep‑learning models to attention‑based Transformers, and details Sohu’s practical Bi‑LSTM‑CRF system with data‑annotation strategies and performance results.

Bi-LSTM · CRF · Deep Learning
0 likes · 12 min read
DataFunTalk
Mar 13, 2019 · Artificial Intelligence

A Comprehensive Overview of NLP Development and Deep Learning Models

This article reviews the history of natural language processing, explains key deep‑learning models such as NNLM, Word2vec, CNN, RNN, attention mechanisms, and Transformers, and discusses their applications, future trends, and practical considerations in NLP tasks.

NLP · Transformer · attention
0 likes · 38 min read
DataFunTalk
Feb 27, 2019 · Artificial Intelligence

Human‑Interactive Machine Translation: Research, Techniques, and Productization

This article reviews the current state of machine translation, explores the challenges of ambiguity, quality, and domain specificity, and presents human‑in‑the‑loop translation techniques—including attention‑enhanced models, transformer architectures, and online learning—while discussing practical productization and deployment considerations.

AI productization · Human-in-the-Loop · Online Learning
0 likes · 16 min read
Sohu Tech Products
Jan 9, 2019 · Artificial Intelligence

Understanding the Transformer Model: Attention, Self‑Attention, and Multi‑Head Mechanisms

This article provides a comprehensive, step‑by‑step explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, and training processes, illustrated with diagrams and code snippets to aid readers new to neural machine translation.

Deep Learning · Neural Machine Translation · Positional Encoding
0 likes · 16 min read
Sohu Tech Products
Oct 10, 2018 · Artificial Intelligence

Optimizing News Recall with DDPG Reinforcement Learning and Transformer Architecture

This article explains how reinforcement learning, specifically the DDPG algorithm combined with Transformer-based networks, is applied to improve large‑scale news recall systems, detailing the business scenario, algorithm selection, model architecture, speed optimizations, training challenges, and observed online performance gains.

AI · DDPG · Transformer
0 likes · 13 min read
21CTO
Sep 15, 2018 · Backend Development

Laravel Architecture Deep Dive: Repositories, Services, Presenters, Transformers

The article summarizes a video on Laravel project structuring, explaining how separating responsibilities into layers such as Repository for data access, Service for business logic, Presenter for view preparation, Transformer for data shaping, and Formatter for consistent API responses improves maintainability and scalability.

Backend Architecture · Laravel · Presenter
0 likes · 6 min read
Alibaba Cloud Developer
May 11, 2018 · Artificial Intelligence

How Suffix Prediction Boosts English‑Russian Neural Machine Translation Accuracy

Researchers introduce a novel suffix‑prediction mechanism for neural machine translation that separately generates stems and suffixes during decoding, dramatically reducing out‑of‑vocabulary errors and morphological mistakes in English‑Russian translation, achieving consistent improvements across RNN and Transformer models on large‑scale news and e‑commerce datasets.

English-Russian · Morphologically Rich Languages · Neural Machine Translation
0 likes · 10 min read