Tagged articles
374 articles
DataFunSummit
Jun 14, 2023 · Artificial Intelligence

DataFun Summit 2023: Large Language Models and AIGC Conference

DataFun will host the DataFun Summit 2023 on June 17‑18, featuring three chairs and eight presenters who will discuss core topics such as large language model research, multimodal generation, reinforcement learning, tool learning, distributed training, and industry applications, with free registration via QR code.

AI Conference · AIGC · large language models
0 likes · 42 min read
Rare Earth Juejin Tech Community
Jun 12, 2023 · Artificial Intelligence

Comprehensive Guide to Using OpenAI APIs: Models, Prompts, Embeddings, Fine‑Tuning, LangChain, and Multimodal Applications

This article provides a detailed, step‑by‑step tutorial on OpenAI’s language models, API endpoints, prompt engineering, embeddings, moderation, fine‑tuning, LangChain workflows, memory management, and multimodal capabilities such as audio transcription and image generation, complete with code examples and practical usage tips.

API · Embedding · Fine-tuning
0 likes · 45 min read
Rare Earth Juejin Tech Community
Jun 11, 2023 · Artificial Intelligence

Comprehensive Technical Overview of GPT Series, Transformers, and Emerging Capabilities in Large Language Models

This article provides a detailed technical review of the evolution of GPT models, the Transformer architecture, large language model training methods, emergent abilities such as in‑context learning and chain‑of‑thought, multimodal extensions, and the challenges of data, scaling, and alignment, offering a holistic view for researchers and practitioners.

AI · GPT · InstructGPT
0 likes · 28 min read
NetEase LeiHuo Testing Center
Jun 2, 2023 · Artificial Intelligence

AI Techniques for a Global Search Platform: Word Segmentation, Text Similarity, Image Retrieval, and Multimodal Models

This article shares the development of a global search platform that leverages AI technologies such as Chinese word segmentation, part‑of‑speech tagging, text similarity via Simhash and the Synonyms library, image similarity using histogram comparison, Hamming distance and ResNet‑50, and multimodal CLIP‑based models to improve search efficiency and accuracy.

AI · NLP · image retrieval
0 likes · 12 min read
Programmer DD
May 5, 2023 · Artificial Intelligence

How Microsoft’s Bing Chat Upgrade Turns Search into an AI Copilot

Microsoft has fully opened Bing Chat to all users, introducing multimodal responses, a multilingual Image Creator, persistent chat history, and upcoming plugin support, while sharing usage statistics and outlining weekly update plans that position Bing as an AI‑driven search copilot competing with ChatGPT.

AI · BingChat · Microsoft
0 likes · 8 min read
DataFunSummit
Apr 20, 2023 · Artificial Intelligence

Mengzi Lightweight Model Technology System and Advances in Small‑Scale and Retrieval‑Augmented Pretraining

This presentation introduces the Mengzi lightweight model technology stack, covering large‑scale pre‑training, motivations for lightweight models, detailed techniques such as knowledge and sequence‑relation enhancement, training optimization, model compression, retrieval‑augmented pre‑training, multimodal extensions, open‑source releases, and real‑world applications.

knowledge distillation · large language models · multimodal
0 likes · 23 min read
21CTO
Apr 2, 2023 · Artificial Intelligence

Can GPT‑4 Be Considered Early AGI? Insights from Microsoft’s 155‑Page Study

This article reviews Microsoft’s extensive 155‑page work on early experiments with GPT‑4, exploring how the model approaches artificial general intelligence, its testing methodology, multimodal capabilities, programming and mathematical performance, interaction with tools and humans, limitations, societal impact, and future research directions.

AI Safety · Artificial General Intelligence · GPT-4
0 likes · 15 min read
DataFunTalk
Apr 1, 2023 · Artificial Intelligence

Nvidia Meets OpenAI: Highlights from the GTC Fireside Chat on GPT‑4, Deep Learning History, and the Future of AI

In a GTC fireside chat, Nvidia CEO Jensen Huang and OpenAI co‑founder Ilya Sutskever discuss GPT‑4's multimodal advances, the evolution of deep learning from early neural networks to large‑scale models, the pivotal role of GPUs and datasets like ImageNet, and their vision for more reliable, scalable artificial intelligence.

Deep Learning · GPT-4 · Neural Networks
0 likes · 10 min read
Programmer DD
Mar 22, 2023 · Artificial Intelligence

How Baidu’s Ernie Bot Stacks Up Against GPT‑4: A Deep Dive

The article reviews Baidu’s newly launched Ernie Bot, a multimodal large language model, comparing its literary, business, mathematical, Chinese comprehension, and multimodal abilities with GPT‑4, while detailing the underlying technologies, knowledge‑enhancement techniques, and deployment strategy behind the model.

AI comparison · Baidu · Ernie Bot
0 likes · 10 min read
Python Programming Learning Circle
Mar 18, 2023 · Artificial Intelligence

Baidu’s ERNIE Bot (Wenxin Yiyan) Launch: Features, Use Cases, and Technical Architecture

Baidu unveiled its new generative AI chatbot ERNIE Bot, showcasing five practical scenarios, multimodal generation, a detailed technical stack based on the ERNIE and PLATO models, and a comparison with ChatGPT and Bing Chat, while also announcing its invitation‑only testing program and API access for enterprises.

Baidu · Chatbot · Ernie Bot
0 likes · 12 min read
Architecture Digest
Mar 17, 2023 · Artificial Intelligence

Baidu’s Ernie Bot (Wenxin Yiyan) vs GPT‑4: Capabilities, Technical Foundations, and Market Reaction

The article reviews Baidu's launch of the multimodal large language model Wenxin Yiyan, compares its literary, business, mathematical, Chinese‑understanding and multimodal abilities with GPT‑4, explains the underlying six‑core technologies and hardware stack, and reports the mixed market and netizen response.

AI · Baidu · Ernie Bot
0 likes · 11 min read
DataFunSummit
Mar 15, 2023 · Artificial Intelligence

Key Features and Capabilities of OpenAI's GPT‑4

OpenAI's GPT‑4, a large multimodal language model, expands token limits, adds image understanding, demonstrates strong reasoning on professional exams, supports many languages, and is already integrated into Microsoft Bing, while offering various access options and improved safety compared to its predecessor.

AI · GPT-4 · Microsoft Bing
0 likes · 9 min read
Alimama Tech
Feb 1, 2023 · Artificial Intelligence

CapOnImage: Context-driven Dense Captioning on Images

The paper presents CapOnImage, a new dense‑captioning‑on‑image task that generates location‑specific decorative text for product images. It introduces the 2.1‑million‑image CapOnImage2M dataset and proposes a mixed‑modality transformer with position‑aware pre‑training and progressive training, which achieves superior accuracy and diversity and is already deployed on Alibaba’s advertising platforms with measurable business impact.

Context-Aware · Dataset · Deep Learning
0 likes · 9 min read
NetEase Cloud Music Tech Team
Jan 4, 2023 · Artificial Intelligence

Relevance Modeling and Ranking for Cloud Music Video Search

The paper details Cloud Music’s video‑search pipeline—query understanding, recall, relevance, ranking and re‑ranking—highlighting challenges such as ambiguous content, timeliness and multi‑objective goals, and describes two deployed models (a twin‑tower aspect relevance network and a click‑graph propagator) that together boost click‑through rate by 1.5 % and effective CTR by 2.3 %.

click graph · multimodal · ranking
0 likes · 24 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Computer Vision · Deep Learning · Model architecture
0 likes · 16 min read
DataFunSummit
Oct 20, 2022 · Artificial Intelligence

End-to-End Speech Relation Extraction

This paper presents an end‑to‑end approach for extracting relational triples directly from speech signals, bypassing intermediate transcription, and demonstrates its effectiveness on synthesized speech versions of the CoNLL04 and TACRED datasets, highlighting challenges such as length constraints and cross‑modal alignment.

End-to-End · multimodal · natural language processing
0 likes · 17 min read
HelloTech
Oct 19, 2022 · Artificial Intelligence

Intelligent Creative System: Types, Quality Evaluation, Generation Models, and Optimization

The Intelligent Creative System defines advertising creatives across formats, evaluates image and text quality using reference‑based metrics and models like DeepBIQ, generates multimodal ads via GANs and Transformers, and selects optimal variants through bandit‑based CTR prediction and multimodal fusion, enabling scalable, data‑driven creative production.

AI · Bandit Model · GAN
0 likes · 10 min read
DataFunTalk
Sep 13, 2022 · Artificial Intelligence

Intelligent Question Answering in QQ Browser Search: Background, Key Technologies, and Frontier Research

This article presents an in‑depth overview of intelligent question answering in QQ Browser search, covering its background, the core KBQA and DeepQA technologies, system architecture, challenges, recent advances such as end‑to‑end, knowledge‑guided and multimodal QA, and practical Q&A for deployment.

AI · Deep Learning · Knowledge Graph
0 likes · 22 min read
NetEase Cloud Music Tech Team
Aug 17, 2022 · Artificial Intelligence

Live Streaming Recommendation Practices in NetEase Cloud Music: Real-time, Multi-target, and Multimodal Approaches

The paper describes NetEase Cloud Music’s LOOK live‑streaming recommendation system for the song‑playback page, which combines millisecond‑level real‑time feature pipelines, multi‑target optimization (click, watch, gift, comment) via ESMM+FM and MMoE models, GradNorm‑based loss fusion, and a multimodal avatar‑text‑host ranking model, achieving double‑digit CTR and CTCVR gains while balancing producer and consumer retention.

ESMM · GradNorm · MMoE
0 likes · 26 min read
Alibaba Cloud Big Data AI Platform
Jul 29, 2022 · Artificial Intelligence

Unlock Chinese Text-to-Image Generation with EasyNLP’s Open‑Source Models

This article introduces EasyNLP’s newly integrated Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, code snippets, performance benchmarks on multiple datasets, and step‑by‑step tutorials for fine‑tuning and inference using open‑source checkpoints.

AI Generation · Chinese NLP · EasyNLP
0 likes · 20 min read
DataFunSummit
Jul 27, 2022 · Artificial Intelligence

DataFun 2022 Natural Language Processing Summit – Leading Experts Discuss Large‑Scale Language Models, Multimodal Understanding, Dialogue Systems and AI Applications

The DataFun 2022 NLP Summit, taking place on July 30, brings together top researchers and industry leaders from Alibaba, Baidu, Microsoft, Amazon, and more to present the latest advances in large‑scale pre‑training, multimodal perception, information extraction, dialogue interaction, machine translation, and practical AI deployments, with live streaming and free registration via QR code.

AI · Dialogue Systems · Information Extraction
0 likes · 44 min read
Alimama Tech
Jul 13, 2022 · Artificial Intelligence

Fully Automatic Template‑Free Image‑Text Creative Generation System

Alibaba Alimama’s fully automatic, template‑free image‑text creative generation system uses deep‑learning models across material mining, layout synthesis, on‑image copy generation, and visual attribute rendering to produce personalized ad creatives directly from product images and metadata, achieving roughly 19 % CTR lift over prior template‑based methods.

AI · Automation · Computer Vision
0 likes · 19 min read
DataFunTalk
Jul 9, 2022 · Artificial Intelligence

Education Knowledge Graph: Opportunities and Challenges

The article provides a comprehensive overview of education knowledge graphs, explaining their definition, significance, diverse application scenarios such as smart textbooks, deep reading, subject insight, and intelligent services, while also analyzing technical challenges like data heterogeneity, granularity, multimodality, quality control, and proposing future research directions.

Intelligent Tutoring · Knowledge Graph · artificial intelligence
0 likes · 25 min read
Xiaohongshu Tech REDtech
Jun 20, 2022 · Artificial Intelligence

Action Sequence Verification in Videos with CosAlignment Transformer (CAT)

The paper introduces Action Sequence Verification (ASV), a task that determines whether two videos follow the same ordered actions, provides the Chemical Sequence Verification dataset and re‑annotated COIN‑SV and Diving48‑SV sets, and proposes the CosAlignment Transformer (CAT) with intra‑step feature extraction, a Transformer‑based inter‑step encoder, and a sequence‑alignment loss that outperforms prior baselines and serves as a pre‑training model for video retrieval and classification.

Action Verification · Computer Vision · Dataset
0 likes · 7 min read
JD Retail Technology
Jun 16, 2022 · Artificial Intelligence

2022 Global AI Technology Innovation Competition – Algorithm Challenge: Connecting AI with E‑commerce

The 2022 Global AI Technology Innovation Competition – Algorithm Challenge, co‑hosted by JD Retail and academic partners, brought together 12 finalist teams from over 3,000 entrants to tackle e‑commerce‑focused AI problems such as multimodal image‑text matching and product‑title entity recognition, highlighting real‑world business impact and fostering talent exchange.

AI competition · JD Retail · NLP
0 likes · 8 min read
AntTech
Jun 15, 2022 · Artificial Intelligence

XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding

XYLayoutLM introduces a layout‑aware multimodal network that improves visually‑rich document understanding by augmenting XY‑Cut for robust reading order generation and employing a Dilated Conditional Position Encoding to handle variable‑length inputs, achieving state‑of‑the‑art performance on XFUN and FUNSD datasets.

Vision Transformer · XYCut · document understanding
0 likes · 10 min read
DaTaobao Tech
May 27, 2022 · Artificial Intelligence

Multimodal Pretraining for Search Recall in E-commerce

The paper proposes a multimodal pre‑training framework that jointly encodes query text and item titles with images via shared and single‑stream towers, using MLM, MPM, QIC, and matching tasks, and demonstrates substantial Recall@K gains on a billion‑item e‑commerce catalog by leveraging visual cues to bridge the semantic gap.

Vector Retrieval · e‑commerce · multimodal
0 likes · 17 min read
DataFunTalk
May 20, 2022 · Artificial Intelligence

Hierarchical Graph Convolutional Networks for Video Social Relationship Modeling

This article presents a multimodal approach that combines dynamic analysis and graph machine learning to generate and apply social relationship graphs in videos, detailing problem background, graph generation modules, applications such as video retrieval, experimental results, and future research directions.

AI · Graph Neural Network · Weak Supervision
0 likes · 11 min read
Laiye Technology Team
May 18, 2022 · Artificial Intelligence

Overview of Document Intelligence Models: StrucText, LayoutLMv3, and GraphDoc

This article reviews three representative document intelligence models—StrucText, LayoutLMv3, and GraphDoc—detailing their input features, feature fusion strategies, self‑supervised tasks, and underlying architectures, and explains how they learn embeddings for segments, words, or regions to enable classification and key‑value extraction.

Document AI · Layout Analysis · graph neural networks
0 likes · 15 min read
Bilibili Tech
May 10, 2022 · Artificial Intelligence

Glance Supervised Video Moment Retrieval via the ViGA Framework

The paper presents a glance‑supervised video moment retrieval approach in which annotators mark only a single frame they glanced at within each target moment, introduces the ViGA contrastive learning framework to exploit this weak temporal cue, and demonstrates on three benchmarks performance rivaling fully supervised methods at minimal annotation cost.

Computer Vision · Glance Supervision · ViGA
0 likes · 8 min read
Tencent Tech
Apr 21, 2022 · Artificial Intelligence

How Tencent’s HunYuan Model Dominated All Major Video Retrieval Benchmarks

Tencent’s newly unveiled HunYuan AI model achieved a grand‑slam by ranking first on the five most authoritative cross‑modal video retrieval datasets, showcasing a hierarchical multimodal approach that dramatically boosts retrieval precision and promises broad impact for both research and industry applications.

AI · Tencent · multimodal
0 likes · 5 min read
DaTaobao Tech
Apr 6, 2022 · Artificial Intelligence

Improving New User Experience in Taobao Live Recommendation via Multi‑Channel Lifelong Product Sequence Modeling

The paper tackles Taobao Live’s cold‑start problem for new users by introducing a multi‑channel lifelong product‑sequence network that enriches purchase histories with side information, extracts relevance‑focused subsequences across five channels, and integrates them via target‑attention DIN, achieving substantial offline and online performance gains, especially for low‑activity users.

Recommendation Systems · cold start · e‑commerce
0 likes · 23 min read
DataFunTalk
Mar 28, 2022 · Artificial Intelligence

Construction and Application of Meituan's On‑site Comprehensive Knowledge Graph

This article introduces Meituan's on‑site comprehensive knowledge graph, detailing its multi‑layer design, data‑driven construction pipeline, challenges of diverse user demands and industry complexity, and showcases practical applications in search, recommendation, intelligent display, as well as future expansion plans.

Knowledge Graph · Meituan · local services
0 likes · 22 min read
DataFunTalk
Jan 22, 2022 · Artificial Intelligence

Multimodal Content Understanding Techniques in Search Systems

This talk presents Tencent's multimodal content understanding framework for search, covering hierarchical content features, large‑scale ranking, fine‑grained image semantic vectors, video and document analysis, quality detection, duplicate removal, and future directions in AI‑driven search.

AI · Image Embedding · Search
0 likes · 17 min read
DataFunTalk
Dec 26, 2021 · Artificial Intelligence

Neural–Symbolic Learning and Multimodal Knowledge Discovery: Recent Advances, Methods, and Challenges

This talk reviews recent progress in neural‑symbolic learning and multimodal knowledge discovery, highlighting examples such as GPT‑3 reasoning failures, the need for symbolic knowledge, historical developments, various integration methods, challenges in multimodal knowledge graphs, and future research directions.

AI · Knowledge Graph · machine learning
0 likes · 20 min read
Alibaba Cloud Developer
Dec 6, 2021 · Artificial Intelligence

Can AI Design Full Clothing Lines? Inside Alibaba’s M6-UFC Generator

Alibaba’s DAMO Academy and Tsinghua University introduced M6‑UFC, a non‑autoregressive multimodal transformer that unifies arbitrary text and image controls to generate high‑quality, editable fashion designs, dramatically reducing carbon emissions and outperforming GAN‑based models in fidelity and relevance while accelerating production speed.

AI · M6-UFC · fashion design
0 likes · 11 min read
DataFunSummit
Dec 3, 2021 · Artificial Intelligence

Real‑Time Voice Dialogue: Practices, Challenges, and Duplex Conversation

This article presents an in‑depth overview of Alibaba's real‑time voice dialogue system, covering the Hotline XiaoMi robot, the unique challenges of spoken interactions such as colloquialism, multimodality and duplex communication, and the research advances in ASR‑robust SLU, emotion detection, colloquial processing, and duplex conversation modeling.

ASR · SLU · Speech AI
0 likes · 22 min read
AntTech
Oct 29, 2021 · Artificial Intelligence

Ant Insurance Technology and CASIA Win Two Tracks at MuSe2021 Multimodal Sentiment Challenge (ACM MM 2021)

The Ant Insurance Technology team, together with the Institute of Automation of the Chinese Academy of Sciences, secured first place in both the MuSe‑Wilder and MuSe‑Sent tracks of the MuSe2021 Multimodal Sentiment Challenge held at the 29th ACM International Conference on Multimedia in Chengdu, showcasing advanced multimodal AI techniques.

BiLSTM · Deep Learning · MuSe2021
0 likes · 4 min read
DataFunTalk
Sep 30, 2021 · Artificial Intelligence

Advances in Knowledge Graph Construction and Applications by Alibaba's AliMe Team

This article presents Alibaba's AliMe team's year‑long progress on knowledge graph research, covering the fundamentals of knowledge graphs, domain and multimodal graph construction techniques, practical e‑commerce applications such as dialogue‑driven recommendation, virtual‑anchor script generation, and insights on future directions.

AI · Knowledge Graph · dialogue system
0 likes · 23 min read
DataFunSummit
Sep 26, 2021 · Artificial Intelligence

Contrastive Learning and Its Applications in Weibo Content Representation

This article explains the fundamentals of contrastive learning, reviews typical models such as SimCLR, MoCo, SwAV, BYOL, SimSiam and Barlow Twins, and demonstrates how these methods are applied to Weibo text and multimodal (text‑image) representation tasks like hashtag generation and image‑text matching.

NLP · Weibo · contrastive learning
0 likes · 18 min read
Meituan Technology Team
Sep 2, 2021 · Artificial Intelligence

Construction and Application of Retail Product Knowledge Graph at Meituan

The paper describes Meituan’s retail product knowledge graph—a multi‑layered, multi‑modal system that structures billions of SKUs, attributes, and user insights using hierarchical categories, graph‑enhanced NER, semi‑supervised learning, and expert‑in‑the‑loop validation, enabling precise search, ranking, recommendation, and real‑time merchant optimization.

AI · Knowledge Graph · Retail
0 likes · 25 min read
DataFunTalk
Aug 30, 2021 · Artificial Intelligence

Contrastive Learning: Foundations, Typical Models, and Applications to Weibo Content Representation

This article explains the concept of contrastive learning, its relationship to self‑supervised and metric learning, describes key system components and loss functions, reviews major image, NLP and multimodal models such as SimCLR, MoCo, SwAV, BYOL, and demonstrates how contrastive learning is applied to Weibo hashtag generation, similar‑post retrieval, and text‑image matching using CD‑TOM and W‑CLIP models.

AI · Weibo · contrastive learning
0 likes · 19 min read
Tencent Advertising Technology
Aug 18, 2021 · Artificial Intelligence

2021 Tencent Advertising Algorithm Competition: Winners, Accepted Papers, and Reviewer Feedback

The 2021 Tencent Advertising Algorithm Competition, held as the ACM MM 2021 Grand Challenge, announced the top three teams for two tracks, presented the accepted multimodal video advertising papers with detailed reviewer comments, and highlighted the significance of algorithmic innovation over ranking alone.

ACM MM · AI · Advertising
0 likes · 8 min read
DataFunTalk
Jul 12, 2021 · Artificial Intelligence

Tencent Music Live Streaming Recommendation System: Architecture, Challenges, and Model Design

This article presents an in‑depth overview of Tencent Music's live‑streaming recommendation system, covering business background, system architecture, recall and ranking model designs, multi‑modal extensions, and advanced training techniques such as DSSM, ESMM, GradNorm, and CGC to improve user engagement and conversion.

AI · DSSM · Tencent Music
0 likes · 13 min read
DataFunTalk
Jul 1, 2021 · Artificial Intelligence

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey

This article surveys the evolution of pre‑trained models, covering the origins of transfer and self‑supervised learning, the rise of transformer‑based PTMs such as BERT and GPT, efficient architecture designs, multimodal and multilingual extensions, theoretical analyses, and future research directions for scalable and robust AI systems.

AI research · efficient training · large language models
0 likes · 27 min read
Xianyu Technology
Jul 1, 2021 · Artificial Intelligence

Improving Search Relevance in Xianyu: System Design and Model Implementation

The paper describes Xianyu’s new relevance‑matching pipeline—integrating basic, text‑matching, semantic (BERT‑based dual‑tower), multimodal, and click‑graph features and fusing them with a GBDT model—which boosts search DCG@10 by 6.5 %, query satisfaction by 24 % and click interaction by over 20 % while outlining future enhancements for finer attribute matching and richer structured data.

e‑commerce · feature engineering · machine learning
0 likes · 13 min read
Tencent Advertising Technology
May 28, 2021 · Artificial Intelligence

Insights from the Tencent Advertising Algorithm Competition: Model Framework and Optimization Strategies

The article shares a Tencent competition champion’s practical TensorFlow‑based video ad solution, detailing data handling, model architecture, optimization tricks, multimodal fusion techniques, and experimental observations to help participants improve performance in the 2021 Tencent Advertising Algorithm Contest.

TensorFlow · advertising algorithm · competition
0 likes · 7 min read
DataFunTalk
Feb 25, 2021 · Artificial Intelligence

Applying Graph Embedding and Vector Recall for Personalized Recommendation in a UGC Community

This article describes how a UGC app tackled user and content cold‑start problems by introducing a personalized vector‑recall pipeline based on network representation learning and multimodal embeddings, detailing graph construction, GraphSAGE and GAT implementations, offline experiments, A/B test results, and future directions.

GNN · graph-embedding · multimodal
0 likes · 14 min read
JD Cloud Developers
Feb 5, 2021 · Artificial Intelligence

2020 NLP Milestones & Future Trends: Insights from JD’s AI Scientist

In an InfoQ interview, JD Technology senior algorithm scientist Wu Youzheng reviews the rapid advances of natural language processing in 2020—including GPT‑3, multimodal dialogue, knowledge‑enhanced pre‑training, and knowledge graphs—while outlining the most promising research directions and practical challenges for the coming year.

AI applications · GPT-3 · Knowledge Graph
0 likes · 18 min read
DataFunTalk
Nov 16, 2020 · Artificial Intelligence

Deep Semantic Relevance and Multimodal Video Search at Alibaba Entertainment

The presentation by Alibaba Entertainment's senior algorithm expert details the challenges of video search in the 4G/5G era and describes a comprehensive framework covering business overview, relevance and ranking, multimodal retrieval, deep semantic modeling, dataset construction, and practical deployment techniques.

Deep Learning · information retrieval · multimodal
27 min read
JD Cloud Developers
Nov 4, 2020 · Artificial Intelligence

Multimodal AI Breakthroughs Unveiled at NLPCC 2020 Workshop

The article recaps the inaugural Multimodal Natural Language Processing workshop at NLPCC 2020, highlighting breakthroughs in multimodal summarization, pre‑training models, AI‑driven art, visual‑language interaction, and multimodal dialogue systems, and showcases research from leading institutions and industry partners.

AI · NLP · dialogue
9 min read
DataFunSummit
Nov 3, 2020 · Artificial Intelligence

Deep Semantic Relevance and Multi‑Modal Video Search at Alibaba Entertainment

This presentation details Alibaba Entertainment's video search system, covering its business scope, user‑value metrics, a layered algorithm framework, relevance challenges, multi‑modal retrieval, deep semantic relevance techniques, model selection, asymmetric twin‑tower deployment, multi‑stage knowledge distillation, and practical effect cases.
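The summary mentions multi‑stage knowledge distillation. A single distillation stage is commonly trained with Hinton‑style soft targets. The minimal sketch below computes that loss, T² · KL(teacher ‖ student), over temperature‑softened softmax outputs. The function names and the default temperature are assumptions, and the stages and losses used at Alibaba Entertainment may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Soft-target loss: T^2 * KL(p_teacher || p_student) at temperature T.

    The T^2 factor offsets the 1/T^2 shrinkage of soft-target gradients,
    so the loss scale stays comparable when T changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl
```

In a teacher/student setup like the one described, the teacher would be a large relevance model and the student the lighter tower model served online. Each distillation stage would repeat this loss with a different teacher/student pair.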

Alibaba · Search Algorithms · multimodal
25 min read
DataFunTalk
Jul 8, 2020 · Artificial Intelligence

Multi‑Level Multi‑Modal Search Engine and Graph Engine for Video Content at Youku

The article presents a detailed technical overview of Youku's video search system, covering multi‑modal inputs, multi‑level element indexing, face search, cross‑level and cross‑modal retrieval, and the design and applications of a multimodal graph engine with knowledge‑graph integration.

AI · Knowledge Graph · face search
12 min read
Alibaba Cloud Developer
Jun 2, 2020 · Artificial Intelligence

How FashionBERT Boosts E‑Commerce Image‑Text Matching with Patch Embeddings

This article introduces FashionBERT, a multimodal BERT‑based model that replaces ROI‑based image tokens with uniform image patches to overcome e‑commerce specific challenges, details its architecture, adaptive loss balancing, deployment in Alibaba search, and reports significant performance gains on public and internal datasets.
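The central change described here is tokenizing the image as a fixed grid of patches instead of detected ROIs. Below is a minimal sketch of that splitting step. It assumes a single-channel image whose height and width are divisible by the patch size. In FashionBERT each patch would then be encoded by an image backbone, which this sketch omits. `split_into_patches` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]  # H x W grid of single-channel pixel values


def split_into_patches(image: Image, patch: int) -> List[List[float]]:
    """Cut an image into non-overlapping patch x patch tiles in row-major order
    and flatten each tile into one vector (one "image token" per patch)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            tokens.append(tile)
    return tokens
```

Uniform patches always yield the same number of tokens, in the same positions, for every product image. This avoids depending on an object detector, which may miss the fine-grained details of fashion items.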

BERT · Deep Learning · e‑commerce
13 min read
Didi Tech
May 25, 2020 · Artificial Intelligence

How Didi Harnesses Cutting‑Edge Speech Recognition: From ASR Basics to Transformer Models

This article provides a comprehensive technical overview of modern speech recognition, covering Didi’s driver‑assistant and smart‑customer‑service applications, fundamental ASR concepts, classic GMM‑HMM methods, deep‑learning breakthroughs such as DNN‑HMM, CTC, attention‑based and transformer models, practical training tricks, signal‑processing steps, and multimodal fusion techniques.
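CTC models emit one label, or a blank, per audio frame. A transcript is recovered by merging repeated labels and then removing blanks. The sketch below shows greedy (best‑path) decoding, assuming the blank has index 0. It is a generic illustration rather than Didi's decoder, and production systems often use beam search instead.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path CTC decoding: take the argmax label per frame,
    merge consecutive repeats, then drop blank labels."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame's label
        # and is not blank; a blank between two equal labels keeps both.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

For example, a best path of labels 1, 1, blank, 1, 2, 2 decodes to 1, 1, 2. The blank is what allows a genuinely doubled symbol to survive the repeat-merging step.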

ASR · CTC · Deep Learning
16 min read
DataFunTalk
May 23, 2020 · Artificial Intelligence

iQIYI Deep Semantic Representation Learning Framework for Video Recommendation and Search

Based on academic and industry experience, iQIYI has designed a deep semantic representation learning framework that integrates multimodal side information and deep models such as Transformers and graph neural networks, improving recall, ranking, deduplication, diversity and semantic matching across recommendation and search scenarios.

Deep Learning · Recommendation Systems · Search
27 min read
iQIYI Technical Product Team
May 15, 2020 · Artificial Intelligence

iQIYI Deep Semantic Representation Learning Framework: Design, Challenges, and Applications

iQIYI’s deep semantic representation learning framework integrates multimodal content, knowledge graphs, and user behavior through layered data, feature, strategy, and application components, employing early, late, and hybrid fusion with Transformers, GCNs, and other deep models to deliver high‑quality embeddings that boost recommendation, search, and streaming performance across dozens of business scenarios.
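To make the three fusion styles concrete, here is a toy sketch in which plain scoring functions stand in for real models. Early fusion joins the features before modeling, and late fusion combines per-modality outputs. The function names, the weighted-average late fusion and the `alpha` blend used for hybrid fusion are assumptions for illustration. "Hybrid fusion" covers several designs, and iQIYI's may differ.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Scorer = Callable[[Vector], float]  # stand-in for a trained model


def early_fusion(modalities: Sequence[Vector], model: Scorer) -> float:
    """Early fusion: concatenate per-modality features, then run one model."""
    joint = [x for features in modalities for x in features]
    return model(joint)


def late_fusion(modalities: Sequence[Vector], models: Sequence[Scorer],
                weights: Sequence[float]) -> float:
    """Late fusion: score each modality with its own model, then average the scores."""
    scores = [m(f) for m, f in zip(models, modalities)]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def hybrid_fusion(modalities: Sequence[Vector], joint_model: Scorer,
                  models: Sequence[Scorer], weights: Sequence[float],
                  alpha: float = 0.5) -> float:
    """One simple hybrid: blend the early-fusion and late-fusion scores."""
    return (alpha * early_fusion(modalities, joint_model)
            + (1 - alpha) * late_fusion(modalities, models, weights))
```

Early fusion lets one model learn cross-modal interactions, while late fusion keeps each modality's model independent and robust when one modality is missing. Hybrid schemes try to get both benefits.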

Search · graph neural networks · iQIYI
28 min read
DataFunTalk
Apr 20, 2020 · Artificial Intelligence

Video Search at Youku: Algorithmic Practices, Relevance, Ranking, and Multimodal Techniques

This article presents a comprehensive overview of Youku's video search system, covering business background, evaluation metrics, system and algorithm frameworks, relevance and ranking feature engineering, dataset construction, semantic matching, multimodal video understanding, and practical case studies that illustrate the impact of deep learning and AI techniques on search performance.

AI · Deep Learning · multimodal
18 min read
Youku Technology
Apr 16, 2020 · Artificial Intelligence

Multimodal Video Classification: Image Feature Improvements and System Insights

The talk presents Alibaba’s hierarchical video‑category system and a multimodal classification pipeline (leveraging EfficientNet, NeXtVLAD fusion, attention‑dropping augmentation, and MoCo contrastive learning) that together boost cold‑start recall by 43%, improve program classification by over 20%, and set the stage for larger models and advanced unsupervised methods.
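MoCo keeps a slowly moving key encoder and a queue of past keys that serve as negatives. The sketch below shows those two mechanics, with parameters as flat lists: the momentum update θ_k ← m·θ_k + (1 − m)·θ_q and a fixed-size FIFO queue. It is a generic illustration, not the talk's training code. The default m = 0.999 follows the MoCo paper, and the helper names are hypothetical.

```python
from collections import deque
from typing import Deque, List

Params = List[float]


def momentum_update(key_params: Params, query_params: Params, m: float = 0.999) -> Params:
    """MoCo key-encoder update: theta_k <- m * theta_k + (1 - m) * theta_q.
    Only the query encoder receives gradients; the key encoder trails it slowly."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def enqueue_keys(queue: Deque[List[float]], new_keys: List[List[float]]) -> None:
    """Push the newest key embeddings; a deque with maxlen drops the oldest automatically."""
    queue.extend(new_keys)
```

Because the key encoder changes slowly, keys computed in earlier batches stay roughly consistent with current ones. That consistency is what makes a large queue of negatives usable without huge batches.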

AI · EfficientNet · Unsupervised Learning
17 min read
DataFunTalk
Mar 30, 2020 · Artificial Intelligence

Enhancing Multimodal Video Classification with Improved Image Features and Category System

This article presents a comprehensive overview of Alibaba Entertainment's category system and multimodal video classification algorithm, detailing the construction of a high‑accuracy hierarchical taxonomy, improvements to image feature extraction using EfficientNet and data augmentation, unsupervised training techniques, experimental results, practical pitfalls, and future research directions.

AI · Unsupervised Learning · category system
17 min read
iQIYI Technical Product Team
Mar 27, 2020 · Artificial Intelligence

Multimodal Short Video Content Tagging Techniques and Applications at iQIYI

The article surveys iQIYI’s multimodal short‑video content‑tagging pipeline, detailing extraction‑ and generation‑based methods, challenges of open‑world tags, model evolution from rule‑based to Transformer generators, visual‑text fusion techniques, and applications such as recommendation, search, clustering, and future enhancements.

NLP · content tagging · iQIYI
18 min read
DataFunTalk
Feb 3, 2020 · Artificial Intelligence

Alibaba Entertainment Search Algorithm Practice and Insights – Video Search Case Study with Youku

In this live session, Alibaba Entertainment’s senior algorithm expert discussed Youku’s video search business, relevance and ranking models, multimodal search challenges, and practical AI techniques, giving attendees a comprehensive view of modern video retrieval systems and their implementation.

AI · Search Algorithms · information retrieval
3 min read
DataFunTalk
Nov 20, 2019 · Artificial Intelligence

Advances and Reflections on Human‑Machine Dialogue Technologies

This presentation reviews recent progress in spoken and multimodal dialogue systems, covering X‑driven architectures, task‑oriented and open‑domain approaches, NLU/DM integration, FAQ, KB/KG‑driven methods, document‑driven dialogue, and outlines remaining challenges and future research directions.

Dialogue Systems · Knowledge Graph · artificial intelligence
21 min read
iQIYI Technical Product Team
Sep 27, 2019 · Artificial Intelligence

iQIYI-VID: A Large-Scale Multimodal Video Dataset for Person Recognition

iQIYI-VID is the world’s largest multimodal video dataset for person recognition, containing 10,000 celebrity identities and 600,000 video clips drawn from millions of videos. It supports tasks such as detection, identification, and attribute and audio analysis, and it served as the basis for the 2018‑2019 challenges and a face‑recognition subset. The dataset has driven research, although performance gaps remain.

AI · iQIYI-VID · multimodal
7 min read
iQIYI Technical Product Team
Aug 9, 2019 · Artificial Intelligence

iQIYI 2019 Multimodal Video Person Recognition Competition Report by Zheey Team

The Zheey team from Beijing University of Posts and Telecommunications tackled the iQIYI 2019 Multimodal Video Person Recognition Challenge with a three‑layer MLP on the official face features. Through model fusion, quality filtering and fine‑tuning, they raised their baseline score of 0.8742 to 0.8949, ultimately ranking sixth and open‑sourcing their code.

MLP · competition · face features
9 min read
iQIYI Technical Product Team
Jun 6, 2019 · Artificial Intelligence

Large-Scale Hierarchical Classification Algorithm for iQIYI Short Videos

iQIYI’s large‑scale hierarchical classification system combines multimodal text and image embeddings, low‑rank multimodal fusion, and a dense hierarchical multilabel network with cascade‑style weighting to assign accurate type tags to short videos, boosting production efficiency and personalized recommendation diversity.

AI · Hierarchical Classification · feature fusion
16 min read
iQIYI Technical Product Team
Apr 4, 2019 · Artificial Intelligence

My Experience and Methods in the iQIYI Multimodal Person Recognition Challenge

In the iQIYI Multimodal Person Recognition Challenge, I leveraged the provided facial features, weighted face‑quality averaging, DBSCAN‑based noise clustering and a dynamic extra noise class within an iterative KNN‑to‑neural‑network training pipeline, ultimately reaching the top‑5 and open‑sourcing the full workflow on GitHub.
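Weighted face‑quality averaging can be read as pooling per‑frame face embeddings into one clip‑level embedding, with each frame weighted by its quality score. The sketch below follows that reading. The final L2 normalization is an assumption (it is common for face embeddings), and the function name is hypothetical. The DBSCAN noise clustering and the KNN‑to‑neural‑network stages are not shown.

```python
import math
from typing import List, Sequence


def quality_weighted_feature(features: Sequence[Sequence[float]],
                             qualities: Sequence[float]) -> List[float]:
    """Pool per-frame face features into one clip-level vector.

    Each frame is weighted by its face-quality score (higher = better quality),
    so low-quality frames contribute less. The pooled vector is then
    L2-normalised (an assumption of this sketch).
    """
    total = sum(qualities)
    if total <= 0:
        raise ValueError("at least one frame needs a positive quality score")
    dim = len(features[0])
    pooled = [sum(q * f[i] for f, q in zip(features, qualities)) / total
              for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled] if norm > 0 else pooled
```

With quality scores of 3, 1 and 0, the third frame is ignored entirely and the first counts three times as much as the second.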

DBSCAN · iQIYI · multimodal
7 min read
iQIYI Technical Product Team
Mar 22, 2019 · Artificial Intelligence

Experience Report of the 2018 iQIYI Multimodal Video Person Identification Challenge (WitcheR Team)

The WitcheR team won the 2018 iQIYI multimodal video person identification challenge by building a fast pipeline that combined a custom face‑and‑keypoint detector, ArcFace‑trained face embeddings, scene classification, and a three‑layer MLP with several training tricks, achieving a final mAP of 88.6% and demonstrating the value of rapid idea validation and open‑sourced code for future challenges.
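ArcFace training adds an angular margin m to the angle between an embedding and its true class center, then scales the result by s before softmax cross‑entropy. The sketch below computes those logits in plain Python. The defaults s = 64 and m = 0.5 are the ArcFace paper's commonly used settings, not values reported by the WitcheR team. The sketch also omits the extra handling that real implementations add when θ + m exceeds π.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(embedding: Sequence[float],
                   class_centers: Sequence[Sequence[float]],
                   label: int, s: float = 64.0, m: float = 0.5) -> List[float]:
    """Additive angular margin logits: s*cos(theta_y + m) for the true class,
    s*cos(theta_j) for every other class; feed them to softmax cross-entropy."""
    x = l2_normalize(embedding)
    logits = []
    for j, center in enumerate(class_centers):
        cos_theta = sum(a * b for a, b in zip(x, l2_normalize(center)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos inside its domain
        if j == label:
            cos_theta = math.cos(math.acos(cos_theta) + m)
        logits.append(s * cos_theta)
    return logits
```

Penalizing the true class's angle forces embeddings of the same identity to cluster more tightly than plain softmax would require. This is why ArcFace features work well as inputs to a downstream classifier such as the MLP described.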

MLP · Model Fusion · competition
12 min read
DataFunTalk
Mar 15, 2019 · Artificial Intelligence

Designing Personalized, Dynamic, and Multimodal Knowledge Graphs for Chatbots

The article explores how chatbots require personalized dense knowledge graphs, dynamic temporal graphs, subjective emotion modeling, integration with external services, and multimodal media support, while also promoting a new NLP book and a related giveaway for readers.

AI · Chatbot · Dynamic Graph
9 min read