Tagged articles

Multimodal

422 articles · Page 4 of 5

Aug 13, 2024 · Artificial Intelligence

Ant Group Contributions to ACL 2024: Summaries of 14 Accepted Papers Across NLP and AI

From August 11‑16, 2024, the ACL conference in Bangkok featured 14 Ant Group papers covering large‑scale information extraction, decomposed LLMs for semantic search, multimodal hallucination detection, long‑context attention mechanisms, concept‑reasoning datasets, knowledge‑graph alignment, and more, highlighting the group's breadth in natural language processing and AI research.

ACL2024 · Large Language Models · Multimodal

0 likes · 20 min read

DataFunSummit

Jul 28, 2024 · Artificial Intelligence

Leveraging Large Language Models for Graph Learning: Opportunities, Current Progress, and Future Directions

This article reviews why large language models can be applied to graph learning, outlines their capabilities and graph data characteristics, surveys current research across different graph types and LLM roles, and proposes future research directions for unified cross‑domain graph learning.

AI · Graph Neural Networks · Large Language Models

0 likes · 19 min read

Tencent Cloud Developer

Jul 18, 2024 · Artificial Intelligence

Exploring Large Language Models (LLM): Fundamentals, Applications, and Future Directions

This article surveys large language models: their core concepts, evolution through Transformers, GPT and BERT, generation challenges, diverse applications such as QA, multimodal creation, summarization and retrieval‑augmented generation, prompt‑engineering frameworks and tools, LangChain‑based pipelines, AI‑driven agents, and future prospects toward domain‑specific use, multimodality, and AGI.

AI · Agent · LLM

0 likes · 35 min read

Architects' Tech Alliance

Jul 10, 2024 · Industry Insights

Why AI Large Models Are Driving the Next Industrial Revolution

The article analyzes the rapid evolution of AI large models—from their role in advancing AGI through massive pre‑training and fine‑tuning, to current market dynamics led by GPT and domestic Chinese players, and finally to future multimodal applications, content‑factory capabilities, and emerging AIGC revenue models projected to reach trillion‑yuan scales by 2030.

AI · AIGC · GPT

0 likes · 7 min read

Baobao Algorithm Notes

Jul 8, 2024 · Industry Insights

Why Large‑Model Deployment Stalls: Robots, Scaling Laws, and Multimodal Frontiers

The article analyzes current challenges in deploying large AI models, covering robot automation, scaling‑law limits, vertical‑domain use cases, multimodal breakthroughs, algorithmic evolution, and the hardware‑software trade‑offs of training and inference infrastructures, while questioning ROI and practical feasibility.

Multimodal · algorithm evolution · inference infrastructure

0 likes · 21 min read

Baobao Algorithm Notes

Jul 4, 2024 · Artificial Intelligence

Vitron: How a Pixel‑Level Multimodal LLM Bridges Vision and Language

Vitron is a unified pixel‑level visual multimodal large language model that integrates image, video, and region encoders with a text‑centric strategy, delivering precise pixel‑wise perception and a comprehensive suite of vision tasks from understanding to generation and editing.

AI · LLM · Multimodal

0 likes · 12 min read

AI Large Model Application Practice

Jul 4, 2024 · Artificial Intelligence

Mastering Multimodal RAG: From PDF Parsing to Advanced Query Rewriting

This article explains how to handle complex multimodal PDFs in RAG systems, outlines extraction, indexing, and multimodal model integration, details four query‑rewriting strategies (HyDE, stepwise, sub‑question, backward), and presents key evaluation metrics and tools for assessing RAG performance.

Document Parsing · Evaluation · Multimodal

0 likes · 12 min read

360 Tech Engineering

Jul 3, 2024 · Artificial Intelligence

360LayoutAnalysis: Open‑Source Lightweight Document Layout Analysis Models for Multiple Scenarios

The 360LayoutAnalysis project from 360 AI Lab releases lightweight, YOLOv8‑based layout analysis models covering Chinese and English papers, Chinese research reports, and a general document scenario, providing fast inference, paragraph‑level detection, and open‑source code and weights for flexible document‑understanding pipelines.

AI model · Layout Analysis · Multimodal

0 likes · 9 min read

JD Tech

Jun 28, 2024 · Artificial Intelligence

An Overview of Large Language Models: History, Fundamentals, Prompt Engineering, Retrieval‑Augmented Generation, Agents, and Multimodal AI

This article provides a comprehensive introduction to large language models, covering their historical development, core architecture, training process, prompt engineering techniques, Retrieval‑Augmented Generation, agent frameworks, multimodal capabilities, safety challenges, and future research directions.

AI agents · AI safety · Deep Learning

0 likes · 22 min read

AntTech

Jun 18, 2024 · Artificial Intelligence

Ant Group’s 24 Papers Featured at CVPR2024: Topics and Abstracts

The IEEE CVPR2024 conference in Seattle accepted 2,719 papers out of 11,532 submissions, and Ant Group contributed 24 papers covering computer vision, deep learning, digital humans, large models, multimodal remote sensing, vision‑language distillation, federated incremental learning, model‑stealing defense, and more, with one selected as a highlight paper.

Ant Group · CVPR2024 · Deep Learning

0 likes · 17 min read

NewBeeNLP

Jun 18, 2024 · Artificial Intelligence

How Shopee Builds an E‑Commerce Knowledge Graph and Leverages Large Models

This article presents Shopee's comprehensive approach to constructing an e‑commerce knowledge graph, detailing the challenges of heterogeneous data, multi‑language handling, entity disambiguation, and the integration of deep learning and large language models to improve product matching, recommendation, and operational efficiency.

AI · Multimodal · e-commerce

0 likes · 22 min read

DataFunTalk

Jun 14, 2024 · Artificial Intelligence

Shopee's E‑commerce Knowledge Graph Construction and Integration with Large Models

This article presents Shopee's comprehensive exploration of building an e‑commerce knowledge graph, detailing its challenges, construction pipeline, AI‑driven extraction and fusion techniques, multilingual and multimodal modeling, and practical applications ranging from search and recommendation to AI assistants and real‑time updates.

AI Applications · Large Language Models · Multimodal

0 likes · 21 min read

Alibaba Cloud Developer

Jun 13, 2024 · Artificial Intelligence

Creating a Full AI‑Generated Music Video with Large‑Model Agents

This article documents the end‑to‑end workflow of using large multimodal models and specialized agents to automatically generate a storyboard, compose original music and lyrics, produce keyframes, and assemble a complete music video, while highlighting the remaining manual steps and future automation possibilities.

AI · Agents · Multimodal

0 likes · 10 min read

Baobao Algorithm Notes

Jun 5, 2024 · Artificial Intelligence

Is GLM‑4‑9B the New Powerhouse? A Deep Dive into Its Performance and Usage

This article reviews the open‑source 9‑billion‑parameter GLM‑4‑9B model, covering installation, quick‑start inference code, quirky Chinese riddles that highlight its strengths over GPT‑4, extensive benchmark tables for dialogue, multilingual, tool‑calling and multimodal tasks, and its broader impact on the Chinese AI ecosystem.

AI · GLM-4-9B · Multimodal

0 likes · 14 min read

DataFunSummit

Jun 4, 2024 · Artificial Intelligence

Multimodal and Graph Neural Network Techniques for eBay Recommendation Systems

This article details eBay's practical experience integrating multimodal data and graph neural networks into its recommendation pipeline, covering pain‑point analysis, a twin‑tower multimodal embedding model with triplet loss and TransH, engineering design, experimental results, and key takeaways for future AI‑driven product development.

Embedding · GNN · Graph Neural Network

0 likes · 19 min read

Alimama Tech

May 29, 2024 · Artificial Intelligence

Mixture of Multi‑Modal Experts for Advertising Recall

The Mixed‑Modal Expert Model combines ID features with image and text embeddings through optimized representations and conditional output fusion, dramatically improving advertising recall—especially for long‑tail items—and delivering measurable gains in click‑recall, revenue, CTR, and page views in large‑scale online tests.

Model · Multimodal · machine learning

0 likes · 15 min read

NewBeeNLP

May 28, 2024 · Artificial Intelligence

How Generative Models Are Redefining Recommendation Systems

This article reviews recent advances in generative recommendation, highlighting challenges such as item representation and multimodal fusion, and summarizing four key research papers that propose novel tokenization, collaborative integration, and transformer-based multimodal approaches to improve recommendation performance.

AI research · LLM · Multimodal

0 likes · 8 min read

DataFunTalk

May 20, 2024 · Artificial Intelligence

Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations

This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and cloud, covering edge image‑text retrieval, text‑image generation and understanding optimizations, and lightweight diffusion model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.

AIGC · Multimodal · OPPO

0 likes · 18 min read

21CTO

May 18, 2024 · Artificial Intelligence

What Makes GPT‑4o Faster, Smarter, and More Multimodal Than GPT‑4?

This article examines OpenAI's GPT‑4o, outlining its key performance, speed, accuracy, latency, multimodal, and resource‑efficiency improvements over GPT‑4, and explains why these enhancements broaden the model's applicability across various AI‑driven applications.

AI model · GPT-4o · Multimodal

0 likes · 6 min read

360 Tech Engineering

May 17, 2024 · Artificial Intelligence

360VL: An Open‑Source Multimodal Large Language Model Based on Llama‑3‑70B

The article introduces 360VL, an open‑source multimodal large language model built on Llama‑3‑70B, describes its novel C‑abs bridge architecture for high‑resolution visual understanding, outlines the two‑stage training with bilingual data, and presents benchmark results showing superior performance over prior LMMs.

AI research · Llama3 · Multimodal

0 likes · 8 min read

CSS Magic

May 14, 2024 · Artificial Intelligence

First Look at GPT-4o: Hands‑On Experience, FAQs, and New Free‑User Benefits

The article provides a hands‑on review of OpenAI's newly released GPT‑4o model, covering its multimodal capabilities, real‑time voice demo, desktop client rollout, access options for paid and free users, practical usage tips, and early observations on API performance and limitations.

AI model · API · ChatGPT

0 likes · 9 min read

DataFunSummit

Apr 24, 2024 · Artificial Intelligence

Multimodal Content Understanding in Baidu Commercial Systems: The ViCAN Model and Its Applications

This article presents Baidu's exploration of multimodal content understanding for commercial advertising, detailing the ViCAN pre‑training model, its contrastive and mask‑language learning tasks, integration across recall, ranking and risk‑control pipelines, quantization with MMDict, and future AIGC‑driven generation, all backed by extensive experiments and Q&A.

AI · AIGC · Advertising

0 likes · 27 min read

21CTO

Apr 20, 2024 · Artificial Intelligence

What Developers Need to Know About Meta’s New Open‑Source Llama 3 Model

Meta’s newly open‑sourced Llama 3 model pushes the frontier of large language models with a larger context window, Mixture‑of‑Experts architecture, multilingual support, and multimodal capabilities, while facing challenges in transparency, bias, and computational resources, and offering diverse applications from NLU to code generation.

AI · Benchmark · Llama3

0 likes · 10 min read

Architects' Tech Alliance

Apr 7, 2024 · Artificial Intelligence

How Sora Is Redefining Text‑to‑Video Generation: Inside the New AI Model

Sora, the newly announced text‑to‑video large model, can generate one‑minute high‑fidelity videos from textual prompts or static images, handling complex scenes, expressive characters, and sophisticated camera motions while also supporting video extension and frame‑filling, positioning it at the forefront of multimodal AI research.

AI model · Multimodal · Sora

0 likes · 6 min read

DataFunSummit

Mar 27, 2024 · Artificial Intelligence

Generative Multimodal Pretraining (OFA) and Representational Multimodal Pretraining (ONE-PEACE): Research Overview and Findings

This article reviews Tongyi Lab's work on the OFA framework for generative multimodal pretraining and the ONE-PEACE model for unified multimodal representation learning, detailing their architectures, training strategies, experimental results across vision‑language and audio tasks, and future research directions.

Multimodal · OFA · ONE-PEACE

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Mar 18, 2024 · Artificial Intelligence

How MuLTI Achieves Memory‑Efficient Video‑Language Understanding with Text‑Guided MultiWay Sampling

The paper presents MuLTI, a multimodal video‑language model that tackles the memory and efficiency challenges of long video‑text sequences by introducing a Text‑Guided MultiWay Sampler and a Multiple Choice Modeling pre‑training task, achieving state‑of‑the‑art results on video QA and retrieval while drastically reducing GPU memory consumption.

Multimodal · efficient-ai · feature fusion

0 likes · 19 min read

NewBeeNLP

Feb 27, 2024 · Artificial Intelligence

Boosting E‑Commerce AIGC with Knowledge Graphs: From Multimodal Inputs to Controlled LLMs

The article details how JD.com leverages domain‑specific and generic knowledge graphs to enhance multimodal product information, improve controlled text generation, and boost LLM performance for e‑commerce copywriting, covering model architecture, copy‑only mechanisms, token‑type encoding, experimental results, and practical deployment scenarios.

AIGC · LLM · Multimodal

0 likes · 23 min read

21CTO

Feb 17, 2024 · Artificial Intelligence

How OpenAI’s Sora Is Pushing Video Generation to New Frontiers

OpenAI’s Sora model demonstrates large‑scale text‑conditional video generation using a diffusion transformer that operates on spatiotemporal patches, supporting variable durations, resolutions, and aspect ratios while showcasing emergent simulation abilities, flexible sampling, and multimodal editing capabilities, though it still has notable limitations.

AI research · Diffusion Models · Multimodal

0 likes · 19 min read

Java Tech Enthusiast

Feb 16, 2024 · Artificial Intelligence

Google's Gemini 1.5: Breakthrough in Long-Context Understanding and Multimodal Capabilities

Google’s Gemini 1.5, a new multimodal Mixture‑of‑Experts model, supports up to a million‑token context (10 million internally), can understand text, video, audio and code, learns a new language from a single prompt, and is already being used by Samsung, Jasper and Quora, positioning it as a direct challenger to OpenAI’s flagship models.

Gemini 1.5 · Google AI · LLM

0 likes · 7 min read

Baobao Algorithm Notes

Feb 4, 2024 · Industry Insights

Balancing Fun, Utility, and Slow Thinking: The Future of AI Agents

In this talk, the speaker examines the dual goals of AI agents—being entertaining and useful—while introducing the concepts of fast and slow thinking, multimodal perception, long‑term memory, retrieval‑augmented generation, and tool integration as essential steps toward building truly valuable digital companions.

AI agents · Future AI · Multimodal

0 likes · 18 min read

DataFunSummit

Jan 10, 2024 · Artificial Intelligence

Baidu Commercial Multimodal Understanding and AIGC Innovation Practices

This article presents Baidu's commercial multimodal understanding and AIGC innovations, detailing rich‑media multimodal perception, a unified large‑scale representation framework, scenario‑specific fine‑tuning, and practical applications such as marketing copy, digital‑human video, and poster generation.

AIGC · Advertising · Baidu

0 likes · 12 min read

NetEase Smart Enterprise Tech+

Jan 4, 2024 · Artificial Intelligence

How to Strengthen AIGC Content Safety with Multimodal Data and Model Upgrades

The article examines the security challenges introduced by large‑model AIGC, outlines three technical upgrade paths—richer training data, few‑shot model fine‑tuning, and multimodal fusion—and demonstrates practical implementations that dramatically improve detection efficiency, accuracy, and scalability.

AI security · AIGC · Content Safety

0 likes · 24 min read

Rare Earth Juejin Tech Community

Dec 22, 2023 · Artificial Intelligence

Machine Learning-Based Text‑Image Correlation Analysis

This article introduces a machine‑learning approach for correlating text and image data, covering preprocessing, feature extraction, model training, experimental results, and future directions, and provides complete Python code examples using NLP and deep‑learning libraries.

Multimodal · machine learning · text-image correlation

0 likes · 17 min read

Rare Earth Juejin Tech Community

Dec 9, 2023 · Artificial Intelligence

Google Unveils Gemini: A New Multimodal Large Model Family (Ultra, Pro, Nano)

Google announced Gemini, a suite of multimodal large language models—including Ultra, Pro, and Nano—that achieve state‑of‑the‑art results on dozens of benchmarks, support native multimodal pre‑training, and are being integrated across Google products such as Bard, Search, and upcoming Pixel devices.

Artificial Intelligence · Benchmark · Gemini

0 likes · 7 min read

DataFunSummit

Dec 8, 2023 · Artificial Intelligence

Multimodal Cold‑Start Techniques for Music Recommendation at NetEase Cloud Music

This article presents NetEase Cloud Music's multimodal cold‑start solution, detailing the problem background, feature selection using CLIP, two modeling approaches (I2I2U indirect and U2I DSSM direct), contrastive learning enhancements, interest‑boundary modeling, and evaluation results showing significant gains in user engagement.

AI · Multimodal · cold-start

0 likes · 15 min read

360 Smart Cloud

Nov 20, 2023 · Artificial Intelligence

Overview of Recent Open‑Source AI Models and Tools (November 2023)

This article summarizes a collection of newly released open‑source AI projects covering natural‑language processing, multimodal processing, intelligent agents, recommendation systems, and model training acceleration, providing brief descriptions, key capabilities, and links to their repositories.

AI · Large Language Models · Multimodal

0 likes · 9 min read

php Courses

Nov 10, 2023 · Artificial Intelligence

OpenAI Announces Data Partnership Program for Public and Private Training Datasets

OpenAI revealed a new data partnership initiative to collect large‑scale public and private datasets across multiple modalities, aiming to improve AI model safety and usefulness by incorporating diverse, hard‑to‑access human‑generated content while respecting privacy and intent.

AI training data · Data Partnership · Multimodal

0 likes · 3 min read

DataFunTalk

Nov 2, 2023 · Artificial Intelligence

Enhancing Language and Vision Models with External Knowledge and Tools: OREO‑LM, REVEAL, and AVIS

This article reviews recent research on augmenting language and multimodal models with external knowledge sources and tool‑calling mechanisms, covering three systems—OREO‑LM for knowledge‑graph reasoning, REVEAL for multi‑source visual‑language pretraining, and AVIS for dynamic tool selection—and their experimental results and implications.

Language Model · Multimodal · Tool Integration

0 likes · 28 min read

Baobao Algorithm Notes

Oct 23, 2023 · Artificial Intelligence

Why Multimodal AI Agents Could Be the Next Killer App for Large Models

The article recounts a personal test of a multimodal AI agent in Newport Beach and expands into a detailed analysis of current multimodal LLM architectures, memory mechanisms, task planning, tool usage, personality modeling, cost constraints, evaluation challenges, and the broader social and reliability implications of deploying such agents.

AI agents · Evaluation · Multimodal

0 likes · 44 min read

DataFunSummit

Oct 17, 2023 · Artificial Intelligence

Enhancing Vision and Language Models with External Knowledge Graphs and Tool Integration

This article reviews recent research on augmenting language and vision models by incorporating external knowledge sources such as knowledge graphs, multi‑source retrieval, and dynamic tool‑calling frameworks, presenting three systems—OREO‑LM, REVEAL, and AVIS—and their experimental results.

AI research · Language Model · Multimodal

0 likes · 27 min read

DataFunTalk

Sep 26, 2023 · Artificial Intelligence

MiniGPT-4: Enhancing Vision‑Language Understanding with Large Language Models

This article presents MiniGPT-4, a multimodal system that combines a frozen visual encoder (Q‑Former + ViT) with an open‑source large language model (Vicuna), describes its motivation, training pipeline, demo capabilities, observed limitations, and includes a brief Q&A session.

AI research · Image Captioning · MiniGPT-4

0 likes · 15 min read

DataFunTalk

Sep 19, 2023 · Artificial Intelligence

Simultaneous Speech Translation: Technical Background, System Architecture, and Key Challenges

This article reviews the technical background of simultaneous speech translation, compares offline and real‑time scenarios, details ASR and MT technologies, describes the system architecture and design strategies, and discusses the major challenges and solutions for deploying robust, low‑latency translation services.

ASR · Huawei · Machine Translation

0 likes · 16 min read

DaTaobao Tech

Sep 13, 2023 · Artificial Intelligence

Integrating Large Language Models with Recommendation Systems: Paradigms, Methods, and Experiments

The article surveys how large language models can be integrated into recommendation systems, either as feature extractors or as end‑to‑end recommenders, showing that LLM‑derived semantics improve recall, ranking, diversity, and user experience, and outlining future multimodal, efficiency, and re‑ranking directions.

Embedding · LLM · Multimodal

0 likes · 19 min read

DataFunTalk

Sep 5, 2023 · Artificial Intelligence

Baidu Commercial Multimodal Understanding and AIGC Innovation Practices

This article presents Baidu's commercial multimodal understanding framework and AIGC innovations, detailing rich-media multimodal perception, the VICAN‑12B multimodal representation‑generation model, scenario‑specific fine‑tuning, feature quantization for ranking, and practical applications such as marketing content generation, digital‑human video creation, and poster synthesis.

AIGC · Baidu · Multimodal

0 likes · 12 min read

Huolala Tech

Jul 21, 2023 · Artificial Intelligence

Visual Language Models Power Open-Set Detection and Surgical Tool Segmentation

Recent advances in visual language models enable zero-shot multimodal tasks, and this article explores their application to open-set object detection, prompt learning, and promptable surgical instrument segmentation, highlighting methods like CLIP, CoOp, and the DetPro framework with experimental results across multiple benchmarks.

Multimodal · Semantic Segmentation · computer vision

0 likes · 12 min read

360 Tech Engineering

Jul 6, 2023 · Artificial Intelligence

CSIG Enterprise Visit to Qihoo 360: Multimodal and Cross‑Modal Learning in the Era of Large Models

The CSIG‑hosted "Enterprise Visit – Into Qihoo 360" event on June 29, 2023 gathered over a thousand participants to explore multimodal and cross‑modal learning in the large‑model era, featuring keynote speeches from leading university researchers and Qihoo 360 AI experts, a tour of the company's facilities, and discussions on future AI research directions.

CSIG · Multimodal · Qihoo360

0 likes · 8 min read

Tencent Cloud Developer

Jun 28, 2023 · Artificial Intelligence

Prompt Engineering: Fundamentals, Techniques, and Advanced Strategies

Prompt engineering teaches how to craft effective instructions, context, input data, and output formats for large language models, using clear commands, iterative refinement, and advanced methods such as zero‑shot, few‑shot, chain‑of‑thought, Tree of Thoughts, retrieval‑augmented and progressive‑hint prompting to achieve precise, reliable results across diverse tasks.

AI · Chain-of-Thought · Multimodal

0 likes · 17 min read

Efficient Ops

Jun 26, 2023 · Artificial Intelligence

How Multimodal AI Is Revolutionizing Credit Card Fraud Detection

Amid tightening financial regulations, ICBC's software team proposes a multimodal AI anti‑fraud framework that combines image, video, and structured data to detect deep‑fake, mask, and forged‑document attacks, enriches verification with cross‑modal cues, and outlines future expansion to text and speech modalities.

AI · Deep Learning · Multimodal

0 likes · 7 min read

DataFunSummit

Jun 14, 2023 · Artificial Intelligence

DataFun Summit 2023: Large Language Models and AIGC Conference

DataFun will host the DataFun Summit 2023 on June 17‑18, featuring three chairs and eight presenters who will discuss core topics such as large language model research, multimodal generation, reinforcement learning, tool learning, distributed training, and industry applications, with free registration via QR code.

AI Conference · AIGC · Large Language Models

0 likes · 42 min read

Rare Earth Juejin Tech Community

Jun 12, 2023 · Artificial Intelligence

Comprehensive Guide to Using OpenAI APIs: Models, Prompts, Embeddings, Fine‑Tuning, LangChain, and Multimodal Applications

This article provides a detailed, step‑by‑step tutorial on OpenAI’s language models, API endpoints, prompt engineering, embeddings, moderation, fine‑tuning, LangChain workflows, memory management, and multimodal capabilities such as audio transcription and image generation, complete with code examples and practical usage tips.

API · Embedding · LangChain

0 likes · 45 min read

Rare Earth Juejin Tech Community

Jun 11, 2023 · Artificial Intelligence

Comprehensive Technical Overview of GPT Series, Transformers, and Emerging Capabilities in Large Language Models

This article provides a detailed technical review of the evolution of GPT models, the Transformer architecture, large language model training methods, emergent abilities such as in‑context learning and chain‑of‑thought, multimodal extensions, and the challenges of data, scaling, and alignment, offering a holistic view for researchers and practitioners.

AI · GPT · InstructGPT

0 likes · 28 min read

NetEase LeiHuo Testing Center

Jun 2, 2023 · Artificial Intelligence

AI Techniques for a Global Search Platform: Word Segmentation, Text Similarity, Image Retrieval, and Multimodal Models

This article shares the development of a global search platform that leverages AI technologies such as Chinese word segmentation, part‑of‑speech tagging, text similarity via Simhash and Synonyms, image similarity using histogram, Hamming distance and ResNet‑50, and multimodal CLIP‑based models to improve search efficiency and accuracy.

AI · Multimodal · NLP

0 likes · 12 min read

Programmer DD

May 5, 2023 · Artificial Intelligence

How Microsoft’s Bing Chat Upgrade Turns Search into an AI Copilot

Microsoft has fully opened Bing Chat to all users, introducing multimodal responses, a multilingual Image Creator, persistent chat history, and upcoming plugin support, while sharing usage statistics and outlining weekly update plans that position Bing as an AI‑driven search copilot competing with ChatGPT.

AI · BingChat · Microsoft

0 likes · 8 min read

DataFunSummit

Apr 20, 2023 · Artificial Intelligence

Mengzi Lightweight Model Technology System and Advances in Small‑Scale and Retrieval‑Augmented Pretraining

This presentation introduces the Mengzi lightweight model technology stack, covering large‑scale pre‑training, motivations for lightweight models, detailed techniques such as knowledge and sequence‑relation enhancement, training optimization, model compression, retrieval‑augmented pre‑training, multimodal extensions, open‑source releases, and real‑world applications.

Knowledge Distillation · Large Language Models · Multimodal

0 likes · 23 min read

21CTO

Apr 2, 2023 · Artificial Intelligence

Can GPT‑4 Be Considered Early AGI? Insights from Microsoft’s 155‑Page Study

This article reviews Microsoft’s extensive 155‑page work on early experiments with GPT‑4, exploring how the model approaches artificial general intelligence, its testing methodology, multimodal capabilities, programming and mathematical performance, interaction with tools and humans, limitations, societal impact, and future research directions.

AI safety · Artificial General Intelligence · GPT-4

0 likes · 15 min read

DataFunTalk

Apr 1, 2023 · Artificial Intelligence

Nvidia Meets OpenAI: Highlights from the GTC Fireside Chat on GPT‑4, Deep Learning History, and the Future of AI

In a GTC fireside chat, Nvidia CEO Jensen Huang and OpenAI co‑founder Ilya Sutskever discuss GPT‑4's multimodal advances, the evolution of deep learning from early neural networks to large‑scale models, the pivotal role of GPUs and datasets like ImageNet, and their vision for more reliable, scalable artificial intelligence.

Artificial Intelligence · Deep Learning · GPT-4

0 likes · 10 min read

Programmer DD

Mar 22, 2023 · Artificial Intelligence

How Baidu’s Ernie Bot Stacks Up Against GPT‑4: A Deep Dive

The article reviews Baidu’s newly launched Ernie Bot, a multimodal large language model, comparing its literary, business, mathematical, Chinese comprehension, and multimodal abilities with GPT‑4, while detailing the underlying technologies, knowledge‑enhancement techniques, and deployment strategy behind the model.

AI comparison · Baidu · Ernie Bot

0 likes · 10 min read

Python Programming Learning Circle

Mar 18, 2023 · Artificial Intelligence

Baidu’s ERNIE Bot (Wenxin Yiyan) Launch: Features, Use Cases, and Technical Architecture

Baidu unveiled its new generative AI chatbot ERNIE Bot, showcasing five practical scenarios, multimodal generation, a detailed technical stack based on the ERNIE and PLATO models, and a comparison with ChatGPT and Bing Chat, while also announcing its invitation‑only testing program and API access for enterprises.

Artificial Intelligence · Baidu · Chatbot

0 likes · 12 min read

Architecture Digest

Mar 17, 2023 · Artificial Intelligence

Baidu’s Ernie Bot (Wenxin Yiyan) vs GPT‑4: Capabilities, Technical Foundations, and Market Reaction

The article reviews Baidu's launch of the multimodal large language model Wenxin Yiyan, compares its literary, business, mathematical, Chinese‑understanding and multimodal abilities with GPT‑4, explains the underlying six‑core technologies and hardware stack, and reports the mixed market and netizen response.

AI · Baidu · Ernie Bot

0 likes · 11 min read

DataFunSummit

Mar 15, 2023 · Artificial Intelligence

Key Features and Capabilities of OpenAI's GPT‑4

OpenAI's GPT‑4, a large multimodal language model, expands token limits, adds image understanding, demonstrates strong reasoning on professional exams, supports many languages, and is already integrated into Microsoft Bing, while offering various access options and improved safety compared to its predecessor.

AI · GPT-4 · Microsoft Bing

0 likes · 9 min read

Alimama Tech

Feb 1, 2023 · Artificial Intelligence

CapOnImage: Context-driven Dense Captioning on Images

The paper presents CapOnImage, a novel image‑on‑image captioning task that generates location‑specific decorative text for product images, introduces the 2.1‑million‑image CapOnImage2M dataset, and proposes a mixed‑modality transformer with position‑aware pre‑training and progressive training, achieving superior accuracy and diversity and already deployed in Alibaba’s advertising platforms for measurable business impact.

Context-Aware · Deep Learning · Image Captioning

0 likes · 9 min read

NetEase Cloud Music Tech Team

Jan 4, 2023 · Artificial Intelligence

Relevance Modeling and Ranking for Cloud Music Video Search

The paper details Cloud Music’s video‑search pipeline—query understanding, recall, relevance, ranking and re‑ranking—highlighting challenges such as ambiguous content, timeliness and multi‑objective goals, and describes two deployed models (a twin‑tower aspect relevance network and a click‑graph propagator) that together boost click‑through rate by 1.5 % and effective CTR by 2.3 %.

Multimodal · Ranking · click graph

0 likes · 24 min read

DataFunTalk

Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Deep Learning · Large-Scale Data · Multimodal

0 likes · 16 min read

DataFunTalk

Nov 16, 2022 · Artificial Intelligence

From Neural Search to Multimodal Applications: Building Scalable Services with Jina and DocArray

This article explains how neural search enables multimodal data handling, introduces the DocArray data structures (Document and DocumentArray), and demonstrates how Jina’s cloud‑native framework can be used to build, deploy, and scale end‑to‑end multimodal services such as DocsQA.

AI · Cloud Native · DocArray

0 likes · 16 min read

DataFunSummit

Oct 20, 2022 · Artificial Intelligence

End-to-End Speech Relation Extraction

This paper presents an end‑to‑end approach for extracting relational triples directly from speech signals, bypassing intermediate transcription, and demonstrates its effectiveness on synthesized speech versions of the CoNLL04 and TACRED datasets, highlighting challenges such as length constraints and cross‑modal alignment.

End-to-End · Multimodal · natural language processing

0 likes · 17 min read

HelloTech

Oct 19, 2022 · Artificial Intelligence

Intelligent Creative System: Types, Quality Evaluation, Generation Models, and Optimization

The Intelligent Creative System defines advertising creatives across formats, evaluates image and text quality using reference‑based metrics and models like DeepBIQ, generates multimodal ads via GANs and Transformers, and selects optimal variants through bandit‑based CTR prediction and multimodal fusion, enabling scalable, data‑driven creative production.

AI · Bandit Model · GAN

0 likes · 10 min read

DataFunTalk

Sep 13, 2022 · Artificial Intelligence

Intelligent Question Answering in QQ Browser Search: Background, Key Technologies, and Frontier Research

This article presents an in‑depth overview of intelligent question answering in QQ Browser search, covering its background, the core KBQA and DeepQA technologies, system architecture, challenges, recent advances such as end‑to‑end, knowledge‑guided and multimodal QA, and practical Q&A for deployment.

AI · Deep Learning · Multimodal

0 likes · 22 min read

NetEase Cloud Music Tech Team

Aug 17, 2022 · Artificial Intelligence

Live Streaming Recommendation Practices in NetEase Cloud Music: Real-time, Multi-target, and Multimodal Approaches

The paper describes NetEase Cloud Music’s LOOK live‑streaming recommendation system for the song‑playback page, which combines millisecond‑level real‑time feature pipelines, multi‑target optimization (click, watch, gift, comment) via ESMM+FM and MMoE models, GradNorm‑based loss fusion, and a multimodal avatar‑text‑host ranking model, achieving double‑digit CTR and CTCVR gains while balancing producer and consumer retention.

ESMM · GradNorm · Live Streaming

0 likes · 26 min read

NetEase LeiHuo UX Big Data Technology

Aug 11, 2022 · Artificial Intelligence

Multimodal Models: Research Directions and a Practical Case of Game Frame‑Rate Prediction

This article introduces the concept of modality, outlines the five research branches of multimodal models, and presents a concrete case where multimodal deep‑learning techniques are applied to predict and improve game frame rates using both static and temporal features.

AI · Multimodal · feature fusion

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Jul 29, 2022 · Artificial Intelligence

Unlock Chinese Text-to-Image Generation with EasyNLP’s Open‑Source Models

This article introduces EasyNLP’s newly integrated Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, code snippets, performance benchmarks on multiple datasets, and step‑by‑step tutorials for fine‑tuning and inference using open‑source checkpoints.

AI generation · Chinese NLP · EasyNLP

0 likes · 20 min read

Alibaba Cloud Developer

Jul 28, 2022 · Artificial Intelligence

Unlock Chinese Text‑to‑Image Generation with EasyNLP: Models, Code & Tutorials

This article introduces EasyNLP's Chinese text‑to‑image generation framework, explains the underlying Transformer‑VQGAN architecture, provides model specifications, showcases sample outputs, and offers step‑by‑step code and command‑line instructions for fine‑tuning and inference.

Chinese AI · EasyNLP · Multimodal

0 likes · 20 min read

DataFunSummit

Jul 27, 2022 · Artificial Intelligence

DataFun 2022 Natural Language Processing Summit – Leading Experts Discuss Large‑Scale Language Models, Multimodal Understanding, Dialogue Systems and AI Applications

The DataFun 2022 NLP Summit, held on July 30, brings together top researchers and industry leaders from Alibaba, Baidu, Microsoft, Amazon, and more to present the latest advances in large‑scale pre‑training, multimodal perception, information extraction, dialogue interaction, machine translation, and practical AI deployments, with live streaming and free registration via QR code.

AI · Dialogue Systems · Large Language Models

0 likes · 44 min read

DataFunSummit

Jul 27, 2022 · Artificial Intelligence

Intelligent Creative Advertising: Content Understanding, Generation, and Distribution at JD.com

This article presents JD.com's end‑to‑end intelligent creative system, covering the background of content‑driven e‑commerce, a multi‑stage content understanding pipeline, AI‑powered video, image and copy generation, multimodal creative selection and distribution, and real‑world business impact.

AI · Advertising · Multimodal

0 likes · 27 min read

Alimama Tech

Jul 13, 2022 · Artificial Intelligence

Fully Automatic Template‑Free Image‑Text Creative Generation System

Alibaba Alimama’s fully automatic, template‑free image‑text creative generation system uses deep‑learning models across material mining, layout synthesis, on‑image copy generation, and visual attribute rendering to produce personalized ad creatives directly from product images and metadata, achieving roughly 19 % CTR lift over prior template‑based methods.

AI · Ad Creative · Automation

0 likes · 19 min read

DataFunTalk

Jul 9, 2022 · Artificial Intelligence

Education Knowledge Graph: Opportunities and Challenges

The article provides a comprehensive overview of education knowledge graphs, explaining their definition, significance, diverse application scenarios such as smart textbooks, deep reading, subject insight, and intelligent services, while also analyzing technical challenges like data heterogeneity, granularity, multimodality, quality control, and proposing future research directions.

Artificial Intelligence · Intelligent Tutoring · Multimodal

0 likes · 25 min read

Xiaohongshu Tech REDtech

Jun 20, 2022 · Artificial Intelligence

Action Sequence Verification in Videos with CosAlignment Transformer (CAT)

The paper introduces Action Sequence Verification (ASV), a task that determines whether two videos follow the same ordered actions, provides the Chemical Sequence Verification dataset and re‑annotated COIN‑SV and Diving48‑SV sets, and proposes the CosAlignment Transformer (CAT) with intra‑step feature extraction, a Transformer‑based inter‑step encoder, and a sequence‑alignment loss that outperforms prior baselines and serves as a pre‑training model for video retrieval and classification.

Action Verification · Multimodal · Transformer

0 likes · 7 min read

JD Retail Technology

Jun 16, 2022 · Artificial Intelligence

2022 Global AI Technology Innovation Competition – Algorithm Challenge: Connecting AI with E‑commerce

The 2022 Global AI Technology Innovation Competition – Algorithm Challenge, co‑hosted by JD Retail and academic partners, brought together 12 finalist teams from over 3,000 entrants to tackle e‑commerce‑focused AI problems such as multimodal image‑text matching and product‑title entity recognition, highlighting real‑world business impact and fostering talent exchange.

AI competition · JD Retail · Multimodal

0 likes · 8 min read

AntTech

Jun 15, 2022 · Artificial Intelligence

XYLayoutLM: Towards Layout-Aware Multimodal Networks for Visually-Rich Document Understanding

XYLayoutLM introduces a layout‑aware multimodal network that improves visually‑rich document understanding by augmenting XY‑Cut for robust reading order generation and employing a Dilated Conditional Position Encoding to handle variable‑length inputs, achieving state‑of‑the‑art performance on XFUN and FUNSD datasets.

Multimodal · Vision Transformer · XYCut

0 likes · 10 min read

DaTaobao Tech

May 27, 2022 · Artificial Intelligence

Multimodal Pretraining for Search Recall in E-commerce

The paper proposes a multimodal pre‑training framework that jointly encodes query text and item titles with images via shared and single‑stream towers, using MLM, MPM, QIC, and matching tasks, and demonstrates substantial Recall@K gains on a billion‑item e‑commerce catalog by leveraging visual cues to bridge the semantic gap.

Multimodal · Vector Retrieval · e-commerce

0 likes · 17 min read

DataFunTalk

May 20, 2022 · Artificial Intelligence

Hierarchical Graph Convolutional Networks for Video Social Relationship Modeling

This article presents a multimodal approach that combines dynamic analysis and graph machine learning to generate and apply social relationship graphs in videos, detailing problem background, graph generation modules, applications such as video retrieval, experimental results, and future research directions.

AI · Graph Neural Network · Multimodal

0 likes · 11 min read

Laiye Technology Team

May 18, 2022 · Artificial Intelligence

Overview of Document Intelligence Models: StrucText, LayoutLMv3, and GraphDoc

This article reviews three representative document intelligence models—StrucText, LayoutLMv3, and GraphDoc—detailing their input features, feature fusion strategies, self‑supervised tasks, and underlying architectures, and explains how they learn embeddings for segments, words, or regions to enable classification and key‑value extraction.

Graph Neural Networks · Layout Analysis · Multimodal

0 likes · 15 min read

Bilibili Tech

May 10, 2022 · Artificial Intelligence

Glance Supervised Video Moment Retrieval via the ViGA Framework

The paper presents a glance‑supervised video moment retrieval approach that records a single annotator‑seen frame, introduces the ViGA contrastive learning framework to leverage this weak temporal cue, and demonstrates on three benchmarks performance rivaling fully supervised methods while keeping annotation cost minimal.

Glance Supervision · Multimodal · ViGA

0 likes · 8 min read

Tencent Tech

Apr 21, 2022 · Artificial Intelligence

How Tencent’s HunYuan Model Dominated All Major Video Retrieval Benchmarks

Tencent’s newly unveiled HunYuan AI model achieved a grand‑slam by ranking first on the five most authoritative cross‑modal video retrieval datasets, showcasing a hierarchical multimodal approach that dramatically boosts retrieval precision and promises broad impact for both research and industry applications.

AI · Multimodal · Tencent

0 likes · 5 min read

DaTaobao Tech

Apr 6, 2022 · Artificial Intelligence

Improving New User Experience in Taobao Live Recommendation via Multi‑Channel Lifelong Product Sequence Modeling

The paper tackles Taobao Live’s cold‑start problem for new users by introducing a multi‑channel lifelong product‑sequence network that enriches purchase histories with side information, extracts relevance‑focused subsequences across five channels, and integrates them via target‑attention DIN, achieving substantial offline and online performance gains, especially for low‑activity users.

Multimodal · Recommendation Systems · User Modeling

0 likes · 23 min read

DataFunTalk

Mar 28, 2022 · Artificial Intelligence

Construction and Application of Meituan's On‑site Comprehensive Knowledge Graph

This article introduces Meituan's on‑site comprehensive knowledge graph, detailing its multi‑layer design, data‑driven construction pipeline, challenges of diverse user demands and industry complexity, and showcases practical applications in search, recommendation, intelligent display, as well as future expansion plans.

Meituan · Multimodal · knowledge graph

0 likes · 22 min read

Baobao Algorithm Notes

Mar 7, 2022 · Artificial Intelligence

How CLIP Uses Natural Language Supervision for Powerful Zero‑Shot Vision

This article explains CLIP’s multimodal contrastive pre‑training, its simple yet effective architecture, code implementation, and how its zero‑shot capability can surpass supervised ImageNet models by leveraging a 400‑million image‑text dataset and shared semantic embeddings.

AI · CLIP · Multimodal

0 likes · 7 min read

DataFunTalk

Jan 22, 2022 · Artificial Intelligence

Multimodal Content Understanding Techniques in Search Systems

This talk presents Tencent's multimodal content understanding framework for search, covering hierarchical content features, large‑scale ranking, fine‑grained image semantic vectors, video and document analysis, quality detection, duplicate removal, and future directions in AI‑driven search.

AI · Image Embedding · Multimodal

0 likes · 17 min read

DataFunTalk

Dec 26, 2021 · Artificial Intelligence

Neural–Symbolic Learning and Multimodal Knowledge Discovery: Recent Advances, Methods, and Challenges

This talk reviews recent progress in neural‑symbolic learning and multimodal knowledge discovery, highlighting examples such as GPT‑3 reasoning failures, the need for symbolic knowledge, historical developments, various integration methods, challenges in multimodal knowledge graphs, and future research directions.

AI · Multimodal · Neural-symbolic

0 likes · 20 min read

Alibaba Cloud Developer

Dec 6, 2021 · Artificial Intelligence

Can AI Design Full Clothing Lines? Inside Alibaba’s M6-UFC Generator

Alibaba’s DAMO Academy and Tsinghua University introduced M6‑UFC, a non‑autoregressive multimodal transformer that unifies arbitrary text and image controls to generate high‑quality, editable fashion designs, dramatically reducing carbon emissions and outperforming GAN‑based models in fidelity and relevance while accelerating production speed.

AI · M6-UFC · Multimodal

0 likes · 11 min read

DataFunSummit

Dec 3, 2021 · Artificial Intelligence

Real‑Time Voice Dialogue: Practices, Challenges, and Duplex Conversation

This article presents an in‑depth overview of Alibaba's real‑time voice dialogue system, covering the Hotline XiaoMi robot, the unique challenges of spoken interactions such as colloquialism, multimodality and duplex communication, and the research advances in ASR‑robust SLU, emotion detection, colloquial processing, and duplex conversation modeling.

ASR · Multimodal · SLU

0 likes · 22 min read

AntTech

Oct 29, 2021 · Artificial Intelligence

Ant Insurance Technology and CASIA Win Two Tracks at MuSe2021 Multimodal Sentiment Challenge (ACM MM 2021)

The Ant Insurance Technology team, together with the Institute of Automation of the Chinese Academy of Sciences, secured first place in both the MuSe‑Wilder and MuSe‑Sent tracks of the MuSe2021 Multimodal Sentiment Challenge held at the 29th ACM International Conference on Multimedia in Chengdu, showcasing advanced multimodal AI techniques.

BiLSTM · Deep Learning · MuSe2021

0 likes · 4 min read

DataFunTalk

Sep 30, 2021 · Artificial Intelligence

Advances in Knowledge Graph Construction and Applications by Alibaba's AliMe Team

This article presents Alibaba's AliMe team's year‑long progress on knowledge graph research, covering the fundamentals of knowledge graphs, domain and multimodal graph construction techniques, practical e‑commerce applications such as dialogue‑driven recommendation, virtual‑anchor script generation, and insights on future directions.

AI · Multimodal · dialogue system

0 likes · 23 min read

DataFunSummit

Sep 26, 2021 · Artificial Intelligence

Contrastive Learning and Its Applications in Weibo Content Representation

This article explains the fundamentals of contrastive learning, reviews typical models such as SimCLR, MoCo, SwAV, BYOL, SimSiam and Barlow Twins, and demonstrates how these methods are applied to Weibo text and multimodal (text‑image) representation tasks like hashtag generation and image‑text matching.

Multimodal · NLP · Weibo

0 likes · 18 min read

Meituan Technology Team

Sep 2, 2021 · Artificial Intelligence

Construction and Application of Retail Product Knowledge Graph at Meituan

The paper describes Meituan’s retail product knowledge graph—a multi‑layered, multi‑modal system that structures billions of SKUs, attributes, and user insights using hierarchical categories, graph‑enhanced NER, semi‑supervised learning, and expert‑in‑the‑loop validation, enabling precise search, ranking, recommendation, and real‑time merchant optimization.

AI · Multimodal · Retail

0 likes · 25 min read

DataFunTalk

Aug 30, 2021 · Artificial Intelligence

Contrastive Learning: Foundations, Typical Models, and Applications to Weibo Content Representation

This article explains the concept of contrastive learning, its relationship to self‑supervised and metric learning, describes key system components and loss functions, reviews major image, NLP and multimodal models such as SimCLR, MoCo, SwAV, BYOL, and demonstrates how contrastive learning is applied to Weibo hashtag generation, similar‑post retrieval, and text‑image matching using CD‑TOM and W‑CLIP models.

AI · Multimodal · Weibo

0 likes · 19 min read

Tencent Advertising Technology

Aug 18, 2021 · Artificial Intelligence

2021 Tencent Advertising Algorithm Competition: Winners, Accepted Papers, and Reviewer Feedback

The 2021 Tencent Advertising Algorithm Competition, held as the ACM MM 2021 Grand Challenge, announced the top three teams for two tracks, presented the accepted multimodal video advertising papers with detailed reviewer comments, and highlighted the significance of algorithmic innovation over ranking alone.

ACM MM · AI · Advertising

0 likes · 8 min read

DataFunTalk

Jul 12, 2021 · Artificial Intelligence

Tencent Music Live Streaming Recommendation System: Architecture, Challenges, and Model Design

This article presents an in‑depth overview of Tencent Music's live‑streaming recommendation system, covering business background, system architecture, recall and ranking model designs, multi‑modal extensions, and advanced training techniques such as DSSM, ESMM, GradNorm, and CGC to improve user engagement and conversion.

AI · DSSM · Live Streaming

0 likes · 13 min read

DataFunTalk

Jul 1, 2021 · Artificial Intelligence

Pre‑Trained Models: Past, Present, and Future – A Comprehensive Survey

This article surveys the evolution of pre‑trained models, covering the origins of transfer and self‑supervised learning, the rise of transformer‑based PTMs such as BERT and GPT, efficient architecture designs, multimodal and multilingual extensions, theoretical analyses, and future research directions for scalable and robust AI systems.

AI research · Efficient Training · Large Language Models

0 likes · 27 min read

Xianyu Technology

Jul 1, 2021 · Artificial Intelligence

Improving Search Relevance in Xianyu: System Design and Model Implementation

The paper describes Xianyu’s new relevance‑matching pipeline—integrating basic, text‑matching, semantic (BERT‑based dual‑tower), multimodal, and click‑graph features and fusing them with a GBDT model—which boosts search DCG@10 by 6.5 %, query satisfaction by 24 % and click interaction by over 20 % while outlining future enhancements for finer attribute matching and richer structured data.

Multimodal · Ranking · e-commerce

0 likes · 13 min read

Tencent Advertising Technology

May 28, 2021 · Artificial Intelligence

Insights from the Tencent Advertising Algorithm Competition: Model Framework and Optimization Strategies

The article shares a Tencent competition champion’s practical TensorFlow‑based video ad solution, detailing data handling, model architecture, optimization tricks, multimodal fusion techniques, and experimental observations to help participants improve performance in the 2021 Tencent Advertising Algorithm Contest.

Multimodal · TensorFlow · advertising algorithm

0 likes · 7 min read
