Tag

Multimodal Models


AntTech
Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Computer Vision · Multimodal Models
0 likes · 20 min read
AntTech
May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

Ant Group’s 10th Technical Open Day gathered leading AI experts who examined the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGI · AI safety · Multimodal Models
0 likes · 16 min read
DevOps
May 6, 2025 · Artificial Intelligence

PPTAgent: An Open‑Source AI System for Automated Presentation Generation Using a Two‑Stage Editing Approach

PPTAgent, an open‑source AI tool jointly developed by the Chinese Academy of Sciences and Shanghai Jiexin Technology, automatically creates high‑quality PowerPoint slides by analyzing reference decks, extracting layout patterns, and iteratively editing content with a self‑correction mechanism, achieving superior content, design, and coherence scores compared to existing methods.

AI · Multimodal Models · PPTAgent
0 likes · 6 min read
Rare Earth Juejin Tech Community
Nov 29, 2024 · Big Data

How ByteDance Builds Large-Scale Data Processing Pipelines for Multimodal Models with Ray

The article details ByteDance's use of Ray and RayData to construct scalable audio and video data processing pipelines for multimodal AI models, addressing challenges of massive data volume, resource constraints, and fault tolerance through pipeline design, RayCore enhancements, and custom scheduling optimizations.

AI · Big Data · ByteDance
0 likes · 16 min read
IT Services Circle
Jun 9, 2024 · Artificial Intelligence

Plagiarism Allegations Between Stanford's Llama3‑V and China's MiniCPM‑Llama3‑V 2.5 Model

The article details the controversy surrounding Stanford's Llama3‑V team admitting to copying the architecture and code of the Chinese MiniCPM‑Llama3‑V 2.5 model, presents new evidence of weight similarity, compares performance metrics, and discusses broader concerns about the recognition of Chinese AI research in the open‑source community.

AI ethics · Llama3-V · MiniCPM
0 likes · 9 min read
Tencent Tech
Oct 20, 2023 · Artificial Intelligence

Tencent OCR's AI Triumph at ICDAR 2023: Four Championship Wins

At ICDAR 2023, Tencent's OCR team leveraged self‑developed algorithms and large‑model backbones to clinch four official championship titles across the DSText and SVRD tracks, showcasing breakthroughs in dense video text detection, tracking, end‑to‑end recognition, and structured information extraction.

ICDAR 2023 · Multimodal Models · OCR
0 likes · 14 min read
DataFunSummit
Jun 23, 2023 · Artificial Intelligence

Frontiers of Video Action Recognition: Concepts, Algorithms, and Applications

This article introduces video action recognition, covering its basic definition, downstream tasks, major algorithmic families—including CNN‑based, Vision‑Transformer, self‑supervised, and multimodal approaches—and discusses practical deployment scenarios and open challenges in the field.

CNN · Multimodal Models · Vision Transformer
0 likes · 16 min read
360 Tech Engineering
May 6, 2023 · Artificial Intelligence

Open‑Vocabulary Object Detection: Overview of OVR‑CNN, RegionCLIP, and CORA

This article reviews the evolution of open‑vocabulary object detection, describing the OVR‑CNN paradigm, the RegionCLIP enhancements, and the CORA model with region prompting and anchor pre‑matching, and discusses their impact on future multimodal AI systems.

CORA · Clip · Multimodal Models
0 likes · 14 min read
Kuaishou Large Model
Mar 31, 2023 · Artificial Intelligence

How Kuaishou Elevates Video Quality and AI Performance at NVIDIA GTC 2023

At NVIDIA GTC 2023, Kuaishou engineers presented solutions spanning video quality assessment and enhancement, 3D digital‑human live streaming, a custom TensorRT‑based performance framework, large‑scale recommendation model acceleration, and multimodal large‑model deployment for short‑video scenarios.

AI optimization · Digital Human · Multimodal Models
0 likes · 9 min read
JD Retail Technology
Dec 12, 2022 · Artificial Intelligence

Keynote Presentations from the 2022 Global AI Technology Conference – First Industrial Vision Frontier Forum

The 2022 Global AI Technology Conference’s First Industrial Vision Frontier Forum in Hangzhou gathered leading experts to discuss advances in industrial AI visual defect detection, multimodal pre‑training models, smart meteorology, digital intelligence in retail, third‑generation compound semiconductor detection, meta‑imaging, and broader industrial AI applications, highlighting the future of intelligent manufacturing.

AI · Industrial Vision · Meta Imaging
0 likes · 12 min read
DataFunTalk
Nov 23, 2022 · Artificial Intelligence

Lightweight Adaptation Techniques for Multimodal Large Models

This article presents a comprehensive overview of lightweight adaptation methods—including language, domain, and optimization‑goal adapters and structured prompts—to overcome language mismatch, low domain fit, and objective differences when deploying open‑source multimodal large models in real‑world AI applications.

AI · Multimodal Models · adapter
0 likes · 14 min read