Tagged articles

multimodal models

44 articles

Jun 29, 2026 · Artificial Intelligence

Why AI Assistants Shouldn't Just Wait for Questions: Insights from Tsinghua’s EgoIntrospect and IPIBench

The article reviews two recent Tsinghua studies—EgoIntrospect and IPIBench—that shift AI assistants from passive Q&A toward real‑time, user‑centric understanding and proactive interaction, detailing new egocentric datasets, benchmark tasks, and an IPI‑Agent framework for timely, context‑aware assistance in wearable and embodied devices.

AI assistants · benchmark · egocentric dataset

0 likes · 9 min read

CodeTrend

Jun 12, 2026 · Artificial Intelligence

Vision Banana: Turning Image Generation Models into Generalist Vision Learners

Vision Banana shows that large‑scale image‑generation models can be instruction‑tuned to perform zero‑shot visual‑understanding tasks such as semantic segmentation, instance segmentation, depth and normal estimation, achieving or surpassing specialist SOTA results while preserving their original generative capabilities.

Instruction Tuning · RGB encoding · Vision Banana

0 likes · 32 min read

Machine Heart

Jun 11, 2026 · Artificial Intelligence

Audio Reasoning for AGI: First Comprehensive Survey of Multimodal Large Models and Four Frontier Paths

This survey examines the emerging field of audio reasoning, distinguishing it from simple audio perception, and systematically classifies four major research directions—Audio-to-Text, Audio-to-Speech, Audio-Visual, and Agentic Audio—while highlighting challenges in data, evaluation, and real‑time multimodal integration.

AGI · Audio Reasoning · Audio-Visual

0 likes · 10 min read

Top Architect

Jun 7, 2026 · Artificial Intelligence

Can Gemini Omni Turn Sketches into Blockbuster Videos with a Single Prompt?

At I/O, Google unveiled Gemini Omni, a multimodal world model that combines reasoning and generation to produce realistic videos, edit them conversationally, and create digital avatars. The model also shows emergent abilities such as style transfer and scene continuation, and Google introduced safety measures such as mandatory watermarks.

AI emergence · Avatar Flow · Gemini Omni

0 likes · 10 min read

Huolala Tech

Jun 3, 2026 · Artificial Intelligence

Three Breakthroughs Driving the Rapid Rise of Computer Vision

The article reviews three major recent breakthroughs in computer vision—self‑supervised visual foundation models, feed‑forward 3D reconstruction, and unified multimodal models—detailing their underlying methods, key papers, performance characteristics, and practical implications for real‑world AI applications.

3D reconstruction · computer vision · multimodal models

0 likes · 22 min read

Machine Heart

Apr 30, 2026 · Artificial Intelligence

How DeepSeek’s Visual‑Primitive Paradigm Redefines Multimodal Reasoning

DeepSeek has released a multimodal model built on a visual‑primitive reasoning paradigm that treats coordinates and bounding boxes as reasoning units, dramatically compresses visual tokens, and achieves state‑of‑the‑art performance on counting, spatial, and topological tasks, while exposing current limits of multimodal inference.

AI reasoning · Compressed Sparse Attention · DeepSeek

0 likes · 12 min read

Machine Heart

Apr 24, 2026 · Artificial Intelligence

Vision Banana Shows That Image Generation Equals Understanding – DeepMind’s GPT‑like Leap

DeepMind’s Vision Banana model demonstrates that large‑scale image‑generation pre‑training can produce powerful, universal visual representations, achieving state‑of‑the‑art results on segmentation, depth, and normal estimation without task‑specific heads, thereby supporting the hypothesis that generation and understanding are fundamentally linked.

DeepMind · Generative AI · Vision Banana

0 likes · 13 min read

Architect's Must-Have

Apr 20, 2026 · Industry Insights

What the Top 10 Open‑Source AI Projects Reveal About the Future of AI Agents

This roundup analyzes ten rapidly rising open‑source AI projects—covering self‑evolving agents, multimodal models, edge deployment, and quantum AI—highlighting their technical innovations, benchmark results, and emerging industry trends that are reshaping AI development and deployment.

AI Agents · Quantum AI · edge AI

0 likes · 22 min read

PaperAgent

Apr 14, 2026 · Artificial Intelligence

Can Neural Computers Replace Traditional CPUs? Inside the Latest AI Harness Designs

This article analyzes the emerging concept of Neural Computers, explains how Harness engineering unifies compute, memory, and I/O into a single learned runtime, reviews recent multimodal models from Anthropic, Meta, and OpenAI, and presents detailed experimental results from the NCCLIGen and NCGUIWorld prototypes.

Neural computer · harness design · multimodal models

0 likes · 8 min read

DataFunTalk

Apr 7, 2026 · Artificial Intelligence

How a Champion Quantized a 150 GB Multimodal Model in Just 4 Hours

Algorithm engineer Zhang Zhen of a Chinese EV company walks through the end‑to‑end workflow he used to quantize the massive Qwen3‑Next‑80B model within a four‑hour competition, covering sensitive‑layer analysis, iterative smoothing, fallback strategies, and parallel "horse‑race" debugging, the approach that led his team to win the GeekDay challenge.

Iterative Smooth · Model Quantization · large language models

0 likes · 9 min read
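The summary names "iterative smoothing" without spelling it out. As a point of reference only, here is a sketch of the widely used SmoothQuant-style per-channel smoothing step, which migrates activation outliers into the weights before quantization. Treating this as the basis of the competition workflow is an assumption, and the function names and toy matrices are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of A (n x k) and B (k x m)."""
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]


def smooth_scales(X, W, alpha=0.5):
    """Per-input-channel scales s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha).

    X holds activations (tokens x in_channels); W holds weights (in_channels x out_channels).
    Assumes no channel is all zeros, otherwise the division is undefined.
    """
    act_max = [max(abs(row[j]) for row in X) for j in range(len(W))]
    w_max = [max(abs(w) for w in W[j]) for j in range(len(W))]
    return [a ** alpha / w ** (1 - alpha) for a, w in zip(act_max, w_max)]


def apply_smoothing(X, W, s):
    """Divide activation channel j by s_j and multiply weight row j by s_j,
    so X @ W is mathematically unchanged while activation outliers shrink."""
    X_s = [[x / s[j] for j, x in enumerate(row)] for row in X]
    W_s = [[w * s[j] for w in W[j]] for j in range(len(W))]
    return X_s, W_s


# Toy example: input channel 0 carries an activation outlier.
X = [[10.0, 0.1], [8.0, -0.2]]
W = [[0.1, 0.2], [1.0, -1.0]]
s = smooth_scales(X, W)
X_s, W_s = apply_smoothing(X, W, s)
```

Because each activation channel is divided by the same factor its weight row is multiplied by, the layer output is unchanged, while the largest activation drops from 10 to about 1.41, which is easier to quantize at low bit widths. An "iterative" variant would presumably repeat or tune this step (for example over `alpha`), but the summary does not say how.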

Machine Heart

Apr 3, 2026 · Artificial Intelligence

Physion-Eval Reveals Why Visually Realistic AI Videos Still Miss Physical Reality

Physion-Eval, a new benchmark with nearly 11,000 expert‑annotated video clips, shows that most current AI‑generated videos look realistic but frequently violate basic physics, and that even top multimodal models fail to reliably detect these physical errors.

AI video generation · MLLM critic · benchmark

0 likes · 8 min read

Machine Learning Algorithms & Natural Language Processing

Mar 31, 2026 · Artificial Intelligence

Unified Multimodal Modeling: How LongCat-Next Bridges Understanding and Generation

The article analyzes why text models naturally combine understanding and generation, explains the fundamental conflicts that prevent images from sharing the same tokenization, and details LongCat-Next’s discrete autoregressive approach—using SAE visual encoders, residual vector quantization, and a unified LLM backbone—to achieve a single model that can both comprehend and create multimodal content.

LongCat-Next · RVQ · Tokenization

0 likes · 21 min read
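Residual vector quantization (RVQ), mentioned in the LongCat-Next summary, encodes a vector as a sum of codewords drawn from several codebooks, with each stage quantizing whatever the previous stages left over. The minimal sketch below illustrates that general idea with hand-picked toy codebooks; it is not LongCat-Next's tokenizer, and all names and values are hypothetical.

```python
def nearest(codebook, v):
    """Index of the codeword with the smallest squared Euclidean distance to v."""
    best_i, best_d = 0, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(c, v))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def rvq_encode(codebooks, x):
    """Quantize x stage by stage; each stage encodes the remaining residual."""
    residual = list(x)
    codes = []
    for cb in codebooks:
        i = nearest(cb, residual)
        codes.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return codes


def rvq_decode(codebooks, codes):
    """Reconstruct by summing the chosen codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for cb, i in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],                  # stage 1: coarse
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],  # stage 2: fine
]
codes = rvq_encode(codebooks, [1.2, 1.0])   # -> [1, 1]
approx = rvq_decode(codebooks, codes)       # -> [1.25, 1.0]
```

In a discrete autoregressive setup the codebooks would be learned, and the per-stage indices would serve as the visual tokens. Here the second stage cuts the squared reconstruction error from about 0.04 to about 0.0025.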

Woodpecker Software Testing

Mar 15, 2026 · Industry Insights

Five Major AI Testing Tool Trends Shaping 2026

A 2026 study of 137 leading tech firms reveals that AI is deeply embedded across the software testing lifecycle, replacing manual exploration with intent‑understanding, autonomous verification, and causal attribution, and outlines five concrete trends—from native AI test engines to edge‑cloud collaborative architectures and AI‑on‑AI trust verification.

AI Trust · AI testing · edge-cloud testing

0 likes · 9 min read

AIWalker

Mar 13, 2026 · Artificial Intelligence

Towards AI That Truly Understands Art: Introducing the ArtiMuse Aesthetic Understanding Model

ArtiMuse, a new image‑aesthetics model unveiled at CVPR 2026 by Shanghai AI Lab and the China Academy of Art, combines a fine‑grained dataset of 10K images, a Token‑As‑Score scoring scheme, and unified textual and numeric feedback to deliver culturally aware, expert‑level art analysis alongside robust quantitative ratings.

AI aesthetics · Token-As-Score · art analysis

0 likes · 7 min read

AIWalker

Mar 5, 2026 · Artificial Intelligence

How ViDA-UGC Leverages Large Multimodal Models for Fine-Grained Visual Quality Assessment

The article introduces ViDA-UGC, a large‑scale UGC visual‑quality dataset and its companion benchmark ViDA‑Bench, explains the MILP‑driven sampling, expert annotation pipeline, and CoT‑based evaluation framework, and shows how fine‑tuning popular multimodal LLMs on this data markedly improves low‑level quality perception, grounding, and description capabilities.

Chain-of-Thought · benchmark · dataset

0 likes · 12 min read

AI Frontier Lectures

Feb 6, 2026 · Artificial Intelligence

Can Merging Text‑Only and Grounded Visual Reasoning Unlock Better Vision‑Language Models?

The paper introduces Mixture‑of‑Visual‑Thoughts (MoVT), a context‑adaptive reasoning paradigm that integrates pure‑text and visually grounded inference modes within a single model. It also presents AdaVaR, a two‑stage training framework whose novel AdaGRPO reinforcement‑learning algorithm teaches the model to pick the best mode for each visual‑language task; the approach achieves consistent gains across eight benchmarks and surpasses strong baselines including GPT‑4o.

AdaVaR · Mixture-of-Visual-Thoughts · Visual Reasoning

0 likes · 16 min read
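The summary describes AdaGRPO only as a novel reinforcement-learning algorithm and gives no details of its adaptive parts. For context, the sketch below shows the group-relative advantage computation of standard GRPO, which the name suggests AdaGRPO builds on (an assumption): several responses are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. The function name and rewards are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's group of sampled responses.

    Each reward is centered on the group mean and scaled by the group standard
    deviation, so no learned value function (critic) is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four responses to the same question, rewarded 1.0 if correct else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative advantages; a group whose rewards are all equal yields zero advantage, and `eps` guards against division by zero in that case.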

Baidu Geek Talk

Feb 2, 2026 · Artificial Intelligence

How Cloud AI Infra Powers the Next Wave of Embodied Intelligence

This article outlines the rapid rise of embodied intelligence, the explosion of Vision‑Language‑Action (VLA) research, and how cloud‑based AI infrastructure—including multi‑level IaaS, data pipelines, dual‑system model designs, and reinforcement‑learning workflows—addresses emerging scaling and deployment challenges.

VLA · multimodal models · reinforcement learning

0 likes · 13 min read

PaperAgent

Dec 10, 2025 · Artificial Intelligence

How AI Agents Like UFO, Mobile-Agent, and UI-TARS Are Shaping 2025 Smartphones

The article examines the underlying GUI‑Agent technologies behind the 2025 “Doubao” smartphone, comparing Microsoft’s UFO series, Alibaba’s Mobile‑Agent v2/v3, and ByteDance’s UI‑TARS, detailing their model foundations, input modalities, action spaces, planning mechanisms, learning strategies, open‑source status, and multi‑agent frameworks.

AI Agents · GUI automation · comparative analysis

0 likes · 8 min read

Tencent Advertising Technology

Dec 4, 2025 · Artificial Intelligence

How POPEN Boosts LVLM Reasoning Segmentation with Preference Optimization and Ensemble

The paper introduces POPEN, a new framework that uses preference‑based optimization and ensemble methods to reduce hallucinations and improve segmentation accuracy in large vision‑language models (LVLMs), achieving state‑of‑the‑art results on multiple benchmarks.

LVLM · Preference Optimization · Segmentation

0 likes · 14 min read
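The POPEN summary mentions preference‑based optimization without naming the objective. A common instance of such an objective is Direct Preference Optimization (DPO); the sketch below computes the standard DPO loss for a single preferred/rejected pair. Whether POPEN uses this exact loss is an assumption, and the function name and example log-probabilities are hypothetical.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being trained and
    under a frozen reference model. The loss is -log(sigmoid(beta * margin)),
    computed stably as softplus(-beta * margin).
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy matches the reference, the loss equals log(2).
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Raising the chosen response's likelihood (and lowering the rejected one's)
# relative to the reference lowers the loss.
improved = dpo_loss(-3.0, -6.0, -5.0, -5.0)
```

The `beta` parameter controls how strongly the policy is pushed away from the reference model; larger values penalize a given margin shortfall more sharply.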

Tencent Technical Engineering

Sep 6, 2025 · Artificial Intelligence

ARC Lab’s Blueprint: Turning Multimodal AI Research into Real-World Impact

The article outlines ARC Lab’s evolution from its 2019 founding as an internal corporate research unit to a high‑impact AI team that pursues difficult multimodal understanding and generation problems, measures success through a technology‑impact funnel, publishes 30‑40 top‑tier papers annually, and translates research into open‑source tools and products that drive academic, industry, business, and societal value.

AI research · corporate research · multimodal models

0 likes · 19 min read

ZhongAn Tech Team

Aug 11, 2025 · Artificial Intelligence

What’s New in AI? GPT‑5, SWE‑Swiss, Agentic Web, and More This Week

This week’s tech roundup highlights major AI breakthroughs—including OpenAI’s GPT‑5 launch, the SWE‑Swiss code‑fixing model from Peking University and ByteDance, Pinduoduo’s AI talent hiring surge, the emerging Agentic Web paradigm, Google’s Genie 3 world model, multimodal railway design AI, DJI’s first robot vacuum, AI‑enhanced smart glasses, and a new humanoid robot perception system—all reflecting rapid advances across generative, multimodal, and applied AI.

AI · AI hiring · Agentic Web

0 likes · 20 min read

AI Frontier Lectures

Jul 27, 2025 · Information Security

Can Hidden Activations Expose Multimodal Model Jailbreaks?

The paper reveals that large multimodal language models retain refusal signals in their hidden states even after jailbreak attempts, and proposes a training‑free detection method that leverages these signals to identify unsafe inputs across text and image modalities with strong generalization.

AI safety · LVLM security · hidden activation analysis

0 likes · 7 min read

DataFunTalk

Jul 22, 2025 · Artificial Intelligence

The Billion‑Dollar Talent War: How a Chinese AI Prodigy Jumped from OpenAI to Meta

Yu Jiahui went from being a teenage computer‑science prodigy at USTC to a key figure behind Google’s Conformer, OpenAI’s GPT‑4o, and now Meta’s multimodal Llama‑4 effort, a trajectory that illustrates the high‑stakes talent battles reshaping the future of artificial intelligence.

AI talent · Meta · OpenAI

0 likes · 17 min read

AIWalker

Jun 30, 2025 · Artificial Intelligence

Chinese Team Builds First AI That Understands Film, Using 440K Shot Library for Director‑Level Camera Moves

FilMaster is a pioneering AI system that learns cinematic principles from a 440,000‑shot movie database, combines multimodal LLMs, RAG, and audience‑centric rhythm control to generate editable, high‑quality films, and outperforms prior methods by over 50% on the new FilmEval benchmark.

AI film generation · FilmEval benchmark · Retrieval-Augmented Generation

0 likes · 18 min read

21CTO

Jun 19, 2025 · Artificial Intelligence

How ByteDance’s Seedance 1.0 Outperforms Google’s Veo 3 in AI Video Generation

ByteDance’s newly released Seedance 1.0, a bilingual text‑to‑video and image‑to‑video model, surpasses Google’s Veo 3 in visual consistency, motion realism, and inference speed, achieving top rankings on multiple benchmarks while requiring significantly less compute time per 1080p clip.

AI video generation · benchmark comparison · inference speed

0 likes · 7 min read

AntTech

Jun 15, 2025 · Artificial Intelligence

21 Ant Research Papers Shaping CVPR 2025: AI Image & Video Generation Breakthroughs

The Interactive Intelligence Lab of Ant Technology Research Institute presented 21 accepted CVPR 2025 papers covering visual generation, editing, 3D vision, digital humans and multimodal AI, highlighting tools such as MagicQuill, Lumos, Aurora, FLARE, LeviTor, MangaNinja, AniDoc, Mimir, AvatarArtist, DiffListener, MotionStone, TensorialGaussianAvatars, DualTalk, CompreCap and Uni-AD.

CVPR2025 · Generative AI · computer vision

0 likes · 20 min read

AntTech

May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

Ant Group’s 10th Technical Open Day gathered leading AI experts who examined the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGI · AI safety · Embodied AI

0 likes · 16 min read

DevOps

May 6, 2025 · Artificial Intelligence

PPTAgent: An Open‑Source AI System for Automated Presentation Generation Using a Two‑Stage Editing Approach

PPTAgent, an open‑source AI tool jointly developed by the Chinese Academy of Sciences and Shanghai Jiexin Technology, automatically creates high‑quality PowerPoint slides by analyzing reference decks, extracting layout patterns, and iteratively editing content with a self‑correction mechanism, achieving superior content, design, and coherence scores compared to existing methods.

AI · PPTAgent · multimodal models

0 likes · 6 min read

Baidu MEUX

Apr 28, 2025 · Artificial Intelligence

Top 10 AI Model Breakthroughs of 2024: From ChatGPT‑4o to 3D Digital Humans

This article surveys the latest AI breakthroughs, covering ChatGPT‑4o's native image generation, Runway's Gen‑4 video model, Midjourney V7, AnimeGamer's infinite anime simulation, JiMeng 3.0 poster creator, ComfyUI‑Copilot workflow assistant, DomoAI's voice‑image digital humans, Ready AI web builder, DeepSeek‑V3, and Alibaba's ultra‑realistic 3D digital human model.

AI · digital humans · image generation

0 likes · 8 min read

Alipay Experience Technology

Apr 25, 2025 · Artificial Intelligence

Creating Lifelike Talking Avatars from Voice and Photo with EchoMimic

This article introduces EchoMimic V1 and V2, open‑source generative digital‑human systems that turn a single voice clip and a portrait photo into synchronized talking avatars, covering their technical background, architecture, training strategies, performance comparisons, and potential application scenarios.

Generative AI · digital avatar · multimodal models

0 likes · 13 min read

JavaScript

Mar 20, 2025 · Artificial Intelligence

How MiniMax’s Linear‑Attention Architecture Is Redefining Long‑Context AI Models

MiniMax’s rapid 2025 releases—including a video model, open‑source LLM, and high‑fidelity voice model—showcase its multimodal linear‑attention architecture that handles up to 4 million tokens, earns a16z recognition, and signals China’s growing influence in open‑source AI innovation.

Linear Attention · artificial-intelligence · large language models

0 likes · 8 min read
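MiniMax's long-context models use a linear-attention variant the company calls Lightning Attention, which is not reproduced here. The minimal sketch below shows generic causal kernelized linear attention with the elu(x) + 1 feature map: instead of a key/value cache that grows with the sequence, it keeps a fixed-size running state, and it is checked against a direct O(n²) evaluation of the same formula. Function names and data are hypothetical.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1, applied element-wise."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(qs, ks, vs):
    """Causal linear attention with a running state.

    out_t = phi(q_t) . S_t / (phi(q_t) . z_t), where
    S_t = sum_{s<=t} phi(k_s) v_s^T and z_t = sum_{s<=t} phi(k_s).
    The state has size d_k * d_v regardless of sequence length.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for i in range(d_k):  # fold the new key/value pair into the state
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        fq = phi(q)
        denom = sum(fq[i] * z[i] for i in range(d_k))
        outs.append([sum(fq[i] * S[i][j] for i in range(d_k)) / denom for j in range(d_v)])
    return outs


def quadratic_reference(qs, ks, vs):
    """The same attention computed the O(n^2) way, for comparison."""
    outs = []
    for t, q in enumerate(qs):
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(ks[s]))) for s in range(t + 1)]
        total = sum(w)
        outs.append([sum(w[s] * vs[s][j] for s in range(t + 1)) / total
                     for j in range(len(vs[0]))])
    return outs


qs = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]]
ks = [[0.2, 0.1], [-0.3, 0.5], [0.4, -0.1]]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fast = causal_linear_attention(qs, ks, vs)
slow = quadratic_reference(qs, ks, vs)
```

Because the state `S` keeps the same d_k × d_v shape however many tokens have been processed, the per-token cost stays constant, which is what makes very long contexts tractable; the trade-off is that this kernel form is generally regarded as less expressive than softmax attention.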

Alibaba Cloud Big Data AI Platform

Mar 7, 2025 · Artificial Intelligence

How Pai‑Megatron‑Patch Boosts Qwen2‑VL Multimodal Training Efficiency

This article explains how the Pai‑Megatron‑Patch toolkit enhances the usability and training performance of the Qwen2‑VL multimodal large model by introducing model‑parallel weight conversion, user‑friendly data loading, visual feature processing optimizations, optimizer offloading, and pipeline parallelism techniques, supported by extensive experimental analysis.

Megatron · Qwen2-VL · large language models

0 likes · 25 min read

Software Engineering 3.0 Era

Feb 26, 2025 · Artificial Intelligence

2024 AI Testing Landscape: Emerging Technologies, Tools, and Real-World Cases

The article reviews how large language models and multimodal AI are reshaping software testing in 2024, detailing advances in unit‑test generation, fuzzing, oracle creation, agent‑based frameworks, and a curated list of new AI‑powered testing tools together with future trends and challenges.

AI testing · LLM · fuzz testing

0 likes · 15 min read

Rare Earth Juejin Tech Community

Nov 29, 2024 · Big Data

How ByteDance Builds Large-Scale Data Processing Pipelines for Multimodal Models with Ray

The article details ByteDance's use of Ray and RayData to construct scalable audio and video data processing pipelines for multimodal AI models, addressing challenges of massive data volume, resource constraints, and fault tolerance through pipeline design, RayCore enhancements, and custom scheduling optimizations.

AI · Big Data · ByteDance

0 likes · 16 min read

HyperAI Super Neural

Nov 20, 2024 · Artificial Intelligence

From Computer Vision to Medical AI: Prof. Xie's Work Hits Nature, NeurIPS, CVPR

Professor Xie's team at Shanghai Jiao Tong University reports rapid progress in AI for Science, detailing multimodal medical AI models, large open datasets, language and vision‑language models, and knowledge‑enhanced representations that outperform existing baselines across multiple benchmarks.

Knowledge Graphs · Open Datasets · large language models

0 likes · 14 min read

IT Services Circle

Jun 9, 2024 · Artificial Intelligence

Plagiarism Allegations Between Stanford's Llama3‑V and China's MiniCPM‑Llama3‑V 2.5 Model

The article details the controversy surrounding Stanford's Llama3‑V team admitting to copying the architecture and code of the Chinese MiniCPM‑Llama3‑V 2.5 model, presents new evidence of weight similarity, compares performance metrics, and discusses broader concerns about the recognition of Chinese AI research in the open‑source community.

AI ethics · Llama3-V · MiniCPM

0 likes · 9 min read

Tencent Tech

Oct 20, 2023 · Artificial Intelligence

Tencent OCR's AI Triumph at ICDAR 2023: Four Championship Wins

At ICDAR 2023, Tencent's OCR team leveraged self‑developed algorithms and large‑model backbones to clinch four official championship titles across the DSText and SVRD tracks, showcasing breakthroughs in dense video text detection, tracking, end‑to‑end recognition, and structured information extraction.

ICDAR 2023 · OCR · Structured Information Extraction

0 likes · 14 min read

DataFunSummit

Jun 23, 2023 · Artificial Intelligence

Frontiers of Video Action Recognition: Concepts, Algorithms, and Applications

This article introduces video action recognition, covering its basic definition, downstream tasks, major algorithmic families—including CNN‑based, Vision‑Transformer, self‑supervised, and multimodal approaches—and discusses practical deployment scenarios and open challenges in the field.

CNN · Vision Transformer · multimodal models

0 likes · 16 min read

360 Tech Engineering

May 6, 2023 · Artificial Intelligence

Open‑Vocabulary Object Detection: Overview of OVR‑CNN, RegionCLIP, and CORA

This article reviews the evolution of open‑vocabulary object detection, describing the OVR‑CNN paradigm, the RegionCLIP enhancements, and the CORA model with region prompting and anchor pre‑matching, and discusses their impact on future multimodal AI systems.

CLIP · CORA · OVR-CNN

0 likes · 14 min read

Kuaishou Tech

Apr 23, 2023 · Artificial Intelligence

Kuaishou & Renmin AI Institute: Driving Multimodal Large Model Innovation

The article details how Kuaishou’s multimodal AI research, including its K7 trillion‑parameter model and VLUA algorithm, partners with Renmin University’s Gaoling AI Institute to launch a joint lab, produce cutting‑edge papers such as WebBrain and ChatImg, and advance recommendation and search technologies across the short‑video ecosystem.

AI · Industry collaboration · Recommendation Systems

0 likes · 17 min read

Kuaishou Large Model

Mar 31, 2023 · Artificial Intelligence

How Kuaishou Elevates Video Quality and AI Performance at NVIDIA GTC 2023

At NVIDIA GTC 2023, Kuaishou engineers presented solutions ranging from video quality assessment and enhancement, 3D digital‑human live streaming, a custom TensorRT‑based performance framework, and large‑scale recommendation‑model acceleration to the deployment of large multimodal models in short‑video scenarios.

Recommendation Systems · TensorRT · ai-optimization

0 likes · 9 min read

JD Retail Technology

Dec 12, 2022 · Artificial Intelligence

Keynote Presentations from the 2022 Global AI Technology Conference – First Industrial Vision Frontier Forum

The 2022 Global AI Technology Conference’s First Industrial Vision Frontier Forum in Hangzhou gathered leading experts to discuss advances in industrial AI visual defect detection, multimodal pre‑training models, smart meteorology, digital intelligence in retail, third‑generation compound semiconductor detection, meta‑imaging, and broader industrial AI applications, highlighting the future of intelligent manufacturing.

AI · Industrial Vision · Meta Imaging

0 likes · 12 min read

DataFunTalk

Nov 23, 2022 · Artificial Intelligence

Lightweight Adaptation Techniques for Multimodal Large Models

This article presents a comprehensive overview of lightweight adaptation methods—including language, domain, and optimization‑goal adapters and structured prompts—to overcome language mismatch, low domain fit, and objective differences when deploying open‑source multimodal large models in real‑world AI applications.

AI · Adapter · Domain Adaptation

0 likes · 14 min read

Zuoyebang Tech Team

Aug 12, 2022 · Artificial Intelligence

How End-to-End Speech Recognition is Transforming AI Voice Applications

The AISummit conference highlighted advances in intelligent voice, with experts from Zuoyebang, ByteDance, Microsoft and others discussing end‑to‑end speech recognition, pronunciation correction, and high‑quality speech synthesis, and exploring how multimodal pre‑trained models will shape the future of voice AI.

AI Conference · end-to-end AI · intelligent voice

0 likes · 6 min read
