Tag: Multimodal Generation


Bilibili Tech
Dec 24, 2024 · Artificial Intelligence

AniSora: An Integrated System for Anime Video Generation with Data Flywheel, Controllable Diffusion Models, and Evaluation Benchmark

AniSora combines three components: a dataset of 10 million anime text‑video pairs; a controllable diffusion transformer that uses temporal‑mask conditioning to support text‑to‑video generation, frame interpolation, and region‑guided animation; and a 948‑video evaluation benchmark. The article reports industry‑leading character and motion consistency, and the system already powers low‑cost dynamic‑comic production for multiple IPs.

AI Animation · Anime Video Generation · Dataset Benchmark
21 min read
AntTech
Nov 27, 2024 · Artificial Intelligence

EchoMimicV2: An End-to-End Audio‑Driven Semi‑Body Human Animation Framework

EchoMimicV2, an open‑source project from Ant Group's Alipay AI team, introduces an end‑to‑end audio‑driven framework that generates high‑quality semi‑body portrait videos by jointly coordinating audio, pose, and image inputs. It also addresses three challenges: condition complexity, model stability, and computational cost.

AI research · Digital Human · Multimodal Generation
16 min read
Tencent Cloud Developer
May 15, 2024 · Artificial Intelligence

Tencent Open-Sources HunYuan DiT: First Chinese-Native Text-to-Image Model with 1.5B Parameters

Tencent has open‑sourced its upgraded 1.5‑billion‑parameter HunYuan DiT model, described as the first Chinese‑native, bilingual (Chinese‑English) text‑to‑image model built on the Diffusion Transformer (DiT) architecture. The release reports an improvement of about 20% in visual quality, supports multi‑round generation, has potential for extension to video generation, and permits free commercial use. Full weights, inference code, and algorithm details are available on Hugging Face and GitHub for developers and enterprises.

Chinese-native AI · DiT architecture · Diffusion Transformer
6 min read
DataFunTalk
Jan 31, 2024 · Artificial Intelligence

Industry Trends and Challenges of Large Language Models in Enterprise Applications (2023 Review)

The article reviews the rapid development of large language models in enterprise settings, covering internal collaboration tools, AI assistants for development and marketing, multimodal generation, inference speed bottlenecks, resource constraints, and future directions such as open‑source models and academic‑industry cooperation.

AI assistants · AI in marketing · Multimodal Generation
8 min read
Ximalaya Technology Team
Oct 10, 2023 · Artificial Intelligence

MiniGPT-5: A Novel Multimodal Generation Model for Coherent Text-Image Synthesis

MiniGPT-5 is a multimodal generation model that uses generative vokens (special visual tokens) to interleave text and image synthesis. It connects an LLM with Stable Diffusion and is trained with a two‑stage strategy that requires no domain‑specific annotations. The article reports state‑of‑the‑art coherence and quality on benchmarks such as CC3M, VIST, and MMDialog.

AI research · Multimodal Generation · Stable Diffusion
9 min read
DataFunTalk
Mar 4, 2023 · Artificial Intelligence

Advances in AIGC: AliceMind Text Generation Models and Multimodal mPLUG from Alibaba DAMO Academy

This article reviews recent AIGC progress. It introduces the AliceMind series of text generation models (including PALM, PLUG, and a Chinese GPT‑3) alongside the multimodal mPLUG architecture, and discusses their training strategies, reported performance, and lessons from practical deployment.

AIGC · AliceMind · Multimodal Generation
16 min read