Tagged articles

cross-modal

6 articles

Jun 14, 2026 · Artificial Intelligence

Why Do Text‑Image & Video Agents Lose Key Info? Three‑Step Cross‑Modal Alignment

The article explains why multimodal agents often drop essential details during text‑to‑image or video generation, then presents a three‑step protocol—semantic anchor extraction, manual validation checklist, and breakpoint compensation routing—that cuts rework cycles from 4.7 to 1.2, reduces alignment time by 70%, and lowers key‑info loss by 95% while raising one‑pass success to 85%.

Multimodal AI · Workflow Automation · agent alignment

0 likes · 6 min read

Machine Learning Algorithms & Natural Language Processing

Mar 9, 2026 · Artificial Intelligence

Instant LoRA Generation and Long‑Document Internalization: Cost‑Amortized Model Updates via 0.1‑Second Forward Pass

The article analyzes the quadratic attention and KV‑Cache bottlenecks of Transformers on ultra‑long inputs and the heavy compute cost of traditional supervised fine‑tuning, then presents Sakana AI's Cost Amortization framework—Doc‑to‑LoRA and Text‑to‑LoRA—that shifts weight updates to a meta‑training hypernetwork, achieving sub‑50 MB memory for 128K‑token inference, sub‑GB update memory for long‑document QA, and zero‑shot task adaptation with sub‑second latency.

Cost Amortization · LoRA · Long-context

0 likes · 13 min read

DataFunSummit

Jan 20, 2024 · Artificial Intelligence

Cross‑Modal Video Open‑Tag Mining: Techniques, Methods, and Applications

The article presents a comprehensive overview of cross‑modal video open‑tag mining, detailing its technical background, related multimodal research methods, a four‑stage open‑tag solution from 360 AI Research Institute, and future application prospects such as unsupervised tag coverage, semantic retrieval, and content moderation.

Multimodal AI · cross-modal · label extraction

0 likes · 15 min read

360 Tech Engineering

Jul 6, 2023 · Artificial Intelligence

CSIG Enterprise Visit to Qihoo 360: Multimodal and Cross‑Modal Learning in the Era of Large Models

The CSIG‑hosted "Enterprise Visit – Into Qihoo 360" event on June 29, 2023 gathered over a thousand participants to explore multimodal and cross‑modal learning in the large‑model era, featuring keynote speeches from leading university researchers and Qihoo 360 AI experts, a tour of the company's facilities, and discussions on future AI research directions.

CSIG · Multimodal · Qihoo360

0 likes · 8 min read

DataFunTalk

Sep 24, 2022 · Artificial Intelligence

Cross‑Modal Image‑Text Representation: The Zero Dataset and R2D2 Pre‑training Framework

This article introduces the importance of image‑text cross‑modal representation, presents the Chinese Zero dataset with two pre‑training subsets and five downstream tasks, describes the R2D2 dual‑tower‑plus‑single‑tower pre‑training framework with multiple loss functions, and reports extensive experiments and real‑world deployment insights.

Multimodal AI · R2D2 framework · Zero dataset

0 likes · 19 min read

DataFunTalk

Jun 16, 2022 · Artificial Intelligence

BigBang Transformer (BBT): A 1‑Billion‑Parameter Financial Pre‑trained Language Model with Time‑Series‑Text Cross‑Modal Architecture

The BigBang Transformer (BBT) is a 1‑billion‑parameter financial pre‑trained language model that combines text and time‑series data in a cross‑modal Transformer architecture, achieving up to 10% higher downstream accuracy than T5‑scale models and demonstrating strong performance on financial NLP tasks, time‑series forecasting, and multi‑factor investment strategies.

Artificial Intelligence · Financial NLP · Time Series Forecasting

0 likes · 19 min read
