Tagged articles

cross-modal

6 articles
Smart Workplace Lab
Jun 14, 2026 · Artificial Intelligence

Why Do Text‑Image & Video Agents Lose Key Info? Three‑Step Cross‑Modal Alignment

The article explains why multimodal agents often drop essential details during text‑to‑image or video generation, then presents a three‑step protocol—semantic anchor extraction, manual validation checklist, and breakpoint compensation routing—that cuts rework cycles from 4.7 to 1.2, reduces alignment time by 70%, and lowers key‑info loss by 95% while raising one‑pass success to 85%.

Multimodal AI · Workflow Automation · agent alignment
6 min read
Machine Learning Algorithms & Natural Language Processing
Mar 9, 2026 · Artificial Intelligence

Instant LoRA Generation and Long‑Document Internalization: Cost‑Amortized Model Updates via 0.1‑Second Forward Pass

The article analyzes the quadratic attention and KV‑Cache bottlenecks of Transformers on ultra‑long inputs and the heavy compute cost of traditional supervised fine‑tuning, then presents Sakana AI's Cost Amortization framework (Doc‑to‑LoRA and Text‑to‑LoRA), which shifts weight updates to a meta‑trained hypernetwork, achieving sub‑50 MB memory for 128K‑token inference, sub‑GB update memory for long‑document QA, and zero‑shot task adaptation with sub‑second latency.

Cost Amortization · LoRA · Long-context
13 min read
DataFunSummit
Jan 20, 2024 · Artificial Intelligence

Cross‑Modal Video Open‑Tag Mining: Techniques, Methods, and Applications

The article presents a comprehensive overview of cross‑modal video open‑tag mining, detailing its technical background, related multimodal research methods, a four‑stage open‑tag solution from 360 AI Research Institute, and future application prospects such as unsupervised tag coverage, semantic retrieval, and content moderation.

Multimodal AI · cross-modal · label extraction
15 min read
360 Tech Engineering
Jul 6, 2023 · Artificial Intelligence

CSIG Enterprise Visit to Qihoo 360: Multimodal and Cross‑Modal Learning in the Era of Large Models

The CSIG‑hosted "Enterprise Visit – Into Qihoo 360" event on June 29, 2023 gathered over a thousand participants to explore multimodal and cross‑modal learning in the large‑model era, featuring keynote speeches from leading university researchers and Qihoo 360 AI experts, a tour of the company's facilities, and discussions on future AI research directions.

CSIG · Multimodal · Qihoo360
8 min read
DataFunTalk
Sep 24, 2022 · Artificial Intelligence

Cross‑Modal Image‑Text Representation: The Zero Dataset and R2D2 Pre‑training Framework

This article introduces the importance of image‑text cross‑modal representation, presents the Chinese Zero dataset with two pre‑training subsets and five downstream tasks, describes the R2D2 dual‑tower‑plus‑single‑tower pre‑training framework with multiple loss functions, and reports extensive experiments and real‑world deployment insights.

Multimodal AI · R2D2 framework · Zero dataset
19 min read
DataFunTalk
Jun 16, 2022 · Artificial Intelligence

BigBang Transformer (BBT): A 1‑Billion‑Parameter Financial Pre‑trained Language Model with Time‑Series‑Text Cross‑Modal Architecture

The BigBang Transformer (BBT) is a 1‑billion‑parameter financial pre‑trained language model that combines text and time‑series data in a cross‑modal Transformer architecture, achieving up to 10% higher downstream accuracy than T5‑scale models and demonstrating strong performance on financial NLP tasks, time‑series forecasting, and multi‑factor investment strategies.

Artificial Intelligence · Financial NLP · Time Series Forecasting
19 min read