Tagged articles
3 articles
Alipay Experience Technology
Sep 30, 2025 · Artificial Intelligence

How UI-UG Unifies UI Understanding and Generation with a 7B Multimodal Model

The open‑source UI‑UG‑7B multimodal model from Alipay combines UI understanding and generation in a single framework, delivering state‑of‑the‑art performance across referring, grounding, captioning, and code generation tasks while dramatically speeding up UI creation for developers.

UI Generation · UI Understanding · Artificial Intelligence
12 min read
DevOps
Feb 17, 2025 · Artificial Intelligence

Microsoft OmniParser V2.0: A Visual Agent Parsing Framework for Enhanced UI Understanding

Microsoft's OmniParser V2.0 transforms large language models such as DeepSeek‑R1, GPT‑4o, and Qwen‑2.5VL into visual AI agents by accurately detecting interactive UI elements, providing semantic descriptions, and generating structured representations that boost inference speed, reduce latency by 60%, and dramatically improve benchmark accuracy.

AI Agent · Computer Vision · DeepSeek
7 min read
21CTO
May 21, 2024 · Artificial Intelligence

How Google’s ScreenAI Could Redefine UI Understanding and UX Design

Google’s new ScreenAI visual‑language model, built on the PaLI architecture, can interpret user interfaces and infographics, answer UI‑related questions, generate summaries and navigate screens, and sets new benchmarks that may reshape future user‑experience research and applications.

Google AI · Multimodal AI · ScreenAI
9 min read