Tagged articles
77 articles
Machine Heart
May 13, 2026 · Artificial Intelligence

Super‑Charging MiniCPM‑V 4.6 on One RTX 4090: 1B‑Parameter Multimodal Model Sets New Efficiency Bar

MiniCPM‑V 4.6, a 1.3B‑parameter multimodal LLM, outperforms rivals such as Qwen3.5‑0.8B and Gemma 4 on both accuracy and speed. Early ViT token compression and 4×/16× visual‑token reduction deliver sub‑100 ms latency and over 2.6k tokens/s throughput on a single RTX 4090, and the model also runs offline on mobile devices.

MiniCPM-V · RTX 4090 · Token Compression
16 min read
AI Explorer
May 1, 2026 · Artificial Intelligence

How a 400B Model on iPhone Redefines the Phone as Your AI “Digital Passport”

Running a 400‑billion‑parameter model locally on the iPhone demonstrates a leap in model compression and edge AI, turning the device into a cognitive agent that handles tasks without apps, while Apple’s upcoming iOS 27 visual‑intelligence features and hardware upgrades cement its role as the core AI ‘digital passport’.

400B model · AI agents · edge AI
6 min read
AsiaInfo Technology: New Tech Exploration
Apr 23, 2026 · Artificial Intelligence

From Transparent Forwarding to Space AI: Analyzing Satellite‑borne AI Base Stations

The article examines the limitations of traditional transparent‑forwarding satellite links, proposes a dual‑engine "communication + AI" architecture for satellite‑borne AI base stations, explores resource‑pooling, space‑app‑store micro‑services, and real‑world use cases in wildfire detection, maritime navigation and renewable‑energy grid management, and outlines the path toward 6G‑enabled space‑ground computing networks.

6G · AI base station · Space Computing
23 min read
Machine Learning Algorithms & Natural Language Processing
Apr 14, 2026 · Artificial Intelligence

Two‑Year‑Old Chinese Forecast Gains Global Consensus as Meta, METR and Others Confirm the Same AI Scaling Law

A Chinese research team’s 2024 "density law"—which predicts that the parameters needed for a given LLM performance halve every 3.5 months—has been independently validated by Meta’s scaling ladder, METR’s time‑horizon report, and subsequent analyses, revealing a unified exponential growth curve that reshapes expectations for inference cost, edge AI feasibility, and optimal model‑development strategies.

AI scaling · LLM density law · METR
11 min read
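The density law above is plain exponential decay. A minimal Python sketch, assuming the 3.5‑month halving period holds exactly; the 70B starting point is a hypothetical illustration, not a figure from the article:

```python
def params_needed(n0: float, months: float, halving_months: float = 3.5) -> float:
    """Parameter count needed to match a fixed performance level after `months`,
    assuming it halves every `halving_months` (the density-law rate quoted above)."""
    return n0 * 0.5 ** (months / halving_months)


# Hypothetical starting point: a capability that needs a 70B-parameter model today.
for m in (0, 3.5, 7, 12):
    print(f"after {m:>4} months: {params_needed(70e9, m) / 1e9:.1f}B parameters")
```

Compounded over a year, a 3.5‑month halving period implies roughly a 10× reduction (0.5^(12/3.5) ≈ 0.093).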
James' Growth Diary
Apr 13, 2026 · Frontend Development

Local Inference & Edge AI: Why Front‑End AI Is the Next Battlefield

Edge AI runs AI models directly in browsers or on devices, offering zero network latency, zero API cost, and full privacy. The article explains the three technical breakthroughs that make it possible, compares WebLLM, Transformers.js and Ollama, and provides a hybrid architecture with concrete engineering challenges and solutions that can cut total AI costs by 40‑55% for typical front‑end applications.

Ollama · Transformers.js · WebGPU
20 min read
Geek Labs
Apr 11, 2026 · Mobile Development

How Google AI Edge Enables True On‑Device LLMs for Android

Google AI Edge introduces two open‑source projects—Gallery and LiteRT‑LM—that let Android developers run large language models locally without network connectivity, offering offline inference, privacy protection, GPU/NPU acceleration, and streaming output for real‑time AI experiences.

Android · Gallery · LLM
9 min read
AI Explorer
Apr 10, 2026 · Artificial Intelligence

Google AI Edge Gallery: Offline Mobile AI Model Playground

Google’s open‑source AI Edge Gallery lets Android and iOS devices run large language models such as Gemma 4 entirely offline, eliminating network latency and privacy concerns; the app showcases six modular AI features, offers a simple install path, and signals Google’s push toward a standardized edge‑AI ecosystem.

Gemma 4 · Google AI Edge Gallery · Kotlin
8 min read
SuanNi
Apr 3, 2026 · Artificial Intelligence

How Gemma 4 Packs Cloud‑Grade AI Into Your Pocket Devices

Google’s newly released Gemma 4 series spans open‑source LLMs from 2.3B to 31B parameters, optimized for edge devices through per‑layer embeddings, mixture‑of‑experts (MoE) layers, hybrid attention, and broad hardware support; it achieves top‑tier benchmark scores while running efficiently on phones and IoT devices.

Gemma 4 · benchmark · edge AI
10 min read
Lao Guo's Learning Space
Mar 31, 2026 · Artificial Intelligence

March 2026 AI Frontier: Open‑Source Model 2.0, Agent Explosion, and the Three‑Giant Showdown

The March 2026 AI landscape features a 2.0 era of open‑source large models led by DeepSeek‑R1, a breakout year for AI Agents with hierarchical planning and robust tool calls, and a cost‑driven showdown among GPT‑5.4, Claude Opus 4.6 and Gemini 3.1 Pro, reshaping capabilities, pricing, and deployment strategies across cloud and edge.

AI Market · AI agents · AI models
10 min read
AI Architecture Path
Mar 30, 2026 · Artificial Intelligence

Can Wi‑Fi Turn Your ESP32 Into a Camera‑Free Human Pose Detector?

RuView, an open‑source edge‑AI project built on a CMU Wi‑Fi DensePose paper, claims to achieve wall‑penetrating human pose estimation, vital sign monitoring, and ultra‑fast presence detection using only standard Wi‑Fi signals and low‑cost ESP32 hardware, while sparking intense community debate over its claimed capabilities and reproducibility.

AI perception · ESP32 · RuView
8 min read
AIWalker
Mar 23, 2026 · Artificial Intelligence

Dynamic Dense Computing and Minimal End‑to‑End Design: YOLO-Master & YOLO26

By introducing a dynamic mixture‑of‑experts routing scheme and an end‑to‑end architecture that eliminates NMS and DFL, YOLO‑Master and YOLO26 dramatically cut compute waste and latency on edge devices, achieving up to 43% faster CPU inference while keeping model accuracy, with all code openly released.

Computer Vision · Mixture of Experts · Model Optimization
7 min read
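The summary credits YOLO‑Master’s savings to dynamic mixture‑of‑experts routing. The sketch below shows generic top‑k softmax gating in plain Python, in which only the selected experts are evaluated; it is not YOLO‑Master’s actual router, and the toy scalar experts and k=2 are assumptions for illustration.

```python
import math
from typing import Callable, List, Tuple


def top_k_route(gate_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


def moe_forward(x: float, experts: List[Callable[[float], float]],
                gate_logits: List[float], k: int = 2) -> float:
    """Weighted sum over the selected experts only; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Toy experts and gate scores (hypothetical).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
logits = [0.1, 2.0, 1.0, -1.0]
print(top_k_route(logits, k=2))
print(moe_forward(3.0, experts, logits))
```

Skipping the unselected experts is what makes compute scale with k rather than with the total number of experts.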
AI Engineering
Mar 3, 2026 · Artificial Intelligence

Alibaba Qwen‑3.5 Small Models: 0.8B Parameters Enable Video on Edge Devices

Alibaba released four Qwen‑3.5 models (0.8B‑9B) that use a Gated DeltaNet hybrid‑attention architecture and native multimodal training to achieve 262k‑token contexts, outperform larger rivals on visual, reasoning, and math benchmarks, and run video analysis on phones and laptops, though they still demand significant VRAM.

Gated DeltaNet · Multimodal AI · benchmark
6 min read
Weekly Large Model Application
Feb 27, 2026 · Industry Insights

Edge AI’s 2026 Boom: Taalas HC1’s Disruption and China’s Key Takeaways

The article explains how the Taalas HC1 edge‑AI chip, with 17,000 tokens/s inference speed, 90% lower power and 1/20 the cost of Nvidia H200 GPUs, proves that dedicated, non‑general‑purpose silicon can overcome latency, privacy and expense barriers, making on‑device large‑model deployment essential in 2026 and offering a strategic roadmap for Chinese chip makers.

AI chips · China · Cost reduction
12 min read
AI Engineering
Feb 9, 2026 · Artificial Intelligence

Three Unconventional Ways to Run OpenClaw on Edge Devices

The article showcases three low‑cost edge deployments of OpenClaw—a $25 Moto phone using Termux, a $5 ESP32‑S3 board running a pure‑C MimiClaw, and the BotDrop Android app that turns an old phone into an AI agent host—detailing setup steps, challenges, and security considerations.

AI agents · Android · ESP32
10 min read
Old Meng AI Explorer
Jan 23, 2026 · Artificial Intelligence

How a 4B‑Parameter AgentCPM‑Explore Beats 30B Models in Long‑Range Tasks

AgentCPM‑Explore, a 4‑billion‑parameter open‑source agent model, breaks the conventional belief that larger models always perform better by achieving state‑of‑the‑art results on eight long‑duration benchmarks, surpassing many 8B and even some 30B models while enabling efficient edge deployment.

Performance Evaluation · agent models · AI
12 min read
AI Frontier Lectures
Jan 15, 2026 · Artificial Intelligence

What Makes YOLO26 the Next Leap in Edge AI Object Detection?

YOLO26, the latest Ultralytics release, introduces a unified model family with five sizes, removes distribution focal loss, offers end‑to‑end inference without NMS, adds progressive loss balancing and the MuSGD optimizer, and delivers up to 43% faster CPU performance, making it ideal for edge and real‑world vision applications.

Model Optimization · YOLO26 · edge AI
12 min read
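For context on what "end‑to‑end inference without NMS" removes: conventional YOLO detectors emit many overlapping candidate boxes and then run non‑maximum suppression as a post‑processing step. The sketch below is a generic greedy NMS in plain Python, not Ultralytics code; the (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Keep the highest-scoring box, drop boxes overlapping it above the threshold, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(greedy_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

This loop is sequential and data‑dependent, which is why removing it tends to help most on CPUs and simple edge runtimes.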
Network Intelligence Research Center (NIRC)
Dec 31, 2025 · Artificial Intelligence

Why AI Inference Is Slow and How Cutting‑Edge Tech Boosts It in Industrial Settings

The article analyzes the severe inference bottlenecks of large language models, CNNs, and recommendation systems and presents a suite of research‑driven accelerations—including token‑level pipeline parallelism (HPipe), KV‑cache clustering (ClusterAttn), quantization (QoKV), heterogeneous edge frameworks (DeepZoning, PICO), delay‑aware edge‑cloud scheduling (DECC), and operator choreography (RACE)—validated on real‑world industrial workloads.

AI inference · Large Language Models · Recommendation Systems
16 min read
PaperAgent
Dec 4, 2025 · Artificial Intelligence

Mistral 3 Unveiled: How Its New Open‑Source Models Redefine Performance and Cost

Mistral AI’s latest open‑source release, Mistral 3, comprises three compact dense models and the flagship Mistral Large 3 MoE model. It outperforms leading Chinese rivals on benchmarks, offers strong multilingual and multimodal capabilities, and delivers the best cost‑performance among open‑source LLMs.

Mistral 3 · Mixture of Experts · Model Benchmark
4 min read
Data Party THU
Nov 21, 2025 · Artificial Intelligence

Unlocking 2025 Multi-Agent AI: Core Tech, Frameworks, and Emerging Trends

This article analyzes the technical foundations, development frameworks, real‑time inference optimizations, typical industry deployments, and future research directions of multi‑agent systems in 2025, highlighting protocols like FIPA‑ACL and MCP, tools such as LangGraph and ADP3.0, and edge‑computing breakthroughs.

AI Architecture · Model Quantization · distributed computing
16 min read
Sohu Tech Products
Nov 5, 2025 · Artificial Intelligence

How nndeploy Simplifies the Last Mile of On-Device AI Deployment

nndeploy is an open‑source, high‑performance on‑device AI deployment framework that abstracts the repetitive “last‑mile” workflow into a visual drag‑and‑drop DAG, offering multi‑platform inference, optimization, and ready‑to‑use model configs, enabling developers to go from prototype to production in minutes.

AI deployment · edge AI · nndeploy
15 min read
DataFunTalk
Nov 3, 2025 · Artificial Intelligence

What’s Next for AIoT? Key Insights from the 2026 China AIoT Industry Conference

The 2026 China AIoT Industry Annual Conference gathered over 200 experts and 600,000 online viewers to unveil AIoT 2.0 trends, from mobile IoT breakthroughs and edge AI solutions to global connectivity, Web3 integration, and award‑winning innovations shaping the future of intelligent connectivity.

5G · AIoT · Artificial Intelligence
13 min read
Architects' Tech Alliance
Oct 29, 2025 · Artificial Intelligence

Why China’s AI Chip Industry Is Poised for a Breakthrough – Trends, Challenges, and Future Outlook

This comprehensive analysis examines the strategic importance, technical challenges, innovation pathways, and market landscape of domestic AI chips in China, highlighting key players, regional clusters, core applications such as intelligent computing, autonomous driving, and robotics, and projecting future industry bottlenecks and opportunities.

AI chips · China semiconductor · FP8
18 min read
Baidu Tech Salon
Oct 16, 2025 · Artificial Intelligence

How Baidu’s Large‑Model Security Guard Won Vivo’s Top Security Partner Award

At the 2025 Vivo Developer Conference, Baidu Security earned the Best Security Technology Partner award for its edge‑focused large‑model security solution, which tackles multi‑layered threats on devices through comprehensive content protection, tailored edge defenses, advanced attack detection, and a rigorous evaluation framework.

AI security · Baidu Security · content moderation
5 min read
Huawei Cloud Developer Alliance
Oct 13, 2025 · Artificial Intelligence

How Quantization, Pruning, and Distillation Shrink AI Models for Edge Devices

This article explains the principles, key methods, and practical effects of model quantization, pruning, and knowledge distillation, comparing their advantages and disadvantages, and showing how combining these techniques enables compact, high‑performance AI models on resource‑constrained devices.

Model Pruning · Model Quantization · edge AI
7 min read
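Two of the three techniques in this summary fit in a few lines each. The sketch below uses unstructured magnitude pruning and the classic temperature‑scaled distillation loss (KL divergence between softened teacher and student distributions, scaled by T²) as representative variants; the article may use different ones, and the weights, logits, 50% sparsity and T=2 are assumptions.

```python
import math
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest |w| (unstructured pruning)."""
    k = int(len(weights) * sparsity)
    smallest = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]


def softmax(logits: List[float], t: float) -> List[float]:
    """Temperature-softened softmax; subtracting the max keeps exp() stable."""
    m = max(logits)
    exps = [math.exp((z - m) / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student: List[float], teacher: List[float], t: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by t^2."""
    p = softmax(teacher, t)
    q = softmax(student, t)
    return t * t * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


print(magnitude_prune([0.1, -0.5, 0.03, 2.0], sparsity=0.5))  # [0.0, -0.5, 0.0, 2.0]
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

In practice the distillation term is usually mixed with the ordinary hard‑label loss, and pruning is followed by fine‑tuning to recover accuracy.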
Architects' Tech Alliance
Sep 28, 2025 · Artificial Intelligence

How AI Workloads Are Redefining Network Architecture: Key Requirements and Topologies

The article examines how the rapid growth of AI models and workloads is reshaping network design, highlighting the need for ultra‑high bandwidth, sub‑millisecond latency, reliability, scalable topologies like Fat‑Tree and Dragonfly, and robust security and QoS mechanisms across data‑center, cloud, and edge environments.

AI networking · Distributed Training · High Bandwidth
11 min read
Data Party THU
Aug 18, 2025 · Artificial Intelligence

Why Google’s Gemma 3 270M Model Is a Game‑Changer for Edge AI

Google’s newly released Gemma 3 270M is a compact 270‑million‑parameter language model that combines a large token vocabulary, energy‑efficient INT4 quantization, strong instruction‑following, and production‑ready checkpoints, making it ideal for fine‑tuning, on‑device deployment, and a wide range of low‑latency AI tasks.

Gemma 3 · Google AI · Language Model
7 min read
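The footprint benefit of INT4 follows from simple arithmetic: weight storage is parameters × bits ÷ 8. A minimal sketch, assuming exactly 270 million parameters and counting weights only (activations, KV cache, runtime buffers and any per‑group quantization scales are ignored):

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_weight // 8


for bits in (32, 16, 8, 4):
    mb = weight_bytes(270_000_000, bits) / 1e6
    print(f"{bits:>2}-bit weights: {mb:,.0f} MB")
```

Under these assumptions the weights shrink from about 540 MB at 16‑bit to about 135 MB at INT4.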
DevOps
Aug 16, 2025 · Artificial Intelligence

Google Unveils Gemma 3 270M: A Tiny, High‑Efficiency Open‑Source AI Model

Google has released the open‑source Gemma 3 270M model—a compact, 270‑million‑parameter AI that runs on as little as 2 GB RAM, supports over 140 languages, handles images, and offers strong instruction‑following performance, making it ideal for edge devices and custom fine‑tuning.

Gemma 3 · Google AI · Model Optimization
5 min read
AntTech
Jul 28, 2025 · Information Security

Securing AI Agents on Devices by 2025: Key Findings from the New Report

The newly released “Terminal Agent Security 2025” report, unveiled at the World AI Conference, systematically categorizes AI agent risks, outlines detection and defense methods, and proposes three protection pathways—single‑agent safety, trustworthy multi‑agent interconnection, and AI‑terminal security—to guide the emerging ecosystem of intelligent edge devices.

2025 · agent trust · edge AI
6 min read
Data Thinking Notes
Jul 6, 2025 · Artificial Intelligence

How Quantization Shrinks Giant AI Models for Edge Devices

This article explains why quantizing massive AI models is essential for deploying them on resource‑constrained devices, outlines core quantization concepts, techniques, and methods, compares their pros and cons, and presents practical application scenarios such as smartphones, autonomous driving, IoT, and edge computing.

AI deployment · Large Language Models · Model Quantization
9 min read
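As a concrete instance of the core quantization idea, here is symmetric per‑tensor INT8 quantization in plain Python: one scale maps the range [−max|x|, max|x|] onto the integers [−127, 127]. This is one common scheme among several the article may cover (asymmetric, per‑channel and others differ), and the sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(x: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8: q = round(x / scale), clamped to [-127, 127]."""
    max_abs = max((abs(v) for v in x), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero tensors
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [v * scale for v in q]


weights = [0.42, -1.3, 0.07, 0.9, -0.25]
q, s = quantize_int8(weights)
err = max(abs(a - b) for a, b in zip(weights, dequantize(q, s)))
print(q, f"scale={s:.5f}", f"max error={err:.5f}")
```

Each value is stored in 1 byte instead of 4, and the rounding error per element is at most half a quantization step (scale / 2).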
Architect
May 31, 2025 · Artificial Intelligence

Edge Intelligence Implementation in the Vivo Official App: Architecture, Feature Engineering, and Model Deployment

The article details how edge intelligence is applied to the Vivo official app to improve product recommendation on the smart‑hardware floor by abstracting the problem, designing feature engineering pipelines, training TensorFlow models, converting them to TFLite, and deploying inference on mobile devices, while also covering monitoring and performance considerations.

Model Deployment · TensorFlow Lite · edge AI
19 min read
Amap Tech
May 27, 2025 · Artificial Intelligence

Gaode Map Custom Voice Pack: End‑to‑End TTS Model Architecture and Deployment

This article explains how Gaode Map leverages lightweight edge TTS models, dual‑autoregressive large‑model data augmentation, and a configurable audio‑processing DAG to enable users to create highly realistic personalized voice packs from just three recorded sentences.

Gaode Maps · TTS · data augmentation
8 min read
vivo Internet Technology
May 21, 2025 · Artificial Intelligence

How Vivo’s App Leverages Edge AI to Personalize Product Recommendations

This article details how Vivo’s official app implements edge intelligence to dynamically rank and recommend hardware products on its homepage, covering problem abstraction, data collection, feature engineering, model design, TensorFlow‑Lite conversion, on‑device inference, and monitoring for a personalized user experience.

Android · Model Deployment · TensorFlow Lite
19 min read
AIWalker
May 14, 2025 · Artificial Intelligence

How HGO‑YOLO Achieves 87.4% Accuracy at 56 FPS with Only 4.6 MB Parameters

This paper presents HGO‑YOLO, a lightweight real‑time anomaly‑behavior detector that integrates HGNetv2 and GhostConv into YOLOv8, achieving 87.4% mAP with just 4.6 MB of parameters and 56 FPS on CPU, and validates its performance across multiple datasets and hardware platforms.

Computer Vision · Lightweight Models · YOLO
25 min read
Architects' Tech Alliance
May 14, 2025 · Artificial Intelligence

How Cambricon’s AI Chip Roadmap Shapes the Future of Intelligent Computing

This article provides an in‑depth technical analysis of Cambricon’s AI chip portfolio—including terminal, cloud, and edge processors—detailing their micro‑architectures, key innovations such as chiplet technology and memory optimisation, roadmap plans, and real‑world applications in data centers, surveillance and autonomous driving.

AI chips · Cambricon · Chiplet technology
14 min read
Architects' Tech Alliance
Mar 25, 2025 · Industry Insights

How Near‑Memory Computing Can Power Edge LLMs: A 2025 Storage Framework

The article analyzes the challenges of deploying large language models on cloud servers—such as latency, security, and constant connectivity—and explains how near‑memory computing architectures (PNM, PIM, CIM) can integrate storage and processing to enable efficient, high‑performance edge AI deployments, outlining the trade‑offs of each approach.

Artificial Intelligence · Large Language Models · Near-Memory Computing
5 min read
DeWu Technology
Feb 12, 2025 · Artificial Intelligence

Edge Intelligence for Intelligent Video Cover Recommendation

The article describes an edge‑based video‑cover recommendation system for DeWu that leverages the MNN SDK and a lightweight MobileNetV3 model, performing on‑device inference with quantization and parallel processing to automatically select high‑quality covers, achieving sub‑second latency and boosting click‑through rates by up to 18%.

Inference Optimization · Model Deployment · Video Cover
12 min read
DevOps
Feb 5, 2025 · Artificial Intelligence

Top 10 AI Development Trends for 2025

The article outlines ten major AI trends expected in 2025, including the rise of intelligent agents, fierce competition among multimodal large models, breakthroughs in text‑to‑video generation, long‑term memory, quantum computing, edge models, embodied intelligence, humanoid robots, AI self‑looping with synthetic data, and the enduring scaling laws of large models.

2025 trends · Embodied Intelligence · Quantum Computing
6 min read
AI Large Model Application Practice
Nov 28, 2024 · Artificial Intelligence

Can Tiny Multimodal Models Power Edge AI? Meet OmniVision-968M

This article explores how compact multimodal models like OmniVision-968M enable efficient generative AI on edge devices, detailing their architectural advantages, benchmark superiority over larger models, and step‑by‑step instructions for local installation and visual inference using NexaSDK.

AI inference · OmniVision-968M · Tutorial
9 min read
Architects' Tech Alliance
Nov 26, 2024 · Artificial Intelligence

Get Ready for a Shakeout in Edge NPUs

The article examines the rapid growth and increasing complexity of edge AI NPUs, discussing challenges in software and hardware acceleration, supply‑chain constraints, and the need for integrated engine solutions to sustain performance and power efficiency.

NPU · Supply Chain · edge AI
9 min read
21CTO
May 21, 2024 · Artificial Intelligence

How Google’s Edge AI Makes On‑Device Large Language Models a Reality

Google I/O highlighted the rise of on‑device AI, showing how new neural processors, Edge TPU, and tools like the Edge AI SDK and TensorFlow Lite enable developers to run large language models locally, reducing latency, cost, and privacy concerns while integrating with cloud resources.

Google I/O · Mobile AI · TensorFlow Lite
9 min read
DataFunTalk
May 20, 2024 · Artificial Intelligence

Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations

This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and cloud, covering edge image‑text retrieval, text‑image generation and understanding optimizations, and lightweight diffusion model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.

AIGC · Multimodal · OPPO
18 min read
JD Retail Technology
Feb 28, 2024 · Artificial Intelligence

Edge AI at JD Retail: Architecture, Challenges, and Business Practices

This article details JD Retail's edge AI (on‑device intelligence) platform, covering its definition, performance and security challenges, three‑layer cloud‑edge‑device architecture, key components such as high‑performance inference engine, data pipeline, Python VM container, and real‑world applications in traffic distribution and image recognition.

AI Architecture · JD Retail · edge AI
15 min read
AsiaInfo Technology: New Tech Exploration
Jan 29, 2024 · Artificial Intelligence

Can Vision Transformers Revolutionize Edge AI Video Analysis?

This article examines the rapid rise of edge AI video analytics, explains how Vision Transformers (ViT) overcome the limitations of traditional CNNs, details a technical pre‑research and POC conducted by a Chinese AI firm, evaluates several open‑source large models, and concludes that the OFA model best meets current edge deployment needs.

OFA · edge AI · video analytics
14 min read
DaTaobao Tech
Jun 12, 2023 · Artificial Intelligence

Real-Time Video Stream Subject Recognition for E-commerce (MetaSight)

MetaSight introduces a real‑time video‑stream subject recognition system that replaces the traditional capture‑upload‑search flow with continuous, on‑camera product identification, using a sub‑10 MB edge model, global IDs for frame continuity, edge‑cloud collaboration, and batch processing to cut interaction steps, lower server load, and pave the way for future AR/XR shopping experiences.

AR · XR · edge AI
10 min read
Alipay Experience Technology
May 10, 2023 · Mobile Development

How Alipay’s Homepage Leverages Edge AI for Smarter Refreshes

This article explains how Alipay’s homepage team collaborates with the edge‑intelligence team to use real‑time client‑side behavior data and algorithm platforms, transforming refresh strategies across time, space, and event dimensions, improving recommendation efficiency, reducing duplication, and delivering measurable performance gains.

Mobile · edge AI · frontend
15 min read
Alipay Experience Technology
Nov 28, 2022 · Artificial Intelligence

Why Edge Intelligence Is Shaping the Future of Mobile Apps

This article explains the concept of edge intelligence, its advantages over cloud‑based AI, the technical challenges of deploying AI on mobile devices, Ant Group's development timeline, core technology stack, and future directions for edge‑cloud collaboration.

AI Optimization · Mobile AI · Real-time inference
10 min read
AsiaInfo Technology: New Tech Exploration
Sep 19, 2022 · Industry Insights

How Edge AI Is Transforming Vertical Industries: Challenges, Technologies, and Real‑World Cases

This article examines the rapid growth of edge AI, outlines its current development, identifies technical and deployment challenges, presents key innovations such as RepRetinaFace and DeepStream, and showcases a 5G‑enabled smart construction solution with concrete performance data and implementation details.

5G · DeepStream · Vertical Industry
15 min read
ByteDance Terminal Technology
Jul 29, 2022 · Artificial Intelligence

Pitaya: ByteDance’s End‑Side AI Engineering Platform Overview

Pitaya, built by ByteDance’s Client AI and MLX teams, is a comprehensive end‑side AI engineering platform that provides a full workflow from model development and data preparation to deployment, monitoring, and federated learning, supporting large‑scale commercial scenarios across multiple apps.

AI Platform · Federated Learning · Inference Engine
14 min read
DaTaobao Tech
Jul 15, 2022 · Artificial Intelligence

Edge AI Model Evaluation and Optimization with TensorFlow, JAX, and TVM

The article demonstrates how to evaluate, compress, and convert deep‑learning models for edge devices using TensorFlow, JAX, and TVM, showing a faster iPhone‑based MNIST training benchmark, FLOPs measurement scripts, TFLite/ONNX/CoreML conversion, TVM compilation with auto‑tuning, and up to 50% speed improvements on mobile NPU hardware.

JAX · TVM · TensorFlow
29 min read
Alibaba Terminal Technology
Jun 22, 2022 · Artificial Intelligence

How Fast Can Your Smartphone Run ML Models? Exploring Edge AI Optimization

This article examines the computational capabilities of modern mobile devices for machine learning, compares training times on a MacBook and iPhone, explains model evaluation metrics like FLOPs, and provides step‑by‑step guides for converting and optimizing models using TensorFlow, PyTorch, ONNX, JAX, and TVM for edge deployment.

JAX · Model Optimization · TVM
29 min read
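FLOPs, the evaluation metric mentioned in this summary, can be counted analytically for common layers. A minimal sketch, assuming bias terms are ignored and one multiply‑accumulate (MAC) counts as two FLOPs; profiling tools differ on that convention, so the flag lets you report MACs instead. The layer shapes are hypothetical examples, not taken from the article.

```python
def conv2d_flops(h_out: int, w_out: int, c_in: int, c_out: int, k: int,
                 groups: int = 1, mac_as_two_flops: bool = True) -> int:
    """FLOPs of a k x k 2D convolution (bias ignored). Each output element
    needs (c_in / groups) * k * k multiply-accumulates."""
    macs = h_out * w_out * c_out * (c_in // groups) * k * k
    return 2 * macs if mac_as_two_flops else macs


def linear_flops(in_features: int, out_features: int, mac_as_two_flops: bool = True) -> int:
    """FLOPs of a dense layer applied to one input vector (bias ignored)."""
    macs = in_features * out_features
    return 2 * macs if mac_as_two_flops else macs


# Hypothetical layers: 3x3 conv, 64 -> 128 channels, 56x56 output; 784 -> 128 dense layer.
print(conv2d_flops(56, 56, 64, 128, 3))  # 462422016
print(linear_flops(784, 128))            # 200704
```

Setting `groups=c_in` gives the depthwise case used in mobile architectures, which is why those layers are so cheap by this measure.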
ITPUB
Jun 20, 2022 · Artificial Intelligence

Edge AI Boosts Mobile Search Ranking: Inside Meituan’s On‑Device Re‑ranking

This article details Meituan’s implementation of on‑device deep learning models for search re‑ranking, covering the motivations for edge intelligence, feature engineering, feedback sequence modeling, model architecture, deployment optimizations, experimental results, and future directions, offering practical insights for developers building large‑scale AI on mobile.

edge AI · feature engineering · mobile deep learning
28 min read
Meituan Technology Team
Jun 16, 2022 · Artificial Intelligence

Edge AI Re‑ranking in Meituan/Dianping Search: Architecture, Algorithms, and Deployment

Meituan/Dianping’s edge‑AI re‑ranking system moves large‑scale deep‑learning models onto users’ devices, using dense networks and cloud‑served embeddings, advanced feedback‑sequence and multi‑view attention models, and aggressive compression to deliver real‑time, privacy‑preserving search personalization that boosts click‑through rates by up to 0.43%.

Model Deployment · edge AI · mobile search
25 min read
Code DAO
May 5, 2022 · Artificial Intelligence

Optimizing Machine Learning Models for Edge Devices with TensorFlow Lite

This article explains how to convert a TensorFlow image‑classification model to TensorFlow Lite, apply different quantization techniques, benchmark the resulting models on a Raspberry Pi 4, and compare latency, size, and accuracy to demonstrate the trade‑offs of edge AI deployment.

EfficientNet, Model Quantization, Python
0 likes · 16 min read
Baidu Geek Talk
Apr 1, 2022 · Artificial Intelligence

How Paddle Lite & PaddleSlim Supercharge Edge AI Inference Performance

As edge computing grows, running AI models for tasks such as object detection, OCR, and speech recognition on resource‑constrained devices is often too slow. The upgraded Paddle Lite inference engine and PaddleSlim compression toolkit address this, claiming up to 23% faster inference and significant reductions in model size.

AI deployment, Inference Optimization, Paddle-Lite
0 likes · 6 min read
Alibaba Terminal Technology
Mar 9, 2022 · Artificial Intelligence

How Edge AI Powers Alibaba’s Local Life Services: Architecture and Real‑World Wins

This article explains how Alibaba's local‑life service platforms use edge‑side AI to run machine‑learning inference on users' devices. It covers the concept and advantages of edge AI, the technical architecture, and concrete implementations such as user feature extraction, intelligent recommendation, and smart push notifications, and outlines future directions.

Alibaba, Mobile AI, edge AI
0 likes · 12 min read
DaTaobao Tech
Feb 17, 2022 · Artificial Intelligence

Unifying Edge AI Training and Deployment: Inside MNN Workbench’s New Workflow

The article outlines how MNN Workbench, Alibaba’s open‑source edge‑AI platform, integrates professional training capabilities, cloud‑based PAI‑DLC resources, multi‑window debugging, and visual Git Flow to streamline end‑to‑end model development, deployment, and iteration for developers of varying expertise.

Deployment, Git Flow, MNN
0 likes · 10 min read
Alipay Experience Technology
Feb 10, 2022 · Frontend Development

How Ant Group Supercharged Front‑End AI with Cross‑Platform Smart Apps

This talk explains how Ant Group's frontend engineers built edge‑AI services that run directly in the browser, improving real‑time performance, preserving privacy, and cutting cloud costs. It presents two real‑world cases, pet identification and cracked‑screen insurance, and details the WebGL‑based engine optimizations that raised device coverage from 30% to 93%.

AI inference, WebGL, cross‑platform
0 likes · 8 min read
Alibaba Terminal Technology
Dec 15, 2021 · Artificial Intelligence

Unlock Real-Time Mobile OCR: Inside Ant’s xNN-OCR Engine and Its Tiny, Fast AI

Ant's self‑developed xNN‑OCR engine shows how advanced OCR can run offline on smartphones by combining GAN‑based data synthesis, lightweight ShuffleNet‑inspired detection, NAS‑optimized recognition, and aggressive model compression. It delivers accurate, near‑real‑time recognition across diverse mobile scenarios while preserving privacy and keeping costs low.

NAS, data synthesis, edge AI
0 likes · 11 min read
Aotu Lab
Sep 30, 2021 · Artificial Intelligence

Bringing AI to the Browser: Edge Intelligence, Frameworks & Model Compression

This article explains how AI is extending into front‑end development, defines edge AI, outlines its application scenarios, discusses advantages and limitations, reviews web‑based inference frameworks and hardware acceleration, and details model compression techniques for deploying AI directly in browsers.

TensorFlow.js, Web, AI
0 likes · 15 min read
Amap Tech
Jun 4, 2021 · Artificial Intelligence

Deploying Multiple CNN Models on Low‑End Devices with MNN: Memory Tricks and Error Debugging

This article explains how a high‑traffic map service captures road features using client‑side computer‑vision models, details the deployment of many CNNs with the lightweight MNN engine on memory‑constrained devices, and shares practical memory‑saving techniques, inference scheduling, and error‑analysis methods.

Android, Computer Vision, MNN
0 likes · 12 min read
Alibaba Cloud Native
Feb 24, 2021 · Cloud Native

How OpenYurt Bridges Cloud‑Native and Edge Computing: Architecture, Trends, and Real‑World Cases

This article explains the rise of edge computing, outlines its layered architecture, examines industry trends, and describes cloud‑native fundamentals. It then details how the OpenYurt platform tackles the challenges of bringing cloud‑native technology to the edge with features such as unitized management, edge autonomy, seamless conversion of existing Kubernetes clusters, and cloud‑edge collaboration, illustrated by edge‑AI and video‑to‑cloud case studies.

Kubernetes, OpenYurt, edge AI
0 likes · 19 min read
Didi Tech
Oct 21, 2020 · Artificial Intelligence

Deep Model Compression Techniques for Intelligent Automotive Cockpits

The article reviews deep‑model compression methods, including ADMM‑based structured pruning, low‑bit quantization, and teacher‑student knowledge distillation, along with the automated AutoCompress workflow. It shows how these techniques shrink neural networks enough to run real‑time driver monitoring and other intelligent‑cockpit functions on resource‑limited automotive hardware while preserving accuracy.

ADMM, Deep Learning, edge AI
0 likes · 16 min read
Baidu App Technology
May 29, 2020 · Mobile Development

How MML Simplifies Mobile AI Deployment: Architecture, Tools, and Code Walkthrough

This article explains the background of on‑device AI, introduces the Mobile Machine Learning (MML) framework and its layered architecture, details core utilities such as model decryption and task scheduling, and walks through the code for each step on mobile platforms: initialization, preprocessing, inference, post‑processing, and resource release.

Android, MML, Mobile AI
0 likes · 9 min read
Baidu App Technology
May 20, 2020 · Frontend Development

Paddle.js: Baidu's Browser-Based AI Inference Engine for Frontend Development

Paddle.js is Baidu's lightweight JavaScript inference engine. It converts Paddle models into web‑compatible formats and runs them in the browser via WebGL/WebAssembly, enabling fast, privacy‑preserving AI features such as face detection, gesture recognition, and content filtering, with a code footprint of only 201 KB and broad browser compatibility.

Frontend Intelligence, JavaScript AI, Neural Network Inference
0 likes · 13 min read
Tencent Cloud Developer
Aug 6, 2019 · Cloud Computing

Tencent Cloud AIoT Product: Edge AI Capabilities and Cloud-Edge Collaboration Architecture

Tencent Cloud's AIoT solution combines edge AI processing with a cloud‑edge collaboration framework. Using container‑orchestrated microservices, AI chips, and IoT connectivity, it cuts latency to milliseconds, reduces bandwidth by uploading only structured data, and enables real‑time applications in smart retail, manufacturing, agriculture, and building security.

AIoT, Deep Learning, Edge Computing
0 likes · 28 min read
Alibaba Cloud Developer
May 7, 2019 · Artificial Intelligence

What Makes Alibaba’s MNN Engine a Game-Changer for Mobile AI Inference?

Alibaba's open‑source MNN is a lightweight, high‑performance deep‑learning inference engine optimized for edge devices. It supports multiple model formats and backends and is portable across iOS, Android, and IoT devices; the article covers its architecture, performance benchmarks, roadmap, and real‑world applications.

Deep Learning, MNN, edge AI
0 likes · 12 min read
Alibaba Cloud Developer
Apr 2, 2019 · Mobile Development

How xNN-OCR Brings High‑Precision, Real‑Time OCR to Mobile Devices

This article explains how the lightweight xNN‑OCR engine achieves high accuracy and real‑time performance on mobile devices through deep‑learning model compression and novel detection and recognition techniques, and showcases practical applications such as bank‑card, gas‑meter, license‑plate, and ID‑card recognition.

Deep Learning, edge AI, mobile OCR
0 likes · 12 min read
Architects' Tech Alliance
Jan 30, 2019 · Industry Insights

Breaking the Storage Wall: How In‑Memory Computing Is Shaping AI Chip Design

The article analyzes the growing bottlenecks in compute architecture and memory, explores high‑bandwidth communication, near‑data processing, and in‑memory computing techniques, evaluates their advantages, challenges, and future prospects, and highlights key industry players driving the shift toward integrated compute‑storage chips.

AI chips, Compute Architecture, edge AI
0 likes · 14 min read
Youku Technology
Nov 5, 2018 · Artificial Intelligence

Intelligent Interactive Practices for Multimedia Live Streaming: Insights from Taobao Live

The talk outlines Taobao Live's rapid growth and its three‑layer interactive architecture: dynamic AI‑driven marketing tools; human‑computer interaction features such as face and gesture recognition; and intelligent operations that score fan intimacy. Together these deliver low‑latency, AI‑enhanced streaming with features such as virtual backgrounds, product recognition, and an automated live‑stream assistant.

edge AI, e‑commerce, interactive AI
0 likes · 9 min read
Alibaba Cloud Developer
Jul 28, 2017 · Artificial Intelligence

Inside Alibaba AI Lab: Dr. Wang Gang on Multimodal AI and Edge Computing

In an exclusive interview, Alibaba AI Lab's distinguished scientist Dr. Wang Gang discusses the lab's research on multimodal AI, edge computing, AI hardware, bio‑inspired cognition, and the integration of quantum computing with deep learning, as well as the challenge of moving from recognition to true understanding. He also outlines Alibaba's AI talent recruitment plans.

AI research, AI talent recruitment, Computer Vision
0 likes · 25 min read