Tagged articles

Metal

17 articles · Page 1 of 1

Jun 6, 2026 · Artificial Intelligence

How a 400B MoE Model Runs on iPhone 17 Pro with Flash‑MoE

The article details how the open‑source Flash‑MoE engine enables the 400B‑parameter Qwen3.5‑397B‑A17B mixture‑of‑experts model to run on an iPhone 17 Pro, achieving about 0.6 tokens per second through a custom Metal pipeline, GCD‑driven SSD streaming, and aggressive caching strategies.

400B · Flash-MoE · LLM Inference

0 likes · 6 min read

Java Architect Essentials

May 29, 2026 · Artificial Intelligence

How Redis Creator Built a Metal‑Only Engine to Run DeepSeek V4 Flash at Full Speed on Mac

The ds4.c project, authored by Redis founder Salvatore Sanfilippo, is a Metal‑only C inference engine that uses asymmetric 2‑bit quantization, disk‑based KV caching, and OpenAI/Anthropic‑compatible APIs to achieve usable performance for DeepSeek V4 Flash on high‑end Apple Silicon Macs.

Apple Silicon · C# · DS4

0 likes · 9 min read

Geek Labs

May 13, 2026 · Artificial Intelligence

Two LLM Inference Acceleration Projects: A Mac‑Local Engine vs a Data‑Center Engine

This article compares two recent LLM inference engines on GitHub: ds4.c, a Metal‑optimized engine for DeepSeek V4 Flash on Apple Silicon Macs, and TokenSpeed, a data‑center‑grade Python/C++ engine for GPU clusters. It covers their design choices, performance numbers, usage instructions, and suitable scenarios.

DeepSeek · GPU · LLM

0 likes · 8 min read

Lao Guo's Learning Space

May 11, 2026 · Artificial Intelligence

Redis Creator Releases Pure‑C Engine That Makes DeepSeek V4 Run Fast on Mac

Redis founder antirez unveiled ds4.c, a pure‑C inference engine that leverages Objective‑C and Metal to run DeepSeek V4 locally on Mac devices, delivering about 27 tokens/s on an M3 Ultra. That is far slower than GPU servers, but the engine is dependency‑free and runs on‑device, keeping data private.

AI · C# · DeepSeek

0 likes · 8 min read

Node.js Tech Stack

May 9, 2026 · Artificial Intelligence

Redis Founder Crafts DeepSeek V4 AI Inference Engine, Node.js Star Applauds

Redis creator Salvatore Sanfilippo (antirez) released DS4, a Metal‑only C inference engine tailored for DeepSeek V4 Flash on high‑end Macs. It features a narrow model focus, 2‑bit quantization, a disk‑based KV cache, benchmark speeds of around 26 tokens/s, and a server compatible with both the OpenAI and Anthropic APIs.

2-bit quantization · AI inference engine · DeepSeek-V4

0 likes · 13 min read

Machine Learning Algorithms & Natural Language Processing

May 3, 2026 · Artificial Intelligence

Running a 400B Mixture‑of‑Experts LLM on iPhone 17 Pro: Inside Flash‑MoE

The article details how the open‑source Flash‑MoE engine streams a 400‑billion‑parameter Mixture‑of‑Experts language model on an iPhone 17 Pro, achieving interactive‑level token throughput by eliminating Python dependencies, crafting a custom Metal pipeline, and streaming weights directly from SSD.

Apple Silicon · Flash-MoE · GCD

0 likes · 7 min read

Machine Heart

May 1, 2026 · Artificial Intelligence

How a 400B Mixture‑of‑Experts Model Runs on the iPhone 17 Pro

The article details the Flash‑MoE project, which streams the 400‑billion‑parameter Qwen3.5‑397B‑A17B mixture‑of‑experts model on an iPhone 17 Pro, achieving up to 0.6 tokens per second with a custom Metal GPU pipeline, no Python code, and SSD‑backed weight streaming that keeps only 5.5 GB in RAM.

Flash-MoE · LLM · Metal

0 likes · 7 min read

ByteDance Terminal Technology

Aug 24, 2022 · Mobile Development

Impeller Rendering Engine: Background, Metal Shader Compilation, Vector Rendering, and Flutter DisplayList

This article provides an in‑depth technical overview of Flutter's Impeller rendering engine, covering its origin, jank classification, the evolution of Metal shader compilation, vector rendering fundamentals, the DisplayList architecture, Impeller's rendering pipeline, and the ImpellerC shader compiler, with code examples and performance insights.

DisplayList · Flutter · Impeller

0 likes · 31 min read

Youku Technology

Jun 8, 2022 · Mobile Development

How Youku Achieves Real-Time Bullet‑Screen Pass‑Through on Mobile

This article details Youku's technical approach to rendering bullet‑screen pass‑through on mobile devices, covering cloud‑based and on‑device segmentation pipelines, GPU‑accelerated rendering steps, performance optimizations, and engineering challenges to deliver seamless immersive viewing.

Bullet Screen · GPU · Metal

0 likes · 11 min read

Youzan Coder

Dec 3, 2021 · Mobile Development

Adaptive‑Sync and ProMotion Variable Refresh Rate Techniques on macOS, iPad Pro, and iPhone 13 Pro

Apple's Adaptive‑Sync on macOS and ProMotion on iPad Pro and iPhone 13 Pro, as presented at WWDC21, enable variable‑refresh‑rate displays. Developers can use Metal and CADisplayLink APIs to pace frames dynamically, query hardware limits, and adjust rates based on GPU load for smoother motion.

Adaptive Sync · CADisplayLink · Metal

0 likes · 18 min read

NetEase Smart Enterprise Tech+

Nov 9, 2021 · Fundamentals

Cross‑Platform WebRTC Video Rendering: OpenGL, Metal, Vulkan & Direct3D

This article provides a comprehensive overview of WebRTC video rendering across major platforms, detailing the pipelines and rendering technologies such as OpenGL, Metal, Vulkan, and Direct3D, and explains how these APIs are employed on iOS, Android, macOS, and Windows for efficient real‑time video playback.

Direct3D · Metal · OpenGL

0 likes · 15 min read

Xianyu Technology

Oct 21, 2021 · Mobile Development

Flutter iOS GPU Background Crash Analysis and Solution

The article analyzes why Flutter crashes on iOS when accessing the GPU in the background, explains the official SyncSwitch fix for ImageDecoder, and details Xianyu’s additional patches for MultipleFrameCodec, EncodeImage, and Rasterizer::DrawToSurface that together, via PR #28383, fully resolve the GPU‑background crash.

Crash · Flutter · GPU

0 likes · 11 min read

Sohu Tech Products

Nov 18, 2020 · Game Development

Best Practices for Metal on Apple Silicon: Architecture Migration, GPU Changes, and Optimization Techniques

This article explains how Apple Silicon affects Metal applications, outlines migration steps from Intel to Apple Silicon, describes new GPU architectures and API features, and provides practical best‑practice guidelines to achieve optimal performance and correctness on the new platform.

Apple Silicon · GPU · Metal

0 likes · 11 min read

Kuaishou Large Model

Oct 22, 2020 · Fundamentals

Why Deferred Rendering Beats Forward Rendering on Mobile GPUs – A Deep Dive

This article compares forward and deferred rendering techniques, analyzes their performance trade‑offs on mobile GPUs, explores tile‑based and hardware‑TBDR approaches, and presents a Metal‑based single‑pass deferred shading solution for modern mobile graphics pipelines.

Graphics · Metal · Mobile GPU

0 likes · 15 min read

Baidu Maps Tech Team

Jul 1, 2020 · Mobile Development

Baidu Maps’ Mobile Rendering Engine: Upgrading to Metal for 40% CPU Gains

This article traces the evolution of Baidu Maps' mobile rendering engine, explains why Metal outperforms OpenGL, outlines a cross‑platform engine redesign, and quantifies the CPU, GPU, and memory improvements achieved after the Metal upgrade.

Baidu Maps · Graphics API · Metal

0 likes · 11 min read

Taobao Frontend Technology

Dec 2, 2019 · Frontend Development

How GCanvas Boosts Cross‑Platform Graphics Performance for Front‑End Developers

GCanvas is a lightweight, W3C‑compliant graphics engine that evolved from H5 acceleration to server‑side rendering, offering front‑end developers high‑performance canvas rendering, extensive platform support, and detailed optimization techniques such as JSBinding and Metal integration.

GCanvas · Graphics Rendering · JSBinding

0 likes · 9 min read

Yuewen Technology

Sep 15, 2017 · Mobile Development

Unlocking ARKit: How Apple’s AR Framework Powers iOS 11 Augmented Reality

This article explains ARKit’s architecture, core tracking components, scene‑recognition features such as plane detection, hit‑testing and light estimation, and provides a step‑by‑step demo using Xcode templates to render a 3D airplane model on iOS devices.

ARKit · Metal · SceneKit

0 likes · 10 min read
