Run Full AI Models Directly in the Browser with Transformers.js v4

Transformers.js v4 rewrites its WebGPU runtime in C++ and compiles it to WASM, delivering ten‑fold faster build times, 10% smaller bundles, and up to four‑fold speedups for BERT‑style models. It also supports more than 20 new architectures, such as Qwen3.5, and enables offline, privacy‑preserving AI inference directly in the browser.

Transformers.js · Wasm · WebGPU

8 min read

Open Source Linux

Apr 14, 2025 · Artificial Intelligence

How to Deploy DeepSeek Locally: Step‑by‑Step Guide for Offline AI

This guide compares DeepSeek’s local and online versions, outlines hardware and privacy advantages of offline deployment, and provides a detailed step‑by‑step tutorial—including Ollama installation, model selection, command execution, and UI plugin setup—to help users run DeepSeek on their own machines.

AI model · DeepSeek · Ollama

6 min read

21CTO

Apr 24, 2024 · Artificial Intelligence

Microsoft’s Phi‑3 Mini: The Smallest LLM That Beats GPT‑3.5 on iPhone

Microsoft unveiled the open‑source Phi‑3 series, a lightweight family of large language models that outperform larger rivals, run offline on smartphones, and cost a fraction of comparable AI models, opening new possibilities for edge and mobile AI applications.

LLM · Phi-3 · offline-inference

8 min read

Volcano Engine Developer Services

Jun 20, 2023 · Artificial Intelligence

Boosting Large-Model Offline Inference with Ray and Cloud-Native Architecture

Large-model offline (batch) inference, which runs massive datasets through billion-parameter models, faces GPU memory and distributed scheduling challenges. This article explains how Ray's cloud-native framework, model parallelism, and Ray Datasets pipelines address these issues, improve throughput, and enable elastic, efficient GPU utilization.

GPU utilization · Ray · cloud-native

16 min read

ByteDance Cloud Native

Jun 13, 2023 · Artificial Intelligence

How Ray and Cloud‑Native Tech Supercharge Large‑Model Offline Inference

This article explains the challenges of large‑model offline (batch) inference, such as GPU memory limits and distributed scheduling, and shows how Ray’s cloud‑native architecture, model partitioning, and Ray Datasets can be used to build efficient, elastic inference frameworks deployed with KubeRay.

GPU Memory · Large Model · Ray

18 min read

Ctrip Technology

Jan 5, 2017 · Artificial Intelligence

Practical Approaches to Deploying Machine Learning Models: PMML, Rserve, and Spark in Production

This article shares practical engineering experience in deploying machine learning models to production. It covers three typical scenarios (real‑time small data, real‑time large data, and offline predictions) and details how to use PMML, Rserve, Spark, shell scripts, and related tools to meet performance and operational requirements.

Model Deployment · PMML · Rserve

12 min read
