Tagged articles
6 articles
Node.js Tech Stack
Apr 6, 2026 · Artificial Intelligence

Run Full AI Models Directly in the Browser with Transformers.js v4

Transformers.js v4 rewrites its WebGPU runtime in C++ and compiles it to WASM. The article reports tenfold faster builds, 10% smaller bundles, and up to fourfold speedups for BERT‑style models. The release also adds support for more than 20 new architectures, such as Qwen3.5, and enables offline, privacy‑preserving AI inference directly in the browser.

Transformers.js · Wasm · WebGPU
0 likes · 8 min read
Open Source Linux
Apr 14, 2025 · Artificial Intelligence

How to Deploy DeepSeek Locally: Step‑by‑Step Guide for Offline AI

This guide compares DeepSeek’s local and online versions, outlines hardware and privacy advantages of offline deployment, and provides a detailed step‑by‑step tutorial—including Ollama installation, model selection, command execution, and UI plugin setup—to help users run DeepSeek on their own machines.

AI model · DeepSeek · Ollama
0 likes · 6 min read
21CTO
Apr 24, 2024 · Artificial Intelligence

Microsoft’s Phi‑3 Mini: The Smallest LLM That Beats GPT‑3.5 on iPhone

Microsoft unveiled the open‑source Phi‑3 series, a lightweight family of large language models that outperform larger rivals, run offline on smartphones, and cost a fraction of comparable AI models, opening new possibilities for edge and mobile AI applications.

LLM · Phi-3 · offline-inference
0 likes · 8 min read
Volcano Engine Developer Services
Jun 20, 2023 · Artificial Intelligence

Boosting Large-Model Offline Inference with Ray and Cloud-Native Architecture

Large-model offline (batch) inference runs massive datasets through billion-parameter models and faces GPU memory and distributed scheduling challenges. This article explains how Ray's cloud-native framework, model parallelism, and Ray Datasets pipelines address these issues, improve throughput, and enable elastic, efficient GPU utilization.

GPU utilization · Ray · cloud-native
0 likes · 16 min read
ByteDance Cloud Native
Jun 13, 2023 · Artificial Intelligence

How Ray and Cloud‑Native Tech Supercharge Large‑Model Offline Inference

This article explains the challenges of large‑model offline (batch) inference, such as GPU memory limits and distributed scheduling, and shows how Ray’s cloud‑native architecture, model partitioning, and Ray Datasets can be used to build efficient, elastic inference frameworks deployed with KubeRay.

GPU Memory · Large Model · Ray
0 likes · 18 min read
Ctrip Technology
Jan 5, 2017 · Artificial Intelligence

Practical Approaches to Deploying Machine Learning Models: PMML, Rserve, and Spark in Production

This article shares practical engineering experience with deploying machine learning models in production. It covers three typical scenarios (real‑time small data, real‑time large data, and offline predictions) and details how to use PMML, Rserve, Spark, shell scripts, and related tools to meet performance and operational requirements.

Model Deployment · PMML · Rserve
0 likes · 12 min read