Tagged articles
3 articles
Tech Musings
Mar 6, 2026 · Artificial Intelligence

How to Deploy Qwen3-8B on WSL2 with 4‑Bit Quantization and Resource Limits

A step‑by‑step guide to running the Qwen3‑8B large language model on Windows 11 under WSL2, covering hardware specs, CUDA configuration, 4‑bit quantization with BitsAndBytes, SDPA attention optimization, CPU offload, and resource limits that keep inference responsive.

4-bit quantization · CUDA optimization · PyTorch
10 min read
Refining Core Development Skills
Aug 26, 2025 · Fundamentals

How NVIDIA’s Fermi Architecture Revolutionized GPU Computing: Key Improvements Explained

Fermi, NVIDIA’s 2010 GPU architecture, introduced major upgrades over the Tesla line—including a 40 nm process, vastly increased transistor count, GDDR5 memory, L2 cache, enhanced FP64 performance, ECC support, and unified CPU‑GPU addressing—making it the first truly complete GPU computing platform.

CUDA optimization · ECC Memory · FP64 performance
12 min read
DataFunSummit
Jul 4, 2023 · Artificial Intelligence

PPL: A Full‑Platform Deep Learning Deployment Framework by SenseTime

The article presents SenseTime's PPL framework, detailing its toolchain, inference engine, multi‑backend operator library, quantization tools, CUDA optimizations, performance benchmarks across CPUs, GPUs, DSPs and DSAs, and outlines future plans for broader chip support and AI for Science.

AI inference · CUDA optimization · Deep Learning Deployment
23 min read