Running Large Language Models Locally on RTX 3090: Two Open‑Source Solutions
This article introduces two noteworthy open‑source GitHub projects that address practical challenges in AI development: club‑3090, which documents how to run 27‑billion‑parameter models on one or two consumer RTX 3090 GPUs with detailed performance benchmarks, and library‑skills, a tool that keeps AI coding agents synchronized with the latest official library APIs. For each project it covers configuration, usage steps, hardware requirements, and the intended audience.
club‑3090: Running large models on RTX 3090
Modern large language models such as Qwen‑3.6‑27B exceed the 24 GB VRAM of a single RTX 3090, leading to out‑of‑memory (OOM) errors or unusably slow performance. club‑3090 documents how to run such models on one or two consumer‑grade RTX 3090 GPUs, outlining compatible configurations, memory requirements, and suitable inference engines.
Two technical routes
vLLM dual‑card scheme – aims for maximum throughput. Using two 3090 cards can reach up to 127 TPS in code‑generation scenarios and supports 262 K token context, visual understanding, tool calls, and Multi‑Token Prediction (MTP).
llama.cpp single‑card scheme – prioritises stability. A single 3090 runs a 262 K token context without prefill cliffs, returns 25 K‑token tool results reliably, and achieves about 21 TPS, making it suitable for long‑running agent tasks.
Both schemes are provided as ready‑to‑use Docker Compose configurations exposing an OpenAI‑compatible API at localhost:8020.
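Once either stack is up, any OpenAI‑compatible client can talk to it. A minimal sketch of such a request follows; the model name and the chat‑completions request shape are assumptions based on the standard OpenAI API, not taken from the repository:

```shell
# Build a chat-completions request for the local endpoint exposed by the
# Docker Compose stack. The model name "qwen3.6-27b" is illustrative; use
# whatever name your launched variant registers.
ENDPOINT="http://localhost:8020/v1/chat/completions"
BODY='{"model":"qwen3.6-27b","messages":[{"role":"user","content":"Say hello"}]}'
echo "POST $ENDPOINT"
echo "$BODY"
# Send it once the server is running:
# curl -s "$ENDPOINT" -H "Content-Type: application/json" -d "$BODY"
```

Because the endpoint speaks the OpenAI protocol, existing SDKs and tools can be pointed at it by changing only the base URL.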
Measured performance
vLLM dual (TP=2): ~46 GB VRAM, 89 TPS, high throughput with 262 K context.
vLLM dual Turbo: ~46 GB VRAM, 127 TPS, highest code‑generation speed.
llama.cpp single: ~22 GB VRAM, 21 TPS, best single‑card stability.
A single‑card vLLM setup may encounter OOM once the context exceeds 50 K tokens; in that case, either the dual‑card vLLM route or the single‑card llama.cpp configuration is recommended.
Who should use it
Developers with only consumer‑grade GPUs who want to run large models locally.
Teams building AI applications that need a stable backend service without repeatedly calling external APIs.
Users concerned about data privacy who prefer fully offline execution.
# Clone the repository
git clone https://github.com/noonghunna/club-3090.git
cd club-3090
# Download the model (~20 GB)
bash scripts/setup.sh qwen3.6-27b
# Interactive launch (choose engine and config)
bash scripts/launch.sh
# Or launch directly with a specific variant
bash scripts/launch.sh --variant vllm/dual

Hardware requirements: 1–2 RTX 3090 (24 GB VRAM each), Linux (Ubuntu 22.04+), Docker with the NVIDIA Container Toolkit, driver 580.x or newer, and roughly 30 GB of free disk space.
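Before launching, it is worth confirming the driver and VRAM match the stated requirements. A small optional pre‑flight check, assuming the NVIDIA driver and its `nvidia-smi` utility are installed:

```shell
# Optional pre-flight check: report GPU name, driver version, and total VRAM
# so you can verify driver 580.x+ and 24 GB per card before launching.
if command -v nvidia-smi >/dev/null 2>&1; then
  GPU_INFO=$(nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv)
else
  GPU_INFO="nvidia-smi not found; install the NVIDIA driver (580.x or newer) first"
fi
echo "$GPU_INFO"
```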
GitHub: https://github.com/noonghunna/club-3090
library‑skills: Keeping AI agents up‑to‑date with official library usage
AI coding assistants often suffer from stale training data, causing them to suggest outdated or even non‑existent APIs. library‑skills solves this by letting library authors supply “skill” files that describe the latest API usage, best practices, and deprecated patterns.
How to use
# Python
uvx library-skills
# Or JavaScript/TypeScript
npx library-skills

The tool scans the current project's dependencies, detects libraries that have associated skill files, and installs them as symbolic links in a .agents directory, automatically staying in sync with library version updates. Adding the --claude flag also installs the skills into a .claude/skills directory for Claude Code users.
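The symlink mechanism can be illustrated in isolation. The file names and directory layout below are a toy sketch of the scheme the project describes, not the tool's actual output:

```shell
# Toy illustration of the linking scheme: a skill file maintained at the
# source is symlinked into the project's .agents directory, so updates to
# the source file are visible in the project immediately.
mkdir -p demo/skills demo/project/.agents
echo "FastAPI skill: prefer the lifespan handler over on_event" > demo/skills/fastapi.md
# Relative link: resolved from inside demo/project/.agents/
ln -sf ../../skills/fastapi.md demo/project/.agents/fastapi.md
cat demo/project/.agents/fastapi.md
```

Because the link points at the source file rather than copying it, regenerating or editing the skill file propagates to every project that links it.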
Problems it solves
Data staleness: AI models are trained on data with a cutoff date; skill files provide the freshest API information.
Hallucinated code: AI may infer incorrect usage from outdated examples; official skill files directly convey the correct usage.
Currently supported libraries include FastAPI and Streamlit, with more to be added.
GitHub: https://github.com/tiangolo/library-skills
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Geek Labs
Daily shares of interesting GitHub open-source projects. AI tools, automation gems, technical tutorials, open-source inspiration.
