Ops Community
Jan 18, 2026 · Artificial Intelligence

How to Quadruple LLM Throughput with vLLM’s PagedAttention and Continuous Batching

This guide details how to replace native Transformers inference with the high‑performance vLLM engine, leveraging PagedAttention, continuous batching, tensor parallelism, and OpenAI‑compatible APIs to achieve 3‑4× higher throughput, lower latency, and scalable multi‑GPU deployments for production‑grade large language models.

Continuous batching · GPU Optimization · OpenAI API Compatibility
61 min read
Old Meng AI Explorer
Nov 27, 2025 · Artificial Intelligence

How UltraRAG Turns RAG Deployment into a Zero‑Code, One‑Click Process

UltraRAG, an open‑source RAG framework co‑developed by Tsinghua and NEUIR, offers a zero‑code WebUI that streamlines data construction, model fine‑tuning, and multi‑dimensional evaluation, boosting retrieval accuracy by up to 30% and cutting deployment time by half across legal, medical, and research use cases.

AI · Open-source · RAG
11 min read
HyperAI Super Neural
Apr 24, 2026 · Artificial Intelligence

Qwen3.6-27B Packs Flagship-Level Coding Power in a Small Model – One-Click Deployment Tutorial

The 27‑billion‑parameter Qwen3.6-27B model outperforms previous open‑source flagships on multiple coding benchmarks, scores 87.8 on GPQA Diamond, supports multimodal reasoning, and is available through HyperAI's one‑click deployment tutorial with free GPU compute resources.

GPU Compute · One‑Click Deployment · Qwen3.6-27B
4 min read
Sohu Tech Products
Jan 21, 2026 · Artificial Intelligence

Building an AI Knowledge Management System with Claude Skills & Dynamic Routing

This article explains how to design and implement a knowledge‑management and intelligent‑assistant system called Krawl using Claude Skills, covering the three‑layer skill architecture, progressive disclosure, dynamic routing, lazy loading, meta‑tool integration, and concrete Python examples for video summarisation and knowledge queries.

AI · Claude · Dynamic Routing
19 min read
Baidu Geek Talk
Apr 2, 2025 · Artificial Intelligence

DeepSeek-VL2 Multimodal Model: Architecture, Training, and Code Walkthrough

DeepSeek‑VL2 is a state‑of‑the‑art multimodal model built on a Mixture‑of‑Experts architecture that combines a SigLIP‑L vision encoder with dynamic tiling, a two‑layer VL adaptor, and a DeepSeek‑MoE language model using Multi‑head Latent Attention, trained in three stages on diverse visual‑language and text data, and achieving strong results on benchmarks such as DocVQA and TextVQA, with full implementation and inference code available in PaddleMIX.

DeepSeek-VL2 · Inference · Mixture of Experts
36 min read
Old Meng AI Explorer
Jan 8, 2026 · Artificial Intelligence

How UltraRAG Turns RAG Development into a Zero‑Code, One‑Click Process

UltraRAG, an open‑source RAG framework from THUNLP and NEUIR, offers a zero‑code WebUI that streamlines data construction, model fine‑tuning, and multi‑dimensional evaluation, boosting retrieval accuracy by up to 30% and cutting deployment time in half for enterprises, AI developers, and researchers.

AI · Open-source · RAG
10 min read
BirdNest Tech Talk
Oct 30, 2025 · Artificial Intelligence

How to Build Multimodal Prompts with LangChain: A Step‑by‑Step Guide

Learn how LangChain enables multimodal interactions by preparing inputs, constructing prompts, invoking models like GPT‑4o, and processing responses, with a complete example that demonstrates image‑question answering, code walkthrough, environment setup, and key considerations for API keys and image URLs.

LLM · LangChain · OpenAI
9 min read
Alibaba Cloud Native
Feb 12, 2025 · Artificial Intelligence

Boost AI Agents with Spring AI Alibaba: 20+ RAG Sources & Tool‑Calling Integrations

This article explains how Spring AI Alibaba enables AI agents to leverage Retrieval‑Augmented Generation and Tool Calling by providing over twenty ready‑made RAG data source connectors and more than twenty function‑calling interfaces, along with practical code examples for integrating document readers and weather services.

Document Reader · Function Calling · Java
12 min read
IT Services Circle
Oct 18, 2025 · Artificial Intelligence

Unlock Multi‑Model AI Collaboration with Zen MCP – A Deep Dive

The Zen MCP open‑source server, now with over 8.6K stars, acts as a bridge that lets Claude Code, Codex CLI, Gemini CLI and other AI tools invoke dozens of large models simultaneously, offering seamless multi‑model cooperation, automatic model selection, conversation continuity, and local execution for privacy‑preserving AI workflows.

AI Orchestration · AI tooling · Multi-Model Collaboration
5 min read
AI Explorer
Mar 11, 2026 · Artificial Intelligence

Gemini Embedding 2: Google’s First Native Multimodal Embedding Model

Google’s Gemini Embedding 2 introduces a native multimodal embedding model that maps text, images, video, audio, and documents into a single vector space, offers three configurable dimensions, achieves state‑of‑the‑art benchmarks across modalities, and enables cross‑modal search, RAG, and seamless integration with major vector databases.

AI Models · Gemini Embedding · Matryoshka representation
8 min read
Architect's Must-Have
Apr 21, 2026 · Artificial Intelligence

30 Essential AI Agent Concepts: From LLMs to Multi‑Agent Systems

This comprehensive guide systematically explains thirty core terms of AI agents—covering foundational large language models, fine‑tuning techniques, multimodal vision‑language models, agent architectures such as ReAct and CoT, tool‑calling protocols, retrieval‑augmented generation, workflow orchestration, and emerging product forms like autonomous and embodied agents—while detailing the reasoning, trade‑offs, and concrete examples that shape modern agent engineering.

AI agents · RAG · Tool Calling
36 min read
SpringMeng
Mar 26, 2026 · Artificial Intelligence

Building a Dify‑Powered Multi‑Agent RAG AI Service with Chinese Large Models

After the New Year, the author landed several AI contracts, delivering a six‑week knowledge‑base Q&A system and a two‑month AI customer‑service platform built with Dify, multi‑Agent workflows, RAG, and domestic large language models, cutting staff from fifteen to two and doubling development efficiency.

AI Customer Service · Chinese LLM · Dify
7 min read
Data Thinking Notes
Sep 1, 2024 · Artificial Intelligence

Master LLMs: Basics, Prompt Engineering, RAG, Agents & Multimodal AI

This article provides a comprehensive overview of large language models, covering their fundamental concepts, historical milestones, parameter scaling, prompt engineering techniques, retrieval‑augmented generation, autonomous agents, and multimodal model applications, illustrating how these technologies reshape AI capabilities across domains.

AI agents · LLM · RAG
22 min read
Old Zhang's AI Learning
Apr 3, 2026 · Artificial Intelligence

Qwopus3.5‑v3: From Reason‑Then‑Act to Act‑Then‑Refine – Claude‑Opus Distillation Turns Qwen3.5 into a Tool‑Using Agent

The newly released Qwopus3.5‑v3 model combines higher‑quality reasoning chains, dedicated tool‑calling reinforcement learning, and an act‑then‑refine paradigm, delivering a 5‑point HumanEval boost, a 1.43‑point MMLU‑Pro gain, 31.7% faster inference and 24% lower token cost, while remaining runnable on a 3090 or a 16 GB MacBook, with easy deployment via GGUF, LM Studio, Ollama or llama.cpp.

Claude Opus · Distillation · HumanEval
12 min read
Ray's Galactic Tech
Mar 27, 2026 · Artificial Intelligence

Choosing Between LangChain4j and Spring AI: Which Java AI Framework Wins in Production?

This article provides a deep, production‑grade comparison of LangChain4j and Spring AI, examining their architectural philosophies, engineering governance, high‑concurrency design, code examples, and real‑world scenarios to help Java teams decide which framework best fits their AI system boundaries, team capabilities, and long‑term evolution goals.

Enterprise Integration · Java AI · LangChain4j
29 min read
Architect's Alchemy Furnace
Dec 13, 2025 · Artificial Intelligence

Explore 100+ Open‑Source LLM Apps and How to Run Them Locally

This guide presents a curated collection of over a hundred open‑source large language model applications—including AI agents, RAG pipelines, and domain‑specific tools—explains their categories, showcases example projects, and provides step‑by‑step instructions to clone and run them on your own machine.

AI agents · GitHub · LLM
8 min read