How to Deploy DeepSeek‑OCR‑2 Locally: A Hands‑On Walkthrough
The article details a step‑by‑step local deployment of DeepSeek‑OCR‑2, covering GPU memory requirements, accuracy on complex tables, long inference times, dependency hurdles like GCC, GLIBC and flash‑attn, and provides concrete solutions using conda environments and symlinks.
Hello, I am Lao Zhang from AI Learning. In previous posts I introduced DeepSeek‑OCR‑2 and discussed earlier OCR projects such as DeepSeek‑OCR, HunyuanOCR, and PaddleOCR.
The deployment proved difficult: vLLM does not yet support the model, the project pulls in many transformers-related dependencies, and building them requires a specific GCC version plus a newer GLIBC than my system had. I created a conda environment with GCC 11, but the system GLIBC cannot be upgraded in place, so I ended up compiling the necessary libraries locally, which took time but succeeded.
Key concerns before the detailed guide:
GPU memory: at least 8.5 GB, with actual OCR tasks reaching about 10 GB.
Accuracy: performs well on challenging tables with nested headers, merged cells, and background noise.
Inference speed: over 20 seconds on an RTX 4090; the online demos run faster, possibly because they use H200 hardware.
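Given the memory figures above, a quick preflight check can save a failed model load. A minimal Python sketch that parses `nvidia-smi` output; the 10 GB threshold and the query flags are my assumptions for illustration, not official requirements:

```python
import shutil
import subprocess

# ~10 GB in MiB, matching the peak usage reported above (assumed threshold)
REQUIRED_MIB = 10 * 1024

def parse_free_mib(csv_output):
    """Parse `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits` output."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def has_enough_vram(required_mib=REQUIRED_MIB):
    """Return True if any visible GPU reports at least `required_mib` MiB free."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA tooling visible on this machine
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return any(free >= required_mib for free in parse_free_mib(out))

if __name__ == "__main__":
    try:
        print("enough free VRAM for DeepSeek-OCR-2:", has_enough_vram())
    except Exception as exc:  # nvidia-smi present but unusable
        print("could not query GPUs:", exc)
```

Running this before the demo avoids discovering an out-of-memory error only after the weights have started loading.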
Online demos (recommended instead of local deployment)
https://deepseek-ocr-v2-demo.vercel.app/
https://huggingface.co/spaces/prithivMLmods/DeepSeek-OCR-2-Demo
If you still want to deploy locally: I followed the official DeepSeek‑AI documentation (https://github.com/deepseek-ai/DeepSeek-OCR-2/tree/main) and immediately hit a GCC version that was too old. Using conda, I created an environment with GCC 11:
conda create -n gcc11_env -c conda-forge gcc_linux-64=11 gxx_linux-64=11
conda activate gcc11_env
x86_64-conda-linux-gnu-gcc --version
Installation screenshots (omitted) show the process, followed by the inference steps. I also tried the Unsloth guide (https://unsloth.ai/docs/models/deepseek-ocr-2), which describes a nightly vLLM build and a fine‑tuning notebook for Arabic recognition, but I could not get it running.
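To confirm the conda toolchain is really the one being picked up, you can parse the compiler's version banner programmatically. A small sketch; the binary lookup order and regex are illustrative assumptions:

```python
import re
import shutil
import subprocess

def gcc_major(version_output):
    """Extract the major version from `gcc --version` banner text, or None."""
    # Banners look like "gcc (conda-forge gcc 11.4.0-13) 11.4.0";
    # we take the first x.y.z that follows the closing parenthesis.
    match = re.search(r"\)\s+(\d+)\.(\d+)\.(\d+)", version_output)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    # Prefer the conda triplet name used above, fall back to plain gcc
    gcc = shutil.which("x86_64-conda-linux-gnu-gcc") or shutil.which("gcc")
    if gcc:
        out = subprocess.check_output([gcc, "--version"], text=True)
        print("GCC major version:", gcc_major(out))
    else:
        print("no gcc found on PATH")
```

If this prints anything other than 11 inside the activated environment, the build will use the wrong compiler.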
The most troublesome part was requirements.txt, especially the flash‑attn dependency. Direct wheel installation failed because of the old GCC and GLIBC. The fixes were:
Use conda to create a GCC 11 environment (as shown above).
Install flash‑attn with no build isolation: pip install flash-attn==2.7.3 --no-build-isolation.
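Since the wheel failures came down to the toolchain, it is worth printing the system GLIBC version before retrying the install. A minimal sketch; the 2.28 floor is an illustrative assumption for prebuilt wheels, not a documented flash‑attn requirement:

```python
import platform

def version_tuple(version):
    """Turn a dotted version string like '2.35' into a comparable tuple."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

def glibc_version():
    """Return the GLIBC version string, or '' on non-glibc systems."""
    # platform.libc_ver() returns e.g. ('glibc', '2.35') on glibc systems
    _, version = platform.libc_ver()
    return version

if __name__ == "__main__":
    v = glibc_version()
    print("GLIBC:", v or "unknown")
    if v and version_tuple(v) < (2, 28):  # assumed floor, adjust as needed
        print("GLIBC looks old; build flash-attn from source in the GCC 11 env")
```

On an old GLIBC, `pip install flash-attn==2.7.3 --no-build-isolation` will compile against the conda GCC 11 toolchain instead of pulling an incompatible wheel.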
After fixing the dependencies, run the demo with:
python deepseek_ocr_2_demo.py --server 0.0.0.0:7860
To make the gcc command directly usable, create symlinks that map the long conda compiler names to the short gcc and g++ names:
ln -sf $CONDA_PREFIX/bin/x86_64-conda-linux-gnu-gcc $CONDA_PREFIX/bin/gcc
ln -sf $CONDA_PREFIX/bin/x86_64-conda-linux-gnu-g++ $CONDA_PREFIX/bin/g++
Verify with gcc --version. For build scripts that read the CC and CXX variables, export them:
export CC=x86_64-conda-linux-gnu-gcc
export CXX=x86_64-conda-linux-gnu-g++
Optionally add these exports to ~/.bashrc to make them permanent.
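As an alternative to editing ~/.bashrc, a Python launcher can inject CC and CXX per process before invoking a build. A sketch assuming the same conda triplet names as the exports above:

```python
import os

def compiler_env(cc="x86_64-conda-linux-gnu-gcc",
                 cxx="x86_64-conda-linux-gnu-g++"):
    """Return a copy of the current environment with CC/CXX pointing at the conda toolchain."""
    env = dict(os.environ)  # copy, so the parent process is untouched
    env["CC"] = cc
    env["CXX"] = cxx
    return env

if __name__ == "__main__":
    env = compiler_env()
    # Pass `env=env` to subprocess.run(["pip", "install", ...], env=env)
    # so only that build sees the overridden compilers.
    print("CC =", env["CC"])
    print("CXX =", env["CXX"])
```

This keeps the override scoped to the one pip/build invocation rather than changing the shell globally.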
Lao Zhang's AI Learning
AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
