
How to Deploy Multi‑Version Python with Anaconda on a CDH Cluster and Build a Private PyPI Repository

Learn step‑by‑step how to install Anaconda parcels on a CDH cluster to create isolated Python 3 environments, set up a local Conda package repository with Nginx, use wget and bandersnatch to mirror packages, and configure private pip sources for Windows and Linux nodes.

dbaplus Community

Background

Python has become a dominant language for data science and AI. In a multi‑tenant CDH (Cloudera's Distribution Including Apache Hadoop) environment, different tenants often require different Python versions and scientific‑computing libraries. This guide shows how to deploy a Python 3 environment using an Anaconda parcel, create a private Conda repository served by Nginx, and optionally build a private pip (PyPI) mirror.

Prerequisites

At least 300 GB of storage on the host machine.
wget for downloading packages.
Nginx to serve the local repository.

1. Download the Anaconda Parcel

The Anaconda parcel can be obtained from the official archive:

https://repo.continuum.io/pkgs/misc/parcels/archive/

2. Extract the Parcel and Create a Private Python 3 Environment

Unpack the downloaded .parcel file, which is a compressed tar archive, and use the bundled installer to create a dedicated Python 3 environment.
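Before unpacking, it can help to confirm what the archive contains. The following is a minimal sketch, assuming the parcel follows the usual Cloudera layout (one top-level directory holding a meta/parcel.json file). The function name inspect_parcel is hypothetical and not part of any Cloudera or Anaconda tool.

```python
import tarfile


def inspect_parcel(parcel_path):
    """Return (top_level_dirs, has_metadata) for a .parcel archive.

    Assumption: the parcel is a tar archive (normally gzip-compressed),
    so the standard tarfile module can read it.
    """
    # "r:*" lets tarfile detect the compression instead of assuming gzip.
    with tarfile.open(parcel_path, "r:*") as archive:
        names = archive.getnames()
    # The first path component of every member is the parcel's root directory.
    top_level = sorted({name.split("/", 1)[0] for name in names if name})
    # Cloudera Manager identifies a parcel through <root>/meta/parcel.json.
    has_metadata = any(name.endswith("meta/parcel.json") for name in names)
    return top_level, has_metadata
```

A healthy parcel should report exactly one top-level directory and has_metadata set to True. Anything else suggests a truncated download or a repackaging mistake.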

3. Install Private Conda Packages

Use the Tsinghua University mirror to download the required packages (the mirror may not have the newest versions but offers high speed):

https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

Because the internal machines cannot access the Internet, download the packages on a Windows host using wget and later copy them to the cluster.

Install wget on Windows (via Chocolatey)

C:\WINDOWS\system32> "%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"
C:\WINDOWS\system32> choco install wget

Upgrade later with:

C:\WINDOWS\system32> choco upgrade wget

Download Packages with wget

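Create a text file (pyku.txt) that lists all required package URLs, one per line, then run wget in the background (-b), resuming partial downloads (-c), reading URLs from the file (-i), saving into a target directory (-P, uppercase) and logging to a file (-o):

wget -b -c -i C:\conf\pyku.txt -P C:\conf\ku -o C:\conf\wget-log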
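The URL list can be generated rather than typed by hand. The sketch below builds it from a list of package file names. The helper names build_url_list and write_url_list are hypothetical, and the example file name in the usage note is illustrative only. Including repodata.json is an assumption about what the private channel will need, since conda reads that file as the channel index.

```python
from pathlib import Path

# Mirror location taken from the article.
BASE_URL = "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/"


def build_url_list(filenames, platform="linux-64", base_url=BASE_URL):
    """Return one download URL per conda package file name."""
    base = base_url.rstrip("/")
    # repodata.json is the index conda reads, so fetch it with the packages.
    wanted = ["repodata.json", *filenames]
    return [f"{base}/{platform}/{name}" for name in wanted]


def write_url_list(urls, target):
    """Write the URLs one per line, the format `wget -i` expects."""
    Path(target).write_text("\n".join(urls) + "\n", encoding="utf-8")
```

For example, write_url_list(build_url_list(["numpy-1.13.3-py36_0.tar.bz2"]), r"C:\conf\pyku.txt") would produce a two-line file that the wget command above can consume.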

4. Set Up a Local Conda Repository with Nginx

Install Nginx on Linux and configure it to serve the directory that holds the mirrored packages.

# Example nginx.conf snippet (the server block belongs inside the http block)
server {
    listen 80;
    server_name localhost;
    root /usr/share/nginx/html/;
    index index.html index.htm;

    # Browsable listing of the channel root.
    location /pkgs/free/ {
        alias /usr/share/nginx/html/pkgs/free/;
        autoindex on;
        autoindex_exact_size on;
        autoindex_localtime on;
    }

    # The linux-64 packages live outside the web root; the longer prefix wins.
    location /pkgs/free/linux-64/ {
        alias /opt/beh/core/condaku/freeku/;
        autoindex on;
        autoindex_exact_size on;
        autoindex_localtime on;
    }
}

After reloading Nginx, the repository can be accessed via URLs such as http://12.109.21.84/pkgs/free/.
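A directory listing alone does not prove the channel is usable, because conda looks for a repodata.json index in the platform subdirectory and in noarch. The following sketch derives those URLs and, optionally, probes them. The function names are hypothetical, and check_channel assumes the client can reach the Nginx host over plain HTTP.

```python
import urllib.error
import urllib.request


def repodata_urls(channel, platform="linux-64"):
    """Return the index URLs conda requests for a channel root."""
    base = channel.rstrip("/")
    return [f"{base}/{subdir}/repodata.json" for subdir in (platform, "noarch")]


def check_channel(channel, platform="linux-64", timeout=5):
    """Return {url: bool} showing which index files are reachable."""
    results = {}
    for url in repodata_urls(channel, platform):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                results[url] = response.status == 200
        except (urllib.error.URLError, OSError):
            # Covers HTTP errors such as 404 as well as connection failures.
            results[url] = False
    return results
```

Calling check_channel("http://12.109.21.84/pkgs/free/") from a cluster node gives a quick view of which subdirectories still lack an index file.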

5. Configure Conda to Use the Private Repository

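Add the local channel to ~/.condarc (or /etc/conda/condarc) and remove the default channel:

conda config --add channels http://12.109.21.84/pkgs/free/
conda config --set show_channel_urls yes
# conda appends the platform subdirectory (linux-64, noarch) to each channel
# URL itself, so only the channel root is listed.
# Drop the "- defaults" entry so conda does not try to reach the public channel:
conda config --remove channels defaults
# .condarc is a YAML file, not a shell script, so it is not sourced;
# conda rereads it on every run. Check the result with:
conda config --get channels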
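The resulting file is plain YAML that conda reads on each invocation, so on a cluster it may be simpler to generate the file once and copy it to every node. The sketch below renders such a file. The render_condarc helper is hypothetical, and it covers only the two settings used in this article.

```python
def render_condarc(channels, show_channel_urls=True):
    """Render a minimal .condarc containing only private channels.

    "defaults" is deliberately not added, which keeps conda away from
    the public Anaconda channel.
    """
    lines = ["channels:"]
    lines += [f"  - {url}" for url in channels]
    flag = "true" if show_channel_urls else "false"
    lines.append(f"show_channel_urls: {flag}")
    return "\n".join(lines) + "\n"
```

Writing render_condarc(["http://12.109.21.84/pkgs/free/"]) to ~/.condarc should leave conda with a single channel, which conda config --get channels can confirm.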

6. Install Packages via Conda

Install any required package with:

conda install <package_name>

If a package is not available in the private Conda channel, download its .whl file, place it under the pkgs directory, then install it with pip:

pip install /path/to/package.whl

7. Package the Anaconda Parcel for Distribution

Create a tarball and generate a SHA‑1 checksum:

tar -zcvf Anaconda-5.0.1-el7.parcel Anaconda-5.0.1 --owner=root --group=root
sha1sum Anaconda-5.0.1-el7.parcel | cut -d ' ' -f 1 > Anaconda-5.0.1-el7.parcel.sha

Copy both the .parcel file and the .sha file to the Cloudera Manager parcel repository directory (by default /opt/cloudera/parcel-repo) and let Cloudera Manager distribute them.
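Where sha1sum is not available (for instance when the parcel is built on a Windows host), the checksum file can be produced with Python's standard library. This is a minimal sketch of the same step. The function name write_parcel_sha is hypothetical.

```python
import hashlib
from pathlib import Path


def write_parcel_sha(parcel_path, chunk_size=1024 * 1024):
    """Write <parcel>.sha next to the parcel and return its path."""
    parcel = Path(parcel_path)
    digest = hashlib.sha1()
    # Read in chunks so a multi-gigabyte parcel does not have to fit in memory.
    with parcel.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    sha_path = parcel.with_name(parcel.name + ".sha")
    # The .sha file holds only the hex digest, matching the cut command above.
    sha_path.write_text(digest.hexdigest() + "\n", encoding="ascii")
    return sha_path
```

The output matches what the sha1sum and cut pipeline writes: a single line holding the 40-character hex digest.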

8. Build a Private pip Repository on Windows (Optional)

Install Anaconda on a Windows host, then use bandersnatch to mirror the public PyPI:

pip install bandersnatch

Create C:\etc\bandersnatch.conf (Windows has no /etc directory, so create an etc folder on the C: drive):

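[mirror]
# Directory where the mirror data will be stored. bandersnatch writes the
# servable tree to its "web" subfolder, so this is the parent of the Nginx root.
directory = F:\python_package
# PyPI server to be mirrored.
master = https://pypi.org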

Run the mirroring command:

bandersnatch -c C:\etc\bandersnatch.conf mirror

Configure Nginx (Windows build) to serve the mirrored packages:

server {
    listen *:80;
    server_name localhost;
    # Nginx on Windows expects UNIX-style forward slashes in paths.
    root F:/python_package/web;
    autoindex on;
    charset utf-8;
}
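Before pointing clients at the server, it is worth checking that the mirror run produced the tree Nginx is expected to serve. The sketch below assumes bandersnatch's default layout, in which the configured directory contains web/simple (one folder per project) and web/packages. The function name check_mirror is hypothetical.

```python
from pathlib import Path


def check_mirror(directory):
    """Summarize whether a bandersnatch mirror directory looks servable."""
    web = Path(directory) / "web"
    simple = web / "simple"
    packages = web / "packages"
    # Each project gets its own folder under web/simple.
    projects = [p for p in simple.iterdir() if p.is_dir()] if simple.is_dir() else []
    return {
        "web_root": str(web),
        "has_simple_index": simple.is_dir(),
        "has_packages": packages.is_dir(),
        "project_count": len(projects),
    }
```

With the paths used here, check_mirror(r"F:\python_package") should report both directories present and a non-zero project count once the first sync has finished.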

On client machines, create C:\Users\Administrator\AppData\Roaming\pip\pip.ini with the following content:

[global]
timeout = 6000
index-url = http://<your_machine_ip>/simple
trusted-host = <your_machine_ip>
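The same settings apply on Linux clients, where pip reads a per-user pip.conf instead of pip.ini. The following sketch writes the file for either platform. The helper names are hypothetical, the Linux location ~/.config/pip/pip.conf is pip's per-user default, and the host passed in is whatever address the mirror is served from.

```python
import configparser
import os
from pathlib import Path


def pip_config_path(platform_name=os.name, home=None, appdata=None):
    """Return the per-user pip config path for Windows ("nt") or Linux."""
    home = Path(home) if home else Path.home()
    if platform_name == "nt":
        default_appdata = home / "AppData" / "Roaming"
        base = Path(appdata) if appdata else Path(os.environ.get("APPDATA", default_appdata))
        return base / "pip" / "pip.ini"
    return home / ".config" / "pip" / "pip.conf"


def write_pip_config(host, target, timeout=6000):
    """Write a [global] section pointing pip at the private index."""
    config = configparser.ConfigParser()
    config["global"] = {
        "timeout": str(timeout),
        "index-url": f"http://{host}/simple",
        # Needed because the mirror is served over plain HTTP.
        "trusted-host": host,
    }
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as handle:
        config.write(handle)
    return target
```

Running write_pip_config(host, pip_config_path()) on each client should yield a file equivalent to the pip.ini shown above.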

Test the setup:

pip install pymysql

The installation succeeds and the package is fetched from the private server, confirming that the local pip source works.

Conclusion

This procedure enables a CDH cluster to host multiple isolated Python 3 environments, provides a fast, internal Conda channel and a private pip repository, and can be reproduced on both Linux and Windows nodes. It is especially useful for large‑scale, multi‑tenant big‑data platforms that need consistent, version‑controlled scientific‑computing libraries.
