How to Deploy Multi‑Version Python with Anaconda on a CDH Cluster and Build a Private PyPI Repository
Learn step‑by‑step how to install Anaconda parcels on a CDH cluster to create isolated Python 3 environments, set up a local Conda package repository with Nginx, use wget and bandersnatch to mirror packages, and configure private pip sources for Windows and Linux nodes.
Background
Python has become a dominant language for data science and AI. In a multi‑tenant CDH (Cloudera's Distribution Including Apache Hadoop) environment, different tenants often require different Python versions and scientific‑computing libraries. This guide shows how to deploy a Python 3 environment using an Anaconda parcel, serve a private Conda repository through Nginx, and build a private pip (PyPI) mirror.
Prerequisites
You need at least 300 GB of storage on the host machine, wget for downloading packages, and Nginx to serve the local repository.
1. Download the Anaconda Parcel
The Anaconda parcel can be obtained from the official archive:
https://repo.continuum.io/pkgs/misc/parcels/archive/
2. Extract the Parcel and Create a Private Python 3 Environment
Unpack the downloaded .parcel file and use the bundled installer to create a dedicated Python 3 environment. After extraction the directory structure looks like the screenshots below.
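The guide does not give the unpack command itself. As a minimal sketch, assuming the parcel is a gzip-compressed tar archive (consistent with the tar -zcvf packaging command used in step 7), the extraction could be done from Python. The function name extract_parcel and the paths in the usage note are hypothetical.

```python
import tarfile


def extract_parcel(parcel_path, dest_dir):
    """Extract a .parcel file and return the sorted top-level entry names.

    Assumption: the parcel is a gzip-compressed tar archive.
    """
    top_level = set()
    with tarfile.open(parcel_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Ignore empty and "." path components such as a leading "./".
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if parts:
                top_level.add(parts[0])
        archive.extractall(dest_dir)
    return sorted(top_level)
```

A call such as extract_parcel("Anaconda-5.0.1-el7.parcel", "/opt/anaconda") would unpack the archive and return the names of its top-level directories, which gives a quick view of the resulting layout. Only extract archives from a source you trust, since extractall writes whatever paths the archive contains.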
3. Install Private Conda Packages
Use the Tsinghua University mirror to download the required packages (the mirror may not have the newest versions but offers high speed):
https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
Because the internal machines cannot access the Internet, download the packages on a Windows host using wget and later copy them to the cluster.
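The download step in this section reads package URLs from a text file (pyku.txt). A minimal sketch of how that list might be generated is shown here, assuming you already know the package file names you want and that they live under the mirror's linux-64 subdirectory. The helper names and the example file names in the usage note are hypothetical, not part of the original procedure.

```python
# Mirror base URL taken from this guide; the subdirectory is an assumption.
MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/"


def build_url_list(filenames, subdir="linux-64", base=MIRROR):
    """Return one download URL per package file name.

    Blank entries and duplicates are skipped; input order is preserved.
    """
    prefix = base.rstrip("/") + "/" + subdir + "/"
    seen = set()
    urls = []
    for name in filenames:
        name = name.strip()
        if not name or name in seen:
            continue
        seen.add(name)
        urls.append(prefix + name)
    return urls


def write_url_list(urls, path):
    """Write the URLs to a text file, one per line, as wget -i expects."""
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        fh.write("\n".join(urls) + "\n")
```

For example, write_url_list(build_url_list(["numpy-1.13.1-py36_0.tar.bz2"]), r"C:\conf\pyku.txt") would produce a one-line list; the file name here is only an illustration, so check the mirror's directory listing for the builds that actually exist.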
Install wget on Windows (via Chocolatey)
C:\WINDOWS\system32> "%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"
C:\WINDOWS\system32> choco install wget
Upgrade later with:
C:\WINDOWS\system32> choco upgrade wget
Download Packages with wget
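Create a text file (pyku.txt) that lists all required package URLs, one per line, then run wget in the background (-b), resuming partial downloads (-c), saving files under C:\conf\ku (-P) and logging to C:\conf\wget-log (-o):
wget -b -c -i C:\conf\pyku.txt -P C:\conf\ku -o C:\conf\wget-log
4. Set Up a Local PyPI Server with Nginx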
Install Nginx on Linux and configure it to serve the directory that holds the mirrored packages.
# Example nginx.conf snippet (location blocks must sit inside the server block)
server {
    listen 80;
    server_name localhost;
    root /usr/share/nginx/html/;
    index index.html index.htm;

    # Browsable listing of the channel root
    location /pkgs/free {
        alias /usr/share/nginx/html/pkgs/free/;
        autoindex on;
        autoindex_exact_size on;
        autoindex_localtime on;
    }

    # linux-64 packages are served from a separate directory
    location /pkgs/free/linux-64 {
        alias /opt/beh/core/condaku/freeku/;
        autoindex on;
        autoindex_exact_size on;
        autoindex_localtime on;
    }
}
After reloading Nginx, the repository can be accessed via URLs such as http://12.109.21.84/pkgs/free/.
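Conda reads a repodata.json index from each platform subdirectory of a channel, so a browsable listing alone does not prove the channel is usable. The sketch below checks for that index over HTTP. It assumes repodata.json was mirrored alongside the package files, a step this guide does not describe, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def repodata_url(channel, subdir):
    """Build the URL of the repodata.json index for one channel subdirectory."""
    return channel.rstrip("/") + "/" + subdir + "/repodata.json"


def count_packages(channel, subdir, timeout=10):
    """Fetch repodata.json and return how many packages it lists.

    Raises urllib.error.URLError or HTTPError if the index is unreachable.
    """
    with urlopen(repodata_url(channel, subdir), timeout=timeout) as resp:
        index = json.load(resp)
    return len(index.get("packages", {}))
```

Calling count_packages("http://12.109.21.84/pkgs/free/", "linux-64") from a cluster node would confirm both that Nginx is reachable and that the index is in place. An HTTP 404 here suggests the package files were copied without their index.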
5. Configure Conda to Use the Private Repository
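Add the local channels with conda config, which writes to ~/.condarc (a system-wide file such as /etc/conda/condarc can be edited by hand instead), and remove the default channel. The file is YAML, not a shell script, so it must not be sourced; the settings take effect the next time conda runs.
conda config --add channels http://12.109.21.84/pkgs/free/
conda config --add channels http://12.109.21.84/pkgs/free/noarch/
conda config --set show_channel_urls yes
# Drop "defaults" so conda no longer tries to reach the public channels
conda config --remove channels defaults
6. Install Packages via Conda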
Install any required package with:
conda install <package_name>
If a package is not available in the private Conda channel, download the .whl file and place it under the pkgs directory, then install with pip:
pip install /path/to/package.whl
7. Package the Anaconda Parcel for Distribution
Create a tarball and generate a SHA‑1 checksum:
tar -zcvf Anaconda-5.0.1-el7.parcel --owner=root --group=root Anaconda-5.0.1
sha1sum Anaconda-5.0.1-el7.parcel | cut -d ' ' -f 1 > Anaconda-5.0.1-el7.parcel.sha
Copy both the .parcel file and the .sha file to the Parcel‑Repo directory and let Cloudera Manager distribute them.
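Where sha1sum is not available, for instance when the parcel is repackaged on a Windows host, the checksum step can be reproduced in Python. This is a sketch under the assumption that the .sha file should contain only the hexadecimal SHA-1 digest, which is what the sha1sum and cut pipeline above writes. The function names are hypothetical.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha_file(parcel_path):
    """Write <parcel_path>.sha containing only the digest; return its path."""
    sha_path = parcel_path + ".sha"
    with open(sha_path, "w", encoding="ascii", newline="\n") as fh:
        fh.write(sha1_of_file(parcel_path) + "\n")
    return sha_path
```

Reading in chunks keeps memory use flat, which matters because an Anaconda parcel can be several hundred megabytes. write_sha_file("Anaconda-5.0.1-el7.parcel") would create Anaconda-5.0.1-el7.parcel.sha next to the parcel.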
8. Build a Private pip Repository on Windows (Optional)
Install Anaconda on a Windows host, then use bandersnatch to mirror the public PyPI:
pip install bandersnatch
Create C:\etc\bandersnatch.conf (Windows has no /etc directory, so create an etc folder on the C: drive):
[mirror]
# Directory where the mirror data will be stored.
# bandersnatch writes the served tree to the "web" subfolder of this directory,
# so it must match the Nginx root used below.
directory = F:\python_package
# PyPI server to be mirrored.
master = https://pypi.org
Run the mirroring command:
bandersnatch -c C:\etc\bandersnatch.conf mirror
Configure Nginx (Windows build) to serve the mirrored packages:
server {
    listen *:80;
    server_name localhost;
    # Nginx on Windows expects forward slashes in paths
    root F:/python_package/web;
    autoindex on;
    charset utf-8;
}
On client machines, create C:\Users\Administrator\AppData\Roaming\pip\pip.ini with the following content:
[global]
timeout = 6000
index-url = http://<your_machine_ip>/simple
trusted-host = <your_machine_ip>
Test the setup:
pip install pymysql
The installation succeeds and the package is fetched from the private server, confirming that the local pip source works.
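When many client machines need the same settings, the pip configuration can be generated rather than typed by hand. The following is a minimal sketch using the standard configparser module; the function name and the example host address in the usage note are hypothetical.

```python
import configparser
import io


def render_pip_config(host, timeout=6000):
    """Return pip configuration text that points pip at a private index."""
    parser = configparser.ConfigParser()
    parser["global"] = {
        "timeout": str(timeout),
        "index-url": "http://{}/simple".format(host),
        # Plain HTTP is used, so pip must be told to trust the host.
        "trusted-host": host,
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()
```

The text returned by render_pip_config("10.0.0.5") can be written to %APPDATA%\pip\pip.ini on Windows clients. On Linux nodes the per-user file is ~/.config/pip/pip.conf, with the same [global] section.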
Conclusion
This procedure enables a CDH cluster to host multiple isolated Python 3 environments, provides a fast, internal Conda channel and a private pip repository, and can be reproduced on both Linux and Windows nodes. It is especially useful for large‑scale, multi‑tenant big‑data platforms that need consistent, version‑controlled scientific‑computing libraries.
dbaplus Community