How to Quickly Set Up a Data Science Environment with Docker
This guide explains why Docker simplifies data‑science environment setup, walks through installing Docker, pulling ready‑made images, running a container with Jupyter Notebook, managing files, installing additional packages, and cleaning up, providing step‑by‑step commands for Windows, macOS, and Linux users.
Why Use Docker for Data Science
Configuring a data‑science development environment can be painful due to mismatched package versions, long compile times, and obscure error messages. Docker solves these problems by delivering a pre‑configured, isolated Linux container that includes Python, Jupyter Notebook, and popular libraries, making the setup fast, reproducible, and cross‑platform.
Installing Docker
Docker provides graphical installers for Windows and macOS; on Linux, install it through your distribution's package manager. After installation, run Docker commands from the provided terminal (Docker Quickstart Terminal on macOS) or from any shell prompt.
Pulling a Data‑Science Image
Dataquest offers two ready‑made images on Docker Hub:
dataquestio/python3-starter – Python 3, Jupyter, NumPy, pandas, SciPy, scikit‑learn, NLTK, etc.
dataquestio/python2-starter – the same stack for Python 2.
Download an image with:
docker pull dataquestio/python3-starter
Creating a Workspace Folder
On the host machine, create a directory to store notebooks so that files persist after the container stops, e.g.:
mkdir -p /home/vik/notebooks
Running the Container
Start the image with port forwarding, detached mode, and a volume mapping to the notebook folder:
docker run -d -p 8888:8888 -v /home/vik/notebooks:/home/ds/notebooks dataquestio/python3-starter
Replace the host path and image name as needed. Docker will print the container ID, which you will use for later commands.
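If you would rather script this step, the following is a minimal Python sketch (not part of the original guide) that assembles the same docker run arguments. The helper names build_run_command and start_container are hypothetical, and the sketch assumes the docker CLI is installed and on the PATH.

```python
import os
import subprocess


def build_run_command(host_dir,
                      image="dataquestio/python3-starter",
                      port=8888,
                      container_dir="/home/ds/notebooks"):
    """Assemble the `docker run` argument list used in this guide.

    Hypothetical helper: the defaults mirror the command shown above.
    """
    # Bind mounts need an absolute host path, so normalise it first.
    host_dir = os.path.abspath(os.path.expanduser(host_dir))
    return [
        "docker", "run",
        "-d",                                        # detached mode
        "-p", "{0}:{0}".format(port),                # host port : container port
        "-v", "{}:{}".format(host_dir, container_dir),  # host dir : container dir
        image,
    ]


def start_container(host_dir, **kwargs):
    """Start the container and return the ID that Docker prints.

    Assumes the `docker` CLI is available; raises CalledProcessError otherwise
    if the command fails.
    """
    output = subprocess.check_output(build_run_command(host_dir, **kwargs))
    return output.decode("utf-8").strip()
```

Calling start_container("/home/vik/notebooks") would run the same command as above and hand back the container ID for the later docker cp, docker exec, and docker rm steps.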
Accessing Jupyter Notebook
On Linux, open http://localhost:8888 in a browser. On Windows or macOS with Docker Machine, obtain the VM IP with docker-machine ip default (replace default with your machine name) and browse to http://IP_ADDRESS:8888.
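The notebook server may take a few seconds to start accepting connections after docker run returns. As a rough sketch (not from the original guide), the Python below polls the forwarded port until something answers. The function names port_is_open and wait_for_port are hypothetical, and the check only confirms that a TCP connection succeeds, not that Jupyter itself is healthy.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
    conn.close()
    return True


def wait_for_port(host, port, attempts=10, delay=1.0):
    """Poll host:port up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

On Linux you might call wait_for_port("localhost", 8888); with Docker Machine, pass the address reported by docker-machine ip instead of localhost.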
Running a Sample Notebook
Create a new notebook and try the following scikit‑learn example (the code is shown exactly as it should be typed):
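# sklearn.cross_validation is the module name in older scikit-learn releases;
# in scikit-learn 0.20 and later, import cross_val_predict from sklearn.model_selection instead.
from sklearn import datasets, linear_model
from sklearn.cross_validation import cross_val_predict
import matplotlib.pyplot as plt
# Fit a linear regression on the Boston housing data using 10-fold cross-validation
lr = linear_model.LinearRegression()
boston = datasets.load_boston()
y = boston.target
predicted = cross_val_predict(lr, boston.data, y, cv=10)
# Plot measured against predicted values; the dashed line marks a perfect prediction
fig, ax = plt.subplots()
ax.scatter(y, predicted)
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()
Adding Data Files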
You can place data files in the host notebook folder, use docker cp to copy files into or out of the container, or upload them through the Jupyter UI. In the commands that follow, 4greg24134 stands in for your own container ID. To copy a file into the container:
docker cp /home/vik/data.csv 4greg24134:/home/ds/notebooks
To copy a file out:
docker cp 4greg24134:/home/ds/notebooks/data.csv /home/vik/
Installing Additional Packages
Enter the running container with:
docker exec -it 4greg24134 /bin/bash
Then install any pip package, for example:
pip install requests
Exit the shell with exit.
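After installing a package, you may want to confirm from a notebook cell that the kernel can actually import it. The snippet below is one possible check, not something the original guide prescribes. The helper name package_version is hypothetical, and it relies on the common but not universal convention that a package exposes a __version__ attribute.

```python
import importlib


def package_version(module_name):
    """Return a module's version string, "unknown" if it has none, or None if it is not importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Many packages expose __version__, but it is only a convention.
    return getattr(module, "__version__", "unknown")


# Example: report on a few modules the notebook might need.
for name in ("requests", "numpy", "pandas"):
    print(name, package_version(name))
```

A result of None means the import failed, which usually indicates that the package was installed into a different Python environment than the one the notebook kernel uses.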
Stopping and Removing the Container
When you are finished, stop and remove the container using its ID:
docker rm -f CONTAINER_ID
You can list running containers with docker ps to retrieve the ID.
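If you have lost track of the container ID, the lookup and removal can be scripted. The sketch below is illustrative only: the helper names parse_ps_output, running_containers, and remove_container are hypothetical, and it assumes the docker CLI is on the PATH. It uses the --format option of docker ps to print just the ID and image of each running container.

```python
import subprocess


def parse_ps_output(output, image):
    """Return the container IDs whose image name equals `image`.

    Expects one "ID IMAGE" pair per line, as produced by
    `docker ps --format "{{.ID}} {{.Image}}"`.
    """
    ids = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == image:
            ids.append(parts[0])
    return ids


def running_containers(image):
    """List IDs of running containers started from `image` (requires the docker CLI)."""
    output = subprocess.check_output(
        ["docker", "ps", "--format", "{{.ID}} {{.Image}}"]
    ).decode("utf-8")
    return parse_ps_output(output, image)


def remove_container(container_id):
    """Force-stop and remove one container, mirroring `docker rm -f CONTAINER_ID`."""
    subprocess.check_call(["docker", "rm", "-f", container_id])
```

For example, looping over running_containers("dataquestio/python3-starter") and passing each ID to remove_container would clean up every container started from that image, so use it only when none of them holds unsaved work outside the mounted folder.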
Customising the Image
If you need a different set of libraries, modify the Dockerfile in the GitHub repository that builds the image, then build and push your own image.
Source: Programming Club (编程派)
ITPUB