
How to Quickly Set Up a Data Science Environment with Docker

This guide explains why Docker simplifies data‑science environment setup, walks through installing Docker, pulling ready‑made images, running a container with Jupyter Notebook, managing files, installing additional packages, and cleaning up, providing step‑by‑step commands for Windows, macOS, and Linux users.

ITPUB

Why Use Docker for Data Science

Configuring a data‑science development environment can be painful due to mismatched package versions, long compile times, and obscure error messages. Docker solves these problems by delivering a pre‑configured, isolated Linux container that includes Python, Jupyter Notebook, and popular libraries, making the setup fast, reproducible, and cross‑platform.

Installing Docker

Docker provides graphical installers for Windows and macOS and is installed through the distribution's package manager on Linux. After installation, run Docker commands from the provided terminal (Docker Quickstart Terminal on macOS) or from any shell prompt.

Pulling a Data‑Science Image

Dataquest offers two ready‑made images on Docker Hub:

dataquestio/python3-starter: Python 3, Jupyter, NumPy, pandas, SciPy, scikit‑learn, NLTK, etc.
dataquestio/python2-starter: the same stack for Python 2.

Download an image with:

docker pull dataquestio/python3-starter

Creating a Workspace Folder

On the host machine, create a directory to store notebooks so that files persist after the container stops, e.g.:

mkdir -p /home/vik/notebooks
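
The same folder can also be created from Python, which behaves the same way on Windows, macOS, and Linux. This is a minimal sketch; the function name ensure_workspace is an illustrative assumption and not part of the original guide.

```python
from pathlib import Path


def ensure_workspace(path):
    """Create the notebook folder (and any missing parents) if needed."""
    workspace = Path(path).expanduser()
    # parents=True mirrors `mkdir -p`; exist_ok=True makes reruns harmless.
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace.resolve()
```

Calling ensure_workspace("~/notebooks") would create a notebooks folder under the current user's home directory and return its absolute path.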

Running the Container

Start the image with port forwarding, detached mode, and a volume mapping to the notebook folder:

docker run -d -p 8888:8888 -v /home/vik/notebooks:/home/ds/notebooks dataquestio/python3-starter

Replace the host path and image name as needed. Docker will print the container ID, which you will use for later commands.
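
If you start containers from a script, it can help to build the command as an argument list so that each flag stays visible and the paths are easy to swap. The sketch below only assembles and prints the command from this guide; the helper name docker_run_command and its parameter names are assumptions made for illustration.

```python
import shlex


def docker_run_command(host_dir, image="dataquestio/python3-starter",
                       port=8888, container_dir="/home/ds/notebooks"):
    """Return the `docker run` argument list used in this guide."""
    return [
        "docker", "run",
        "-d",                                 # detached: run in the background
        "-p", f"{port}:{port}",               # host port : container port
        "-v", f"{host_dir}:{container_dir}",  # host folder : container folder
        image,
    ]


cmd = docker_run_command("/home/vik/notebooks")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True, capture_output=True, text=True); in that case the container ID that Docker prints would be available in the result's stdout.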

Accessing Jupyter Notebook

On Linux, open http://localhost:8888 in a browser. On Windows or macOS with Docker Machine, obtain the VM's IP address with docker-machine ip default (replace default with your machine name) and browse to http://IP_ADDRESS:8888.
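
If the page does not load, a quick check of whether anything is listening on the forwarded port can separate a networking problem from a browser problem. This is a rough sketch using only the standard library; the helper names notebook_url and is_port_open are assumptions for illustration.

```python
import socket


def notebook_url(host="localhost", port=8888):
    """Build the address to open in the browser."""
    return f"http://{host}:{port}"


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

On Linux you would call is_port_open("localhost", 8888); with Docker Machine, pass the address printed by docker-machine ip instead of localhost.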

Running a Sample Notebook

Create a new notebook and try the following scikit‑learn example (the code is shown exactly as it should be typed):

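# cross_val_predict lives in sklearn.model_selection from scikit-learn 0.18 on;
# older releases keep it in sklearn.cross_validation.
try:
    from sklearn.model_selection import cross_val_predict
except ImportError:
    from sklearn.cross_validation import cross_val_predict
from sklearn import datasets, linear_model
import matplotlib.pyplot as plt

lr = linear_model.LinearRegression()
# load_boston ships with older scikit-learn; it was removed in version 1.2.
boston = datasets.load_boston()
y = boston.target
# Each prediction comes from a model trained on the other nine folds.
predicted = cross_val_predict(lr, boston.data, y, cv=10)

# Plot measured against predicted values; the dashed line marks a perfect fit.
fig, ax = plt.subplots()
ax.scatter(y, predicted)
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()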
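
To see what cross_val_predict with cv=10 is doing, it may help to look at a dependency-free version of the idea for a single feature: every sample is predicted by a line fitted without that sample's fold. The helper names and the toy data below are assumptions made for illustration, and scikit‑learn's real implementation is considerably more general.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_val_predict_1d(xs, ys, n_folds):
    """Predict every sample with a line fitted on the other folds only."""
    predicted = [None] * len(xs)
    for test_idx in kfold_indices(len(xs), n_folds):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        for i in test_idx:
            predicted[i] = slope * xs[i] + intercept
    return predicted


# Toy data lying exactly on y = 2x + 1, so held-out predictions match y.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 for x in xs]
predicted = cross_val_predict_1d(xs, ys, n_folds=10)
```

Because no sample is ever predicted by a model that saw it during training, the measured-versus-predicted scatter plot reflects out-of-sample error, not the fit on the training data.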

Adding Data Files

You can place data files in the host notebook folder, use docker cp to copy files into or out of the container, or upload them through the Jupyter UI. In the commands below, 4greg24134 stands for the container ID; substitute your own. To copy a file into the container:

docker cp /home/vik/data.csv 4greg24134:/home/ds/notebooks

To copy a file out:

docker cp 4greg24134:/home/ds/notebooks/data.csv /home/vik/

Installing Additional Packages

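Open a shell inside the running container:

docker exec -it 4greg24134 /bin/bash

Then install any pip package, for example:

pip install requests

Leave the shell with exit. Packages installed this way live only in that container and are lost once the container is removed.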
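
Before opening a shell in the container, you can check from a notebook cell whether a package is already importable. This is a small sketch using the standard library; the helper name is_installed is an assumption for illustration, and it expects a top-level module name such as requests.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None
```

For example, is_installed("requests") tells you whether the pip install step is needed at all in the current container.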

Stopping and Removing the Container

When you are finished, stop and remove the container using its ID:

docker rm -f CONTAINER_ID

Run docker ps to list the running containers and retrieve the ID.
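
When several containers are running, picking the right ID by eye gets tedious. The sketch below filters the text produced by docker ps --format "{{.ID}} {{.Image}}" for a given image name; the function name and the sample output are made up for illustration and do not come from a real session.

```python
def container_ids_for_image(ps_output, image):
    """Return IDs from `docker ps --format "{{.ID}} {{.Image}}"` output."""
    ids = []
    for line in ps_output.splitlines():
        parts = line.split()
        # Each expected line is "<container id> <image name>".
        if len(parts) == 2 and parts[1] == image:
            ids.append(parts[0])
    return ids


# Hypothetical output, invented for this example.
sample = "1a2b3c4d5e6f dataquestio/python3-starter\n0f9e8d7c6b5a redis\n"
matching = container_ids_for_image(sample, "dataquestio/python3-starter")
```

Each ID in matching could then be passed to docker rm -f.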

Customising the Image

If you need a different set of libraries, modify the Dockerfile in the GitHub repository that builds the image, then build and push your own image.

Source: Programming Club (编程派)

Written by

ITPUB
