Out-of-the-Box Jupyter with Deep Learning Libraries

Data Machines (DMC) has just released a new edition of our TensorFlow and OpenCV Docker container with Jupyter Notebooks enabled, titled the “datamachines/jupyter_to” container, as well as a GPU counterpart, the “datamachines/jupyter_cto” container.

This Docker container has been especially tailored to support data scientists, machine learning engineers, and computer vision enthusiasts. It provides Python packages including TensorFlow, OpenCV, Keras, NumPy, pandas, and PyTorch, all accessible from within a Jupyter Notebook.
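
As a quick sanity check once Jupyter is up (see “How to Use It” below), a notebook cell like the following confirms the stack is in place. This is a minimal sketch; the versions printed will depend on the image tag you pulled:

# Import the bundled libraries and print their versions
import tensorflow as tf
import cv2
import numpy as np
import pandas as pd
import torch

print("TensorFlow:", tf.__version__)
print("Keras:", tf.keras.__version__)
print("OpenCV:", cv2.__version__)
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
print("PyTorch:", torch.__version__)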

DMC provides the Docker image on Docker Hub, so obtaining the container is simple. Once the container is running, you can access Jupyter in a browser right from your local host. This way, you launch Jupyter in a new, clean environment and can get right to work; there is no need to install packages or check dependencies unless you want to add specific capabilities!

Jupyter

Jupyter is a free and open-source “web-based notebook environment for interactive computing” that supports “workflows in data science, scientific computing, computational journalism, and machine learning.” Jupyter Notebooks specifically allow users to develop code alongside detailed documentation and view rich outputs.

Jupyter is straightforward to install on a host system once Python is installed:

pip install notebook
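
Once installed, the notebook server can be started from a terminal with:

jupyter notebook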

This simplicity of installation and use is part of what makes Jupyter so popular with data scientists for quickly prototyping reproducible solutions. Because notebooks can use the Python packages installed on the host, they give users a quick way to get to work on their problem set.

To learn more, please visit the Jupyter project on GitHub.

Docker

Installing multiple large packages can often be quite complex. Do you install them in a virtual environment or directly on the host? Are they CPU-bound or capable of leveraging a GPU? Are any additional system packages required?

This is where containers come into play. Containers can recreate consistent environments, making them a must-have tool for engineering teams: they ensure the right package versions and dependencies are available to the container’s user no matter the machine they’re working on, making it easy to install packages and reproduce projects.

Docker provides a simple framework for building, running, and sharing containers. DMC shares some of its container images via Docker Hub, so others can simply pull a pre-built image and run it on their own machine.

Background 

In early 2019, DMC started releasing to the public its “Dockerfile with GPU support for TensorFlow and OpenCV.” The goal was to provide data scientists with a means to quickly prototype without the hassle of maintaining a base container. And if they wanted to, they could simply expand upon it using a Docker “FROM” statement, as shown below.
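
For example, a minimal Dockerfile along these lines would extend one of our images (the scikit-learn addition here is purely illustrative, and we assume pip3 is available in the image):

# Build on the DMC image and add an extra Python package
FROM datamachines/jupyter_to:2.8.0_4.5.5-20220318
RUN pip3 install scikit-learn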

It was designed to install a GPU-optimized version of TensorFlow and OpenCV. Along with those tools, it would install Protocol Buffers, Jupyter, Keras, NumPy, pandas, and X11 support. Even then, Jupyter was part of its core capabilities, but accessing it required additional steps.

At the same time in 2019, DMC started to release containers built from this Dockerfile. In June 2019, DMC posted a primer to enable end users to quickly get started with Docker, GPUs, and our container, in the blog post “Toward a Containerized Nvidia CUDA, TensorFlow and OpenCV.” Although the steps listed there for setting up and using Docker containers with GPUs on Linux have become less relevant since 2019, the container DMC provides has kept evolving to bring in more and more of the frameworks that computer vision enthusiasts, data scientists, and machine learning professionals use.

Support for many popular libraries has been added; for example, PyTorch, Torch Audio, and Torch Vision. The list of installed packages is available for each release in its “BuildInfo” directories. Similarly, edge-device (Nvidia Jetson) versions have been made available.

Jupyter has been part of the container since its early days, and its usage is documented in the project. Because of Jupyter’s usefulness, we recently decided to release a standalone, Jupyter-enabled version of the base container.

How to Use It

To find the current CPU version of the Jupyter_to container, visit our Docker Hub registry.

To run this CPU version, pull down the image from the Docker Hub registry using the current tag: 

docker pull datamachines/jupyter_to:2.8.0_4.5.5-20220318

Run it using a docker run command, exposing the container’s port 8888 (used by the Jupyter interface): 

docker run --rm -p 8888:8888 datamachines/jupyter_to:2.8.0_4.5.5-20220318

If you want your work to be saved locally, you can mount a local directory into the container. The container’s default directory is /dmc, so it will automatically start there. If nothing is mounted to it, you will be working directly inside the container, and your work will be lost once the container stops. To have your work persist even after the container has stopped running, mount a host directory (where you keep your data/code locally) as /dmc. For example, you can mount your current working directory by adding -v `pwd`:/dmc.

The whole command would look like:

docker run -v `pwd`:/dmc --rm -p 8888:8888 datamachines/jupyter_to:2.8.0_4.5.5-20220318

That should run the container, and a URL will appear in your terminal, which you can use to access Jupyter.

Paste the http://127.0.0.1:8888/?token=... URL into your browser to bring up your new Jupyter instance. Now you can use Jupyter to run code using the terminal, manage files, and create notebooks!
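
If you pulled the GPU-enabled jupyter_cto counterpart instead, the command is similar. A sketch, assuming the NVIDIA Container Toolkit is installed on your host and substituting the tag of the release you pulled for <tag>:

docker run -v `pwd`:/dmc --rm --gpus all -p 8888:8888 datamachines/jupyter_cto:<tag>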

We hope you enjoyed this intro to our Docker container for Jupyter, TensorFlow, and OpenCV. Keep an eye out for our next blog post, which will dive into a real example using the Jupyter_to container!
