Intro to Docker


Docker is an open platform for developing, shipping, and running applications (or deploying predictive/prescriptive models).

Figure. Docker logo (source: www.docker.com)

Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production. (source: Docker documentation)

Figure. Docker containers (source: Docker)

Docker objects

When you use Docker, you are creating and using images, containers, networks, volumes, plugins, and other objects. This section is a brief overview of some of those objects.

Images

An image is a read-only template with instructions for creating a Docker container. Often, an image is based on another image, with some additional customization. For example, you may build an image which is based on the Ubuntu image, but installs the Apache web server and your application, as well as the configuration details needed to make your application run.

You might create your own images or you might only use those created by others and published in a registry. To build your own image, you create a Dockerfile with a simple syntax for defining the steps needed to create the image and run it. Each instruction in a Dockerfile creates a layer in the image. When you change the Dockerfile and rebuild the image, only those layers which have changed are rebuilt. This is part of what makes images so lightweight, small, and fast, when compared to other virtualization technologies.
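The Ubuntu-plus-Apache example above can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended production setup: the ubuntu:22.04 tag, the apache-demo directory, the my-apache:demo image name, and the placeholder index.html are all made up for the example. The Dockerfile is written through a shell heredoc so the whole sketch can be pasted into a terminal.

```shell
# Scratch build context (hypothetical directory name).
mkdir -p apache-demo/site
echo '<h1>Hello from Docker</h1>' > apache-demo/site/index.html

# Each instruction below becomes one step in the image history.
cat > apache-demo/Dockerfile <<'EOF'
FROM ubuntu:22.04
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y apache2 \
 && rm -rf /var/lib/apt/lists/*
COPY site/ /var/www/html/
EXPOSE 80
CMD ["apache2ctl", "-D", "FOREGROUND"]
EOF

# Build and list the layers only if a Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
  docker build -t my-apache:demo apache-demo
  docker history my-apache:demo
fi
```

If only apache-demo/site/index.html is edited before rebuilding, the FROM and RUN steps should be served from the build cache and only the COPY step onward is redone, which is the layer reuse described above.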

Containers

A container is a runnable instance of an image. You can create, start, stop, move, or delete a container using the Docker API or CLI. You can connect a container to one or more networks, attach storage to it, or even create a new image based on its current state.
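As a rough sketch of that lifecycle with the Docker CLI, the commands below create, start, stop, snapshot, and delete one container. The container name demo-web, the image tag demo-web:snapshot, and the choice of the public nginx:alpine image are assumptions picked for illustration; any image that keeps running would do.

```shell
# Walk one container through its lifecycle. All names are placeholders.
container_lifecycle() {
  docker create --name demo-web nginx:alpine   # create, but do not start
  docker start demo-web                        # start the created container
  docker stop demo-web                         # stop it; its filesystem is kept
  docker commit demo-web demo-web:snapshot     # new image from its current state
  docker rm demo-web                           # delete the container
}

# Run the walk-through only if a Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
  container_lifecycle
fi
```

The image created by docker commit remains after the container is deleted and can be removed with docker rmi demo-web:snapshot.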

By default, a container is relatively well isolated from other containers and its host machine. You can control how isolated a container’s network, storage, or other underlying subsystems are from other containers or from the host machine.

A container is defined by its image as well as any configuration options you provide to it when you create or start it. When a container is removed, any changes to its state that are not stored in persistent storage disappear.
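The effect of persistent storage can be illustrated with a named volume. In this sketch the volume name demo-data, the mount point /data, and the alpine image are assumptions chosen for the example. The first container writes one file into the volume and one into its own filesystem, then is removed; the second container checks which of the two survived.

```shell
persistence_demo() {
  docker volume create demo-data
  # First container: write to the volume and to the container's own filesystem.
  docker run --rm -v demo-data:/data alpine \
    sh -c 'echo kept > /data/note.txt; echo lost > /tmp/note.txt'
  # Second container: the volume file is still there, the /tmp file is not.
  docker run --rm -v demo-data:/data alpine \
    sh -c 'cat /data/note.txt; test -e /tmp/note.txt || echo "/tmp/note.txt is gone"'
  docker volume rm demo-data
}

# Run the demo only if a Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
  persistence_demo
fi
```

The --rm flag removes each container as soon as it exits, so anything written outside /data disappears with it.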

For more details about Docker, please refer to https://docs.docker.com/engine/docker-overview/.

Example - Deploying an RStudio instance in a Docker container

"All the cool data science kids seem to be using Docker these days, and being able to instantly spin up a pre-built computer with a complete development or production environment is magic." (source: Super basic practical guide to Docker and RStudio)

rOpenSci is a non-profit initiative founded to make scientific data retrieval reproducible. Among their many projects, they maintain pre-built Docker images that make it easy to use R within containers. Three of their best-known images are:

To deploy an RStudio instance in your own container, follow the steps below. It is assumed that the reader has already installed Docker:

  1. Create a Dockerfile that uses the rocker/tidyverse image as its base and installs the tidytext and countrycode packages. Use the following code chunk.
    FROM rocker/tidyverse

    # Install other libraries
    RUN install2.r --error \
        tidytext countrycode
  2. Build your customized image.
    docker build -t rstudio:latest .
  3. Explore your images.
    docker images
  4. Run the rstudio:latest image in a container.
    docker run -P rstudio:latest
    This starts the container in the foreground, so you will need to open another CLI or PuTTY instance for the remaining steps.
  5. On the new CLI instance, take note of the ‘IP address for eth1’.
    e.g. IP address for eth1: 10.0.32.8
  6. Get the Docker container <port>.
    docker ps
  7. Open a browser and go to <ip address>:<port>.
    • user: rstudio
    • pass: rstudio
  8. Alternatively, you can execute bash in the Docker container.
    docker exec -ti <container id> bash
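The run, port lookup, and browse steps can also be scripted. The sketch below is one possible variant and makes several assumptions: the rstudio:latest image from the build step already exists, RStudio Server listens on its default port 8787 inside the container, and the image accepts a PASSWORD environment variable (recent Rocker images appear to expect one instead of the default rstudio password, so treat that as something to verify). The host_port helper is a hypothetical name introduced here.

```shell
# Extract the host port from a mapping such as "0.0.0.0:32768".
host_port() {
  printf '%s\n' "${1##*:}"
}

# Start the container only if a Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
  # -d runs in the background, so no second terminal is needed.
  # PASSWORD is an assumption about recent Rocker images; change the value.
  cid="$(docker run -d -P -e PASSWORD=change-me rstudio:latest)"
  # docker port prints the host address that 8787/tcp was published on.
  mapping="$(docker port "$cid" 8787/tcp | head -n 1)"
  echo "Browse to <ip address>:$(host_port "$mapping")"
  # When finished: docker rm -f "$cid"
fi
```

Publishing a fixed port with -p 8787:8787 instead of -P would make the lookup unnecessary, at the cost of failing if that host port is already taken.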

This example shows how to run an image in a container and make it available on the local host. The next section shows how to use Kubernetes to orchestrate containers.

Kubernetes (extra)

Kubernetes is a production-ready, open source platform designed with Google’s accumulated experience in container orchestration, combined with best-of-breed ideas from the community.

Kubernetes resources

  • Kubernetes concepts
  • Interactive tutorial: There is a great interactive tutorial on how to create a cluster, and deploy, explore, expose, scale and update an application. For our purposes, Lessons 1 to 3 are recommended.
  • kubectl: kubectl is a command line interface for running commands against Kubernetes clusters.
  • Ingress: Ingress makes a service running in the cluster reachable from outside the Kubernetes cluster, typically over HTTP or HTTPS.
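To give a feel for kubectl before working through those resources, the deploy, explore, expose, and scale steps might look roughly like this. It is a sketch under stated assumptions: a cluster is already configured for kubectl, and the deployment name demo-web and the nginx:alpine image are placeholders rather than anything used earlier in this document.

```shell
kubernetes_walkthrough() {
  kubectl create deployment demo-web --image=nginx:alpine        # deploy
  kubectl get deployments                                        # explore
  kubectl get pods
  kubectl expose deployment demo-web --type=NodePort --port=80   # expose
  kubectl scale deployment demo-web --replicas=3                 # scale
  kubectl delete service demo-web                                # clean up
  kubectl delete deployment demo-web
}

# Run the walk-through only if a cluster is reachable.
if kubectl cluster-info >/dev/null 2>&1; then
  kubernetes_walkthrough
fi
```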