Understanding Docker and Kubernetes

In this post, we concisely introduce basic concepts in Docker and Kubernetes, and we hope to get new learners interested in the exciting world of containerization and distributed systems.

Docker Basics

Docker is a platform for developing, shipping, and running applications using containers. Some key Docker concepts:

  • Dockerfile - A text file that contains instructions for building a Docker image. Example:
FROM node:14
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "start"]

This Dockerfile specifies the base image, working directory, copies files, runs commands to install dependencies, exposes a port, and specifies the startup command.

  • Docker Image - A read-only template with instructions for creating a Docker container. Images are built from a Dockerfile and stored in a registry like Docker Hub. Example command to build an image:
docker build -t my-app:v1 .
  • Docker Container - A runnable instance of an image. You can create, start, stop and delete containers. Example to run a container:
docker run -p 3000:3000 my-app:v1
  • Docker Compose - A tool for defining and running multi-container applications. It uses a YAML file to configure the application’s services. Example docker-compose.yml:
version: '3'
services:
  web:
    build: .
    ports:
      - "3000:3000"
  db:
    image: mongo
    volumes:
      - db-data:/data/db
volumes:
  db-data:

This defines a web service built from the Dockerfile in the current directory and a MongoDB service using the mongo image. It also creates a persistent volume for the database [1] [3].
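
To try the Compose file above, you can build and start both services with a couple of commands (a minimal sketch; on older installations the command may be the separate docker-compose binary rather than the docker compose plugin):

docker compose up -d --build   # build the web image and start web and db in the background
docker compose logs -f         # follow the combined logs
docker compose down            # stop and remove the containers (the named volume is kept)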

Kubernetes Concepts

Kubernetes is a system for automating deployment, scaling, and management of containerized applications. Some key concepts:

  • Pod - The basic execution unit of a Kubernetes application: the smallest and simplest unit in the Kubernetes object model that you create or deploy. A Pod encapsulates an application’s container (or multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run [6].

  • Deployment - Provides declarative updates for Pods and ReplicaSets. You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state. Example deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

This creates a deployment named nginx-deployment with 2 replica pods running the nginx container [6].
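
If the YAML above is saved to a file, it can be applied and checked with kubectl. A quick sketch (the filename nginx-deployment.yaml is just an assumed name):

kubectl apply -f nginx-deployment.yaml   # create or update the Deployment
kubectl get deployments                  # confirm 2/2 replicas are ready
kubectl get pods -l app=nginx            # list the Pods it manages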

  • Service - An abstract way to expose an application running on a set of Pods as a network service. Kubernetes gives Pods their own IP addresses and a single DNS name for a set of Pods, and can load-balance across them [4].
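
As a sketch of how this ties into the Deployment above, the following Service (the name nginx-service is illustrative) selects the Pods labeled app: nginx and exposes them inside the cluster on port 80:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80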

  • Ingress - An API object that manages external access to the services in a cluster, typically HTTP. Ingress may provide load balancing, SSL termination and name-based virtual hosting [4].
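
A minimal Ingress sketch that routes HTTP traffic for an assumed hostname (example.com) to the Service above; note that an Ingress controller (e.g. ingress-nginx) must be installed in the cluster for this to take effect:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx-service
            port:
              number: 80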

  • Horizontal Pod Autoscaler (HPA) - Automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization or custom metrics [2].
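
For example, an HPA for the Deployment above can be created imperatively (a sketch; the 50% CPU target and the 2 to 10 replica range are arbitrary, and the cluster needs a metrics server for CPU-based scaling):

kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=2 --max=10
kubectl get hpa   # watch the current and target utilization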

  • Cluster Autoscaler - A tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true: there are pods that failed to run in the cluster due to insufficient resources or there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes [5].

Kubernetes Networking

Kubernetes networking model:

  • Pods on a node can communicate with all pods on all nodes without NAT
  • Agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
  • Pods in the host network of a node can communicate with all pods on all nodes without NAT [7]

Kubernetes implements this using:

  • Pod Network - Each pod gets its own IP address, and pods on any node can communicate with pods on all other nodes
  • Service - Provides a single stable name and address for a set of pods, acts as a load balancer
  • Ingress - Allows inbound connections to the cluster, typically HTTP, provides load balancing, SSL, name-based virtual hosting
  • Network Policies - Firewall rules specifying how pods are allowed to communicate [7]

Example: In a cluster, a pod with IP 10.1.1.4 can directly communicate with a pod with IP 10.1.2.7, whether on the same node or a different node, without any NAT in between. Services provide stable endpoints for pods [4].
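
To illustrate the Network Policies mentioned above, here is a sketch (the label names are assumed, and the cluster’s network plugin must support NetworkPolicy) that only allows pods labeled app: web to reach pods labeled app: db on port 27017:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-db
spec:
  podSelector:
    matchLabels:
      app: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 27017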

Containers are food, Images are recipes

To internalize the distinction between images and containers, it is helpful to build an analogy. In what follows, we think about this in terms of cooking food.

A Docker image is like a recipe [6]. It contains all the instructions (the code), ingredients (the dependencies, libraries, tools), and steps needed to prepare a dish (the application) [6]. The recipe specifies exactly what goes into the dish, just like how an image specifies what goes into a container.

A Docker container is like the actual dish that you cook using the recipe [6]. You follow the instructions in the recipe, combine the ingredients, and create an instance of the dish. Similarly, you use the instructions in the image to create a running instance of the application, which is the container.

Some key points:

  • You can create multiple dishes from the same recipe, just like you can create multiple container instances from a single image [11].
  • If you change the recipe, it doesn’t affect dishes you’ve already cooked, similar to how updating an image doesn’t impact already running containers [11].
  • Recipes are a blueprint to create dishes, while images are a blueprint to create containers [12].
  • The recipe itself is not the food you eat, just like the image itself is not the running software. The cooked dish is what you actually consume, like how the container is what actually runs your application [9].

So in summary, Docker images are the instructions (like recipes) that define what goes into your application, while containers are the actual running instances of that application (like cooked dishes), created from those instructions [8] [9] [10] [11] [12] [13].

Are Docker images and containers memory-intensive?

Actually, building Docker images and running containers is designed to be quite efficient in terms of memory usage. Here’s why:

  1. Layer Caching:
    • Docker images are built using a layered filesystem. Each instruction in the Dockerfile creates a new layer on top of the previous one.
    • If you build an image multiple times without changing the instructions, Docker will reuse the cached layers from previous builds, saving time and resources.
    • This means that after the first build, subsequent builds will be much faster and consume fewer resources, as Docker only needs to rebuild the layers that have changed.
  2. Layer Sharing:
    • When you have multiple images that share the same base layers (e.g., multiple images based on Ubuntu), Docker will only store one copy of the shared layers on disk.
    • This means that even if you have hundreds of containers running from different images, if they share the same base layers, Docker will only need to load those layers into memory once.
    • This efficient sharing of layers helps reduce memory usage, as the common parts of images are not duplicated in memory.
  3. Container Isolation:
    • While each container runs in its own isolated environment, containers are not full-fledged virtual machines. They share the host system’s kernel and resources.
    • Containers are lightweight and only include the necessary libraries, dependencies, and runtime for the application, rather than a full operating system.
    • This means that containers have a smaller memory footprint compared to running the same application directly on the host or in a virtual machine.
  4. Memory Constraints:
    • Docker allows you to set memory constraints on containers using the --memory flag when running a container.
    • You can specify the maximum amount of memory a container can use, preventing it from consuming excessive memory resources.
    • This helps ensure that containers operate within the defined memory limits and don’t overwhelm the host system.
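
As a quick sketch of point 4, the flags below cap a container at 512 MB of memory and one CPU (reusing the my-app:v1 image from earlier as the example):

docker run --memory=512m --cpus=1.0 -p 3000:3000 my-app:v1   # enforce resource limits
docker stats                                                 # inspect live CPU and memory usage per container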

Of course, the actual memory usage will depend on the specific requirements of your application and the dependencies you include in your Docker image. However, Docker’s architecture and features are designed to optimize memory usage and make efficient use of system resources.

It’s also worth noting that you can use smaller base images, such as Alpine Linux, which is known for its minimal size and low memory footprint, to further reduce the memory consumption of your containers.

So, while building Docker images does require some memory, it’s not as memory-intensive as it might seem, thanks to layer caching, sharing, and the lightweight nature of containers.

Who gives a Docker container memory and compute access? Does container isolation eliminate the malware risks of installing frameworks locally?

The Docker daemon (part of the Docker Engine) is responsible for allocating memory and compute resources to containers. When you start a container, the daemon pulls the necessary image layers, creates a writable container layer, and uses kernel features such as namespaces and cgroups to grant the container the CPU, memory, and network access you specify.

Regarding the point about containers eliminating malware risks, there is some truth to it, but it’s not an absolute guarantee. Here’s why:

  1. Isolation: Containers provide a level of isolation from the host system. Each container runs in its own isolated environment with its own filesystem, processes, and network interfaces. This isolation helps prevent malware or other malicious processes within a container from directly affecting the host system or other containers.

  2. Dependency Encapsulation: Containers encapsulate an application along with its dependencies and libraries. This means that even if a particular framework or library used in the container has vulnerabilities, those vulnerabilities are confined within the container environment and do not directly impact the host system.

  3. Read-only Filesystems: Docker image layers are read-only by design, and containers can additionally be run with a read-only root filesystem, which prevents malware from modifying the container’s filesystem at runtime. This reduces the risk of persistent malware infections within containers.
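
As a hedged sketch of how some of these protections are applied in practice, the flags below run a container with a read-only root filesystem, a writable tmpfs for scratch data, dropped Linux capabilities, and no privilege escalation (the image name is illustrative):

docker run --read-only --tmpfs /tmp \
  --cap-drop ALL --security-opt no-new-privileges \
  my-app:v1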

However, it’s important to note that containers are not a silver bullet for eliminating all malware risks:

  • Kernel Sharing: Containers share the host system’s kernel. If there is a vulnerability in the kernel that allows container escape or privilege escalation, malware in a container could potentially exploit it to gain access to the host system.

  • Image Security: The security of a container depends on the security of the image it’s built from. If an image contains malware or vulnerabilities, the containers created from that image will inherit those issues.

  • Network Risks: Containers connected to a network can still be vulnerable to network-based attacks. Malware in a compromised container could potentially spread to other containers or systems on the same network.

So while containers provide an additional layer of isolation and can help mitigate some malware risks associated with installing frameworks or libraries directly on the host system, they are not a complete security solution on their own. It’s still crucial to follow security best practices, such as using trusted images, keeping containers updated, and properly configuring container networking and permissions.

Is Docker meant to be a replacement for Git? I can see how the immutability of Docker images helps with versioning. But what if multiple developers want to work on updates for a particular version? What’s the equivalent of a git merge or a main branch?

No, Docker is not meant to be a replacement for Git. Docker and Git serve different purposes and are often used together in a development workflow. Let’s clarify the roles of Docker and Git and how they can be used in collaboration:

  1. Git:
    • Git is a distributed version control system (VCS) used for tracking changes in source code and enabling collaboration among developers.
    • It allows multiple developers to work on the same codebase, create branches, make changes, and merge those changes back into the main branch.
    • Git focuses on managing and versioning the source code and other text-based files.
    • It provides features like branching, merging, pull requests, and code reviews to facilitate collaboration and maintain a clear history of changes.
  2. Docker:
    • Docker is a platform for containerizing applications and managing their dependencies and runtime environment.
    • It allows you to package an application, along with its dependencies and configurations, into a portable container image.
    • Docker focuses on providing a consistent and reproducible environment for running applications across different systems.
    • It enables easy deployment, scaling, and management of containerized applications.

Collaboration with Docker and Git:

  • In a typical development workflow, Git is used for version control and collaboration on the source code, while Docker is used for packaging and deploying the application.
  • Developers use Git to manage the source code, create branches, make changes, and collaborate with others through pull requests and code reviews.
  • The Dockerfile, which defines the instructions for building a Docker image, is typically versioned and managed alongside the source code in the Git repository.
  • When changes are made to the source code and the Dockerfile, developers can build new Docker images that incorporate those changes.
  • The built Docker images can be tagged with version numbers or specific tags to represent different versions or releases of the application.

Updating and Merging with Docker:

  • If multiple developers are working on updates for a particular version of an application, they would typically collaborate using Git branches and pull requests.
  • Each developer can create a separate branch in the Git repository to work on their specific changes to the source code and the Dockerfile.
  • When a developer finishes their changes, they can create a pull request to propose merging their branch into the main branch.
  • The pull request can be reviewed, discussed, and merged by the team, following the standard Git workflow.
  • Once the changes are merged into the main branch, a new Docker image can be built from the updated source code and Dockerfile.
  • The new Docker image can be tagged with a version number or a specific tag to represent the updated version of the application.
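
A minimal command-line sketch of that last step, assuming the image is called myapp and the new release is tagged v2.0:

git checkout main && git pull origin main   # get the merged changes
docker build -t myapp:v2.0 .                # build an image from the updated code and Dockerfile
docker push myapp:v2.0                      # publish it (Docker Hub normally needs a username prefix, e.g. yourname/myapp:v2.0)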

Continuous Integration and Deployment (CI/CD):

  • Docker images can be integrated into a CI/CD pipeline to automate the build, testing, and deployment processes.
  • Whenever changes are pushed to the Git repository, the CI/CD pipeline can automatically trigger a build of the Docker image, run tests, and deploy the updated application to the target environment.
  • This ensures that the latest changes are continuously integrated and deployed, maintaining a consistent and up-to-date version of the application.

In summary, Docker is not a replacement for Git, but rather a complementary tool used alongside Git in a development workflow. Git is used for version control and collaboration on the source code, while Docker is used for packaging and deploying the application. Developers can collaborate using Git branches and pull requests, and the resulting changes can be incorporated into new Docker images for deployment.

How do Docker registries work? How are they maintained separately from Git repositories?

When you build a Docker image and tag it with a version number or a specific tag, you can push that image to a Docker registry, which serves as a repository for storing and distributing Docker images. This allows you to share the image with others and make it accessible for deployment.

Here’s how the process typically works:

  1. Build the Docker Image:
    • After making changes to the source code and the Dockerfile, you build a new Docker image using the docker build command.
    • During the build process, you can specify a tag for the image using the -t flag, for example:
      docker build -t myapp:v1.0 .
      
    • This command builds the Docker image and tags it with the name “myapp” and the version “v1.0”.
  2. Push the Image to a Docker Registry:
    • Once the image is built and tagged, you can push it to a Docker registry using the docker push command.
    • Before pushing, you need to ensure that you are logged in to the registry using docker login (if required by the registry).
    • For example, to push the image to Docker Hub, you would run:
      docker push myapp:v1.0
      
    • This command pushes the image with the tag “v1.0” to the Docker Hub registry. (In practice, Docker Hub expects the image name to include your username or organization as a prefix, e.g. yourname/myapp:v1.0.)
    • If you’re using a private registry or a different registry provider, you would need to specify the appropriate registry URL and credentials, as shown in the sketch after this list.
  3. Versioning and Tagging:
    • You can push multiple versions or variations of the same image to the registry by using different tags.
    • For example, you can have tags like “v1.0”, “v1.1”, “latest”, or any other meaningful tag that represents a specific version or state of the image.
    • Tagging allows you to maintain different versions of the image in the registry and enables users to pull specific versions as needed.
  4. Pulling the Image:
    • Other users or deployment systems can pull the image from the registry using the docker pull command.
    • They can specify the image name and tag to pull the desired version of the image.
    • For example:
      docker pull myapp:v1.0
      
    • This command pulls the image with the tag “v1.0” from the registry.
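
When a registry other than Docker Hub is used, the registry host becomes part of the image name. A brief sketch (registry.example.com and myteam are placeholders):

docker login registry.example.com                              # authenticate against the registry
docker tag myapp:v1.0 registry.example.com/myteam/myapp:v1.0   # re-tag the local image with the registry host
docker push registry.example.com/myteam/myapp:v1.0             # push it to that registry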

By pushing the tagged Docker images to a registry, you make them accessible to others and enable easy distribution and deployment of your application. The registry acts as a central repository for storing and managing Docker images.

It’s important to note that the Docker registry is separate from your Git repository. The Git repository contains the source code and the Dockerfile used to build the Docker image, while the Docker registry stores the built images themselves.

You can use various Docker registry providers, such as Docker Hub, AWS Elastic Container Registry (ECR), Google Container Registry (GCR), or even set up your own private registry. These registries provide features like access control, image versioning, and collaboration among team members.

Integrating the process of building, tagging, and pushing Docker images into your CI/CD pipeline can automate the versioning and deployment of your application, ensuring that the latest version of the image is always available in the registry for deployment.

Why a registry? Why not just include the image in the Git repo?

While it’s technically possible to include Docker images in a Git repository, it’s generally not recommended or considered a best practice. There are several reasons why using a Docker registry is preferred over storing images in a Git repository:

  1. Repository Size and Performance:
    • Docker images can be quite large in size, often ranging from hundreds of megabytes to several gigabytes.
    • Including large binary files like Docker images in a Git repository can significantly increase the repository size and make cloning, fetching, and pushing the repository slower and more resource-intensive.
    • Git is designed to efficiently handle text-based files and source code, not large binary files.
  2. Version Control System (VCS) Limitations:
    • Git and other version control systems are optimized for tracking changes in text-based files, such as source code.
    • They are not well-suited for managing and versioning large binary files like Docker images.
    • Git’s delta compression and storage mechanisms are less efficient for binary files, leading to repository bloat and performance issues.
  3. Separation of Concerns:
    • Storing Docker images in a separate registry allows for a clear separation between the source code and the built artifacts (images).
    • The Git repository should focus on managing the source code, Dockerfile, and related configuration files.
    • The Docker registry, on the other hand, is dedicated to storing and distributing the built Docker images.
    • This separation promotes a cleaner and more maintainable development workflow.
  4. Access Control and Distribution:
    • Docker registries provide features for access control, allowing you to manage who can push, pull, and access the images.
    • You can set up private registries or use cloud-based registry services to control access to your images.
    • Registries also offer efficient distribution mechanisms, such as content delivery networks (CDNs), to optimize image retrieval and reduce network latency.
  5. Integration with Docker Ecosystem:
    • Docker registries are an integral part of the Docker ecosystem and are well-supported by various tools and platforms.
    • Docker CLI commands, such as docker push and docker pull, are designed to work seamlessly with registries.
    • Container orchestration platforms like Kubernetes and Docker Swarm can easily pull images from registries during deployment.
  6. Collaboration and Sharing:
    • Docker registries facilitate collaboration and sharing of images among team members and across different projects.
    • By pushing images to a registry, you can easily share them with others without the need to transfer large files manually.
    • Registries also enable versioning and tagging of images, making it easier to manage and distribute different versions of your application.

While it’s possible to store Docker images in a Git repository using tools like Git LFS (Large File Storage), it’s not a common or recommended practice. The benefits of using a dedicated Docker registry outweigh the drawbacks of storing images in a Git repository.

It’s important to note that the Dockerfile and other configuration files related to building the Docker image should still be versioned and managed in the Git repository. The Git repository should contain the necessary files to build the image, while the built images themselves are stored and distributed through a Docker registry.

Understanding Local Development Containers

Local Development Containers allow you to create an isolated development environment on your local machine using Docker containers. This approach enables you to develop your application within a container that encapsulates all the necessary dependencies, libraries, and frameworks, without interfering with your host system’s environment.

Here’s how you can set up and use Local Development Containers:

  1. Dockerfile:
    • Create a Dockerfile that defines the base image, dependencies, and tools required for your development environment.
    • Specify the necessary libraries, frameworks, and versions that your application depends on.
    • Include any additional development tools or utilities that you need.

    Example Dockerfile:

    FROM python:3.9

    # Set the working directory
    WORKDIR /app

    # Install dependencies
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    # Copy the application code
    COPY . .

    # Specify the command to run your application
    CMD ["python", "app.py"]
    
  2. Docker Compose (Optional):
    • If your application consists of multiple services or requires additional configurations, you can use Docker Compose to define and manage the development environment.
    • Create a docker-compose.yml file that specifies the services, volumes, networks, and other configurations needed for your development setup.

    Example docker-compose.yml:

    version: '3'
    services:
      app:
        build: .
        volumes:
          - ./:/app
        ports:
          - "8000:8000"
        depends_on:
          - db
      db:
        image: postgres
        environment:
          - POSTGRES_PASSWORD=secret
    
  3. Build and Run the Development Container:
    • Open a terminal and navigate to the directory containing your Dockerfile (and docker-compose.yml if using Docker Compose).
    • Build the Docker image using the docker build command:
      docker build -t myapp-dev .
      
    • Run the development container using the docker run command:
      docker run -it --rm -v $(pwd):/app -p 8000:8000 myapp-dev
      

      This command starts the container, mounts the current directory ($(pwd)) to the /app directory inside the container, and maps port 8000 from the container to port 8000 on the host.

  4. Develop Inside the Container:
    • With the development container running, you can now develop your application inside the isolated environment.
    • Any changes you make to the files in the mounted directory (./) will be reflected inside the container.
    • You can use your preferred IDE or text editor on your host machine to modify the code, while the container provides the runtime environment.
  5. Run and Test the Application:
    • Inside the running container, you can run your application and perform testing.
    • The container will have access to the specified libraries and frameworks, isolated from your host system.
    • You can execute commands, run tests, and interact with your application within the container’s environment.
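
For example, with the Docker Compose setup above, day-to-day development might look roughly like this (the pytest command assumes a test runner is listed in requirements.txt):

docker compose up -d --build     # build the images and start the app and database in the background
docker compose exec app bash     # open a shell inside the running app container
docker compose exec app pytest   # or run the test suite directly inside the container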

By using Local Development Containers, you can ensure that your development environment is consistent across different machines and team members. It provides isolation, reproducibility, and the ability to use specific versions of libraries and frameworks without affecting your host system.

The Dockerfile and Docker Compose configuration can be customized to include only the dependencies and tools necessary for your project, giving you control over the development environment.

In summary, Local Development Containers offer a convenient way to develop applications while leveraging the benefits of containerization, such as portability, isolation, and reproducibility.

We hope this overview helps a new learner get started with Docker and Kubernetes. Feel free to share your thoughts in the comments.


References

[1] github.com: Awesome Docker Compose Samples
[2] kubecost.com: The Guide To Kubernetes HPA by Example
[3] purestorage.com: Docker Compose vs. Dockerfile with Code Examples
[4] tigera.io: Kubernetes Networking Guide
[5] kubecost.com: Kubernetes Cluster Autoscaler Guide
[6] spacelift.io: Kubernetes Deployment YAML Tutorial
[7] suse.com: Introduction to Kubernetes Networking
[8] bernhardwenzel.com: The Mental Model Of Docker Container Shipping
[9] educative.io: Intro to Docker - Containers, Images, and the Dockerfile
[10] circleci.com: Docker Image vs Container: What are the Differences?
[11] mend.io: Docker Images vs Docker Containers: Key Differences
[12] aws.amazon.com: The Difference Between Docker Images and Containers
[13] linkedin.com: Docker Analogy by Prabir Kumar Mahatha

Assisted by perplexity.ai

Written on March 28, 2024