Things I Wish I Knew About Docker Before I Started Using It

In this blog post, TWG software engineer Nicole Chung shares her impressions of Docker, a tool that lets developers pack, ship, and run any application as a self-sufficient container that runs anywhere, from a laptop to any major cloud service.

When I first started learning Docker, I browsed the Docker documentation, read blog posts, followed tutorials, and used other people’s public git repos for reference. None of these resources helped me very much.

On Stack Overflow, I discovered many other people were asking the same questions I had — questions like:

  1. What is a WORKDIR, where is it, and how does it really work?
  2. Why isn’t my Docker container changing when I change my Dockerfile? (How does Docker caching work?)
  3. What is Docker compose, and why would I want to use it?
  4. How do I ssh into a running container?
  5. Where do I put code and data that need to persist outside of my Docker container? What if I need to blow away my application but keep my database? (Hint: Docker volumes.)

What is a WORKDIR, where is it, and how does it really work?

Here is a sample Dockerfile:
FROM node:alpine
WORKDIR /usr/src/app

I’m using node:alpine as my base image (a popular choice for its small file size). Next, I set /usr/src/app as my WORKDIR.

WORKDIR is your working directory, and it only exists inside your container. The /usr directory is almost always already there in the base image, which makes /usr/src/app a safe, conventional location.

If you want your working directory in a brand-new folder, you can point WORKDIR straight at it; Docker creates the folder for you if it doesn't already exist:
FROM node:alpine
WORKDIR /app

The other thing you will notice is that WORKDIR often comes right after using FROM in your Dockerfile.

This is because the instructions that follow it, such as RUN, CMD, ENTRYPOINT, COPY, and ADD, all use the directory set by WORKDIR: commands run there, and relative COPY or ADD destinations are resolved against it.
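WORKDIR can also appear more than once, and Docker's reference documents that a relative path is resolved against the previous WORKDIR. The Python sketch below models just that rule with POSIX paths; `resolve_workdirs` is a hypothetical helper for illustration, not part of Docker, and it assumes the base image starts out in `/`.

```python
import posixpath


def resolve_workdirs(workdirs, start="/"):
    """Return the effective working directory after each WORKDIR instruction.

    Toy model: an absolute path replaces the current directory, while a
    relative path is joined onto the previous WORKDIR. `start` stands for the
    base image's working directory ("/" if the image sets none).
    """
    current = start
    resolved = []
    for path in workdirs:
        # posixpath.join discards `current` when `path` is absolute
        current = posixpath.normpath(posixpath.join(current, path))
        resolved.append(current)
    return resolved


if __name__ == "__main__":
    # WORKDIR /a, then WORKDIR b, then WORKDIR c
    print(resolve_workdirs(["/a", "b", "c"])[-1])  # /a/b/c
```

In that example, a later `RUN pwd` would print /a/b/c, which matches the example in Docker's own WORKDIR reference.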

Really, the only RUN commands you should run before WORKDIR are the ones that install dependencies you want to be global inside your Docker container. Everything else should probably happen in your WORKDIR.

Do not bake files or data that are subject to constant change into your WORKDIR at build time. By this I mean: don’t build your application code or your database into the image’s WORKDIR. Application code is the sort of thing you want to put into a volume instead.

However, this is where things get complicated very quickly.

For example, many cloud services that host Docker containers may not support volumes at all. In that case, you will want to set up a Docker development environment that uses volumes and a Docker production environment that does not.

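Also, files and folders that you ADD or COPY into your WORKDIR at build time are hidden when you mount a folder from your machine as a volume over that same path; you won’t see them through the volume unless you expressly copy them over.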

Why isn’t my Docker container changing when I change my Dockerfile? (how Docker does caching)

Every instruction (line) in your Dockerfile gets cached. There are some more complicated rules about how and what gets cached, but if you read the build output in your terminal, you will see something that looks like this:

Step 2 : RUN npm install -g nodemon
 ---> Using cache
 ---> <some hash goes here>
Step 3 : WORKDIR /app
 ---> Using cache
 ---> <some hash goes here>
Step 4 : COPY package.json .
 ---> <some hash goes here>
Removing intermediate container <some hash goes here>
Step 7 : RUN npm install
 ---> Running in <some hash goes here>

The lines you need to pay attention to are the ones that say:

---> Using cache

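Seeing Using cache is good if nothing relevant has changed since your last build, but it’s not so good if you have changed something (e.g. you added a new dependency to your package.json or changed code in a file you bring in with COPY or ADD). For COPY and ADD, Docker compares checksums of the files being copied, so editing one of them normally invalidates that step and every step after it. For RUN, Docker compares only the command text, so a RUN step that downloads something will happily reuse a stale cached result.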
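To make those rules concrete, here is a toy Python model of layer caching. It assumes each step's cache key combines the previous step's key, the instruction text, and (for COPY and ADD only) the contents of the copied files. `layer_keys` and `cache_hits` are hypothetical helpers; Docker's real cache keys, especially under BuildKit, are computed differently, and this sketch treats `.` as "every file" and ignores other directories.

```python
import hashlib


def _feed(h, *parts):
    # Separate each part with a NUL byte so "ab"+"c" and "a"+"bc" hash differently
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")


def layer_keys(steps, files):
    """Return one toy cache key per Dockerfile instruction.

    steps: instruction strings such as "COPY package.json ."
    files: dict mapping build-context file names to their contents
    """
    keys, parent = [], ""
    for step in steps:
        h = hashlib.sha256()
        _feed(h, parent, step)  # chaining: a changed parent changes every later key
        op, *args = step.split()
        if op in ("COPY", "ADD"):
            for src in args[:-1]:  # the last argument is the destination
                # "." stands for the whole build context; hash every file in it
                names = sorted(files) if src == "." else [src]
                for name in names:
                    _feed(h, name, files.get(name, ""))
        # RUN, WORKDIR, FROM, ...: only the instruction text is part of the key
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def cache_hits(previous_keys, new_keys):
    """True where a step of the new build could reuse the previous build's layer."""
    seen = set(previous_keys)
    return [key in seen for key in new_keys]


if __name__ == "__main__":
    steps = [
        "FROM node:alpine",
        "WORKDIR /app",
        "COPY package.json .",
        "RUN npm install",
        "COPY . .",
    ]
    v1 = {"package.json": '{"dependencies": {}}', "index.js": "console.log(1)"}
    v2 = {**v1, "index.js": "console.log(2)"}  # edit application code only
    print(cache_hits(layer_keys(steps, v1), layer_keys(steps, v2)))
    # [True, True, True, True, False]
```

Because package.json is copied on its own before `RUN npm install`, editing index.js only invalidates the final `COPY . .` step in this model, while changing package.json would also re-run the install. Ordering the Dockerfile this way is a common trick for keeping the slow install step cached.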

The point of the cache is to speed up build times. This is great for your base image (see FROM), since downloading your base image takes a bit of time and base images do not change often. However, caching is a pain for almost everything else while you are still developing your Docker container, before you push the resulting Docker image to a repository for reuse.

Once you are sure you won’t be changing your Dockerfile for the next few months, it’s all right to go ahead and cache, but in the meantime, you need to pay attention to what is getting cached; otherwise you are going to run into some nasty, time-consuming surprises when you modify your code but nothing changes in your Docker container.

Because of how this caching works, you will often find yourself in a situation where you’ve changed your Dockerfile, your docker-compose.yml file, or your application code, but you still aren’t seeing any updates. To get around this there are some useful commands.

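Delete all Docker containers (run this before trying to delete images, since Docker won’t remove an image that a container still uses; running containers must be stopped first):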

$ docker rm $(docker ps -aq)

Delete all docker images:

$ docker rmi $(docker images -q)

This will destroy all of your docker images and containers, and you will have to rebuild from the beginning, making the next build slow (since it’s not borrowing from the cache). However, it will allow you to see your updates.

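A less drastic option while you are developing your Docker image is the --no-cache flag, which tells docker build to run every step again instead of using the cache (for example, docker build --no-cache .).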

You can also create aliases for common Docker commands:

alias dockerrm='docker rm $(docker ps -aq)'
alias dockerrmi='docker rmi $(docker images -q)'

What is Docker compose, and why would I want to use it?

A docker-compose.yml file works together with your Dockerfile: it can tell Docker compose which Dockerfile to build an image from, and then how to run containers from that image.

Docker compose is handy because:

  1. Docker compose can be used to manage multiple containers at once, which is what you usually do in the real world. For example, it is common to have a container for your API, another container for your database, and a third container for your Redis cache.
  2. When using Docker compose, you can start all three containers at once with docker-compose up, and then stop them using docker-compose down.
  3. Docker compose lets you move many of the options you would otherwise pass to docker run on the command line into a docker-compose.yml file.

The third reason is, I feel, the most important. It’s not really practical to memorize a bunch of docker cli options that you will either re-type or copy and paste over and over, when you can instead just put them into a docker-compose.yml file.
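As a rough illustration of that point, the hypothetical Python helper below translates a few common keys of a Compose service (image, container_name, restart, ports, environment in mapping form, and short-syntax volumes) into the equivalent docker run arguments. It is a sketch under those assumptions, not how Compose itself works, and it skips build, networking, and depends_on entirely.

```python
import os
import shlex


def docker_run_args(service, project_dir="."):
    """Translate a few keys of a Compose service definition into `docker run` args.

    Hypothetical helper: handles image, container_name, restart, ports,
    environment (mapping form) and short-syntax volumes only.
    """
    args = ["docker", "run"]
    if "container_name" in service:
        args += ["--name", service["container_name"]]
    if "restart" in service:
        args += ["--restart", service["restart"]]
    for port in service.get("ports", []):
        args += ["-p", str(port)]
    for key, value in service.get("environment", {}).items():
        args += ["-e", f"{key}={value}"]
    for volume in service.get("volumes", []):
        source, sep, rest = volume.partition(":")
        if sep and source.startswith("."):
            # Compose resolves relative host paths against the project folder;
            # here we make them absolute before handing them to docker run
            volume = os.path.abspath(os.path.join(project_dir, source)) + ":" + rest
        args += ["-v", volume]
    args.append(service["image"])
    return args


if __name__ == "__main__":
    db = {
        "image": "postgres:9.6",
        "restart": "always",
        "volumes": ["./pgdata:/var/lib/postgresql/data/pgdata"],
        "ports": ["5432:5432"],
        "environment": {
            "POSTGRES_PASSWORD": "password",
            "POSTGRES_DB": "name_of_your_database",
        },
    }
    # Print the command you would otherwise have to type by hand
    print(shlex.join(docker_run_args(db)))
```

Even this small service turns into a long command line, and a faithful translation would also need a shared network so other containers can reach the database by name, which Compose sets up for you.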

Also, I have found the docker cli options are subject to a lot of change. If you are reading an old blog post, you might use a docker cli flag that has since been deprecated.

Docker compose is also useful when you want to separate your development Docker container from your production Docker image (the one that you push to a repository for others to reuse).

By using the -f flag, you can tell docker compose to use a different compose file, like this:

$ docker-compose -f <your development compose file> up

Then, in that file, I can specify a development-specific Dockerfile:

<name of my service, like 'server'>:
  build:
    context: .
    dockerfile: <your development Dockerfile>

Also, once you are using Docker compose, you can build your image with no cache with:

$ docker-compose build --no-cache

How do I ssh into a running container?

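When developing your Docker image for others to use, you will occasionally have to “ssh” into your running Docker container to have a peek at what is going on. Strictly speaking, this isn’t SSH at all: docker exec starts a new process, such as a shell, inside the running container.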

Depending on the base image you are using (see the FROM line in your Dockerfile), you may or may not have bash installed. If you don’t have bash, you can use plain old sh, which is still enough to navigate around, list the files in a directory, and view files with cat.

For docker, you can view all running containers with:

$ docker ps

This gives you a list of running containers, with their IDs and statuses (add -a to include stopped containers as well).

To ssh into a particular container:

$ docker exec -it <container_id> sh

It’s possible that your container has bash installed, in which case you can try:

$ docker exec -it <container_id> /bin/bash
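If you don’t know in advance whether a container has bash, one option is to ask sh to try bash first and fall back to itself. The Python sketch below only builds the argument list; `exec_shell_args` is a hypothetical helper, and `command -v` is the POSIX way to check whether a program is on the PATH.

```python
import shlex

# Start bash if the container has it, otherwise stay in plain sh
FALLBACK = "if command -v bash >/dev/null 2>&1; then exec bash; else exec sh; fi"


def exec_shell_args(container, interactive=True):
    """Build a `docker exec` command that opens bash when present, else sh."""
    args = ["docker", "exec"]
    if interactive:
        args.append("-it")  # -i keeps STDIN open, -t allocates a terminal
    return args + [container, "sh", "-c", FALLBACK]


if __name__ == "__main__":
    # Print a copy-pasteable command; passing the list to subprocess.run()
    # would launch it directly (that requires Docker on this machine).
    print(shlex.join(exec_shell_args("<container_id>")))
```

Since sh exists in practically every image, the outer `sh -c` always starts, and `exec` replaces it with bash only when bash is actually installed.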

With docker compose, you refer to the container by its service name (the key under services in docker-compose.yml) rather than its container ID:

$ docker-compose exec <service_name> sh

Where do I put code and data that needs to persist outside of my Docker container? What if I needed to blow away my application but keep my database?

Hint: Docker volumes.

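Docker is great at making images (“blueprints” for containers, often hosted in a repository) and containers (running instances of images). Anything that gets built into a Docker image is fixed at build time; a container can write files while it runs, but those changes live only in that container and are lost when it is removed.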

What this means is, Docker containers are not a good place to store your application code or the actual data in your database. It’s something you will discover when you are following your first docker tutorial and notice you have to rebuild your container every single time you change a line of code.

For code and data that need to change at runtime, it is better to use a feature of Docker called volumes. Think of a volume as a folder you mount onto your Docker container’s WORKDIR.

Volumes exist outside of your Docker image. Volumes are good for things that change, like application code and database data.

The easiest way to set up a Docker volume is by using Docker compose.

In the docker-compose.yml file, a typical volume setup might look like this:

volumes:
  - '.:/src/app'
  - /src/app/node_modules

A setup like the one above assumes I have a WORKDIR, /src/app, that I created in my Dockerfile (package.json has to be copied in before npm install can use it):

WORKDIR /src/app
COPY package.json .
RUN npm install

Notice the colon (:). This is important. The left side of the colon represents your local folder as a relative path. The right side represents a folder in your Docker container (likely to be your WORKDIR).

Also note that you can have multiple volumes, and if you don’t include a colon (:), you are telling Docker to create a separate, anonymous volume at that path inside the container (in this case, /src/app/node_modules) instead of mapping it to a folder on your machine.
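The rules for a short-syntax volume entry can be summarized as [SOURCE:]TARGET[:MODE]. The Python sketch below classifies entries that way, assuming POSIX host paths; `parse_volume` and `VolumeSpec` are hypothetical names. A source beginning with `.`, `/`, or `~` is a folder on your machine, any other source is a named volume managed by Docker, and an entry with no colon is an anonymous volume.

```python
from typing import NamedTuple, Optional


class VolumeSpec(NamedTuple):
    kind: str              # "bind", "named" or "anonymous"
    source: Optional[str]  # host path or volume name; None for anonymous volumes
    target: str            # path inside the container
    mode: Optional[str]    # e.g. "ro", if given


def parse_volume(entry):
    """Classify a Compose short-syntax volume entry: [SOURCE:]TARGET[:MODE].

    Assumes POSIX host paths; Windows drive letters such as C: are not handled.
    """
    parts = entry.split(":")
    if len(parts) == 1:
        # No colon: Docker creates an anonymous volume at this container path
        return VolumeSpec("anonymous", None, parts[0], None)
    source, target = parts[0], parts[1]
    mode = parts[2] if len(parts) > 2 else None
    # Host folders start with ".", "/" or "~"; anything else names a volume
    kind = "bind" if source.startswith((".", "/", "~")) else "named"
    return VolumeSpec(kind, source, target, mode)


if __name__ == "__main__":
    for entry in [".:/src/app", "/src/app/node_modules"]:
        print(entry, "->", parse_volume(entry))
```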

For developers who use Node with npm as their package manager, the additional colon-less volume (/src/app/node_modules) is something you will often need in order to see both your mounted code and your node_modules in the same WORKDIR. Volumes are mounted at run time and hide whatever RUN put at that path during the build; because the anonymous volume sits deeper than the project folder, node_modules is no longer hidden by your mounted code, and a new anonymous volume starts out with the modules that RUN npm install put there.
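Here is a toy Python model of that shadowing, assuming a mount hides everything the image had under its mount point and the deepest mount wins. `visible_files` and `seed_from_image` are hypothetical helpers; `seed_from_image` mimics how Docker fills a new, empty volume with whatever the image had at that path.

```python
def seed_from_image(image_files, point):
    """Files a new, empty volume mounted at `point` starts with (copied from the image)."""
    prefix = point.rstrip("/") + "/"
    return {path[len(prefix):] for path in image_files if path.startswith(prefix)}


def visible_files(image_files, mounts):
    """Toy model of the file tree a running container sees.

    image_files: absolute paths baked into the image at build time
    mounts: {mount point: set of paths (relative to the mount point) it provides}
    """
    def owner(path):
        # The deepest mount point covering `path`, or None if the image owns it
        covering = [p for p in mounts if path == p or path.startswith(p.rstrip("/") + "/")]
        return max(covering, key=len, default=None)

    visible = {path for path in image_files if owner(path) is None}
    for point, contents in mounts.items():
        for rel in contents:
            full = point.rstrip("/") + "/" + rel
            if owner(full) == point:  # a deeper mount would hide this file
                visible.add(full)
    return visible


if __name__ == "__main__":
    image = {"/usr/local/bin/node", "/src/app/package.json",
             "/src/app/node_modules/express/index.js"}
    host = {"package.json", "index.js"}  # the project folder has no node_modules

    bind_only = visible_files(image, {"/src/app": host})
    print(sorted(bind_only))  # node_modules is gone: the bind mount hid it

    with_anon = visible_files(image, {
        "/src/app": host,
        "/src/app/node_modules": seed_from_image(image, "/src/app/node_modules"),
    })
    print(sorted(with_anon))  # code from the host plus the image's node_modules
```

The same model hints at a common surprise: docker-compose up reuses the anonymous volume from the previous container, so after adding a dependency you may keep seeing the old node_modules until you rebuild the image and recreate that volume (for example, docker-compose up --build --renew-anon-volumes).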

Setting up node_modules this way also speeds up your build times, as this layer (RUN npm install) gets cached.

For a database like Postgres, the image usually comes with its own Docker-specific instructions (such as the environment variables it reads). In our case, we don’t even specify a Dockerfile; instead, we just create an instance of the postgres image in our docker-compose.yml:

db:
  image: postgres:9.6
  restart: always
  volumes:
    - ./pgdata:/var/lib/postgresql/data/pgdata
  ports:
    - '5432:5432'
  environment:
    POSTGRES_PASSWORD: 'password'
    POSTGRES_DB: name_of_your_database
    PGDATA: /var/lib/postgresql/data/pgdata

Then, in our api service, we specify the postgres container:

api:
  build:
    context: .
    dockerfile: Dockerfile
  container_name: my_api_module
  volumes:
    - '.:/src/app'
    - /src/app/node_modules
  ports:
    - '8080:8080'
  depends_on:
    - db
  links:
    - db
  environment:
    NODE_ENV: 'development'
    DATABASE_URL: 'postgres://postgres:password@db/name_of_your_database'

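The host part of DATABASE_URL is db, the name of the Postgres service, because Compose puts the services on a shared network where each one is reachable by its service name; postgres is the default superuser the image creates. If you assemble such URLs in code, a small helper like the hypothetical `database_url` below, sketched in Python, percent-encodes the user name and password so characters like @ or : in a real password don’t break the URL.

```python
from urllib.parse import quote, urlsplit


def database_url(user, password, host, database, port=None):
    """Build a postgres:// URL; inside Compose, `host` is the database's service name."""
    # safe="" also encodes "/", ":" and "@", which would otherwise break the URL
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    netloc = f"{credentials}@{host}" + (f":{port}" if port is not None else "")
    return f"postgres://{netloc}/{database}"


if __name__ == "__main__":
    url = database_url("postgres", "password", "db", "name_of_your_database")
    print(url)  # postgres://postgres:password@db/name_of_your_database
    print(urlsplit(url).hostname)  # db
```
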
In conclusion…

I hope you find these tips helpful when developing your Docker images. If you have any more advice, please share it in the comments and I will try to update this article.