Deploying to DigitalOcean k8s from Gitlab.com

 

We're fortunate that there's so much technology off the shelf for us to use. The problem is that not a lot of it plays nicely together. One such example is using Gitlab.com and DigitalOcean Kubernetes (k8s) together.

A modern Kubernetes based workflow has a number of different pieces:

  1. A Continuous Integration (CI) system has to respond to a git push.
  2. A container needs to be built from the repository source code.
  3. The resulting image of that container build is then sent to a Container Registry for storage.
  4. The CI system then needs to deploy object definitions to Kubernetes.

At first blush, it seems like all the pieces are there between Gitlab.com's free tier, and DigitalOcean's k8s hosting:

  • Gitlab CI provides the continuous integration.
  • Gitlab Container Registry provides the container image hosting.
  • DigitalOcean provides all the k8s we need.

The problem is, none of it works together out of the box. And trying to sort out how to make it all work together is an exercise in head-scratching, even if you're experienced with both Gitlab and Kubernetes.

The first problem is building the container in the first place. When working locally with a Docker installation, we can run docker build against our project directory and that's that. This has been one of Docker's strengths from the start; it makes building and running containers very easy.
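
For example, both of these are equivalent local builds. The path argument is the build context, and -f points at the Dockerfile (it defaults to a file named Dockerfile inside the context); this maps directly onto the --context and --dockerfile parameters we'll see later:

# Build using the Dockerfile found in the project directory
docker build /path/to/my_project

# The same build with the Dockerfile named explicitly
docker build -f /path/to/my_project/Dockerfile /path/to/my_project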

Our challenge now is that we need to do that from within a Gitlab CI script. "So why can't we just do another docker build?"

Well, you can, but it's a Very. Bad. Idea.

The Docker daemon requires root privileges to run on any system. Having that kind of access is risky for an automation system, as it can lead to Supply Chain Attacks. You can still make this work, however: spin up a VM with Docker and SSH installed, install the Gitlab Runner, and add a user to the docker group. This works, and it is a common "build server" pattern, but it also introduces another piece of infrastructure outside of either Gitlab.com or DigitalOcean k8s.
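
For reference, setting up such a build server looks roughly like this (a sketch for a Debian-family VM; the registration token is a placeholder, and the gitlab-runner package comes from Gitlab's own apt repository):

# Install Docker and the Gitlab Runner on the VM
apt-get install -y docker.io gitlab-runner

# Register the runner against Gitlab.com with a shell executor
gitlab-runner register --non-interactive \
  --url https://gitlab.com/ \
  --registration-token YOUR_PROJECT_TOKEN \
  --executor shell

# Give the runner's user access to the Docker daemon (root-equivalent!)
usermod -aG docker gitlab-runner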

The problem is that if we want to use Gitlab.com to build the container, we have to do it from inside another container. Gitlab.com has an image runner available out of the box, which allows us to run a Docker image without any external resources. "So! Let's just put Docker inside a container!" Sure, you could do that. This Docker-in-Docker configuration is still commonplace, but it comes with a nasty catch.

Due to the way that Docker overlays users, when you run a container that's configured to run as root, it runs as root on the underlying Docker host. This is true even if you're just going to build a container, never mind running one. Furthermore, Docker-in-Docker needs to run in "privileged mode", which gives it broad access to the underlying host. If you like living dangerously and paying pen-testers handsomely, be my guest.
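
For the record, here's roughly what the Docker-in-Docker setup looks like. The runner itself also needs privileged = true in its config.toml, and details like DOCKER_HOST and TLS settings vary by runner version. A sketch, not a recommendation:

# .gitlab-ci.yml using Docker-in-Docker (not recommended here)
build:
  image: docker:stable
  services:
    - docker:dind
  script:
    - docker build -t $CI_REGISTRY_IMAGE .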

What would be better is if we could build containers without the Docker daemon, yet still have the result be compatible with Docker. A few years ago this was a new idea. Jess Frazelle's img was one of the earliest, noteworthy efforts. It's still maintained today, although it's no longer the only option.

Another option is Google's Kaniko. Kaniko not only builds Docker-compatible images, but can do so without root, from within a container, and can even push them to a registry for you. As a bonus, it's also distributed as a container. This makes our job much easier, and it is Gitlab's recommended method for building container images with Gitlab CI.

Kaniko needs a few things from us to work properly:

  • A Docker config.json to act as registry credentials.
  • A directory of files to act as a context (stuff to add to the container).
  • A Dockerfile.
  • The registry URL, including the image name and tag.

A typical Kaniko command looks like this:

/kaniko/executor --context /path/to/my_project --dockerfile /path/to/my_project/Dockerfile --destination registry.example.com/my-project:latest

The config.json is assumed to be at:

/kaniko/.docker/config.json

We normally get that file by doing a docker login after the registry is set up, but you can also easily create it from a username and password with a bit of scripting:

echo "{\"auths\":{\"registry.example.com\":{\"username\":\"my_registry_user\",\"password\":\"my_registry_pass\"}}}" > /kaniko/.docker/config.json

This is all well and good, but it's best if we don't hard-code a lot of these values. This is especially true for the password. Fortunately, Gitlab has us covered here. Many of the above can be replaced with predefined variables that are part of every Gitlab CI build. Even better, Gitlab's own documentation shows how to use Kaniko.
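
For reference, the predefined variables we'll rely on include the following (descriptions paraphrased from Gitlab's documentation):

# CI_REGISTRY           -- hostname of the project's Gitlab Container Registry
# CI_REGISTRY_IMAGE     -- full image address for this project in that registry
# CI_REGISTRY_USER      -- ephemeral username for pushing images during the job
# CI_REGISTRY_PASSWORD  -- ephemeral password paired with CI_REGISTRY_USER
# CI_PROJECT_DIR        -- path where the repository is checked out inside the job
# CI_COMMIT_TAG         -- the git tag name, set only on tag pipelines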

The following is a .gitlab-ci.yml file which Gitlab CI uses as input to run builds. Typically, we save this to the root directory of our repository.

stages:
  - build

container:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
  only:
    - tags

It turns out that Gitlab, when configured to act as a registry, exposes the registry URL, username, and password to the CI job for our use. The username and password are created dynamically for each build and revoked afterward. This reduces the likelihood of compromised credentials becoming an attack vector.

Running an application, especially a web application, often requires a bit of a shift in thinking. We're used to thinking of "the site" and "the server" as separate entities. Site code is deployed onto the server, and the server remains largely static. This...isn't how Kubernetes works. The application and the operating system on which it runs are one and the same, packaged up inside a single container image. Container images are often built using a Dockerfile, which describes how to build the container step by step.

Describing how to write a Dockerfile is out of scope for this post, but you can check out my Docker From Scratch series for detailed information.

One thing you might have noticed in the Kaniko command above is the --dockerfile parameter. Right now, this is pointed at $CI_PROJECT_DIR/Dockerfile, which translates to the root directory of your repo. This can be a problem depending on how your project is laid out. While you could put the Dockerfile anywhere in your project and update the Kaniko command accordingly, I prefer the approach of changing your repository layout instead:

path/to/my_project
├── .gitignore
├── .gitlab-ci.yml
├── Dockerfile
└── src/
    ├── core/
    └── index.php

This way, all the application source code (such as Drupal core) is in a subdirectory. This makes for a clearer separation between what is application code and what is operational code, like deployment scripts. When the application is a website, maintaining this separation is also a good security practice, as "public" files are kept in their own directory.

This layout also makes it easy for us to leverage another feature of Docker: Docker Ignores. Like the .gitignore file, there's also a .dockerignore file. This allows us to selectively tell Docker -- or Kaniko -- what to ignore when building a container. When the container image build starts, the image builder (Docker daemon or Kaniko) will examine all the files within the build context. For Kaniko, this is specified by the --context parameter.

An overly large build context can slow down the image build and consume a lot more memory as a result. Worse, we may include files we'd rather not have in the image, such as the Drupal file upload directory (sites/default/files) or the Composer vendor/ directory. The format is very similar to the .gitignore file, and it should likely include similar entries. We can add the .dockerignore file to the root of our project; an example of its contents follows the layout below:

path/to/my_project
├── .dockerignore
├── .gitignore
├── .gitlab-ci.yml
├── Dockerfile
└── src/
    ├── core/
    └── index.php
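
Here's a sketch of what that .dockerignore might contain for a Drupal project; adjust the entries to your own layout:

# Keep VCS data and CI configuration out of the build context
.git
.gitlab-ci.yml

# User uploads and locally built dependencies don't belong in the image
src/sites/default/files
vendor/
node_modules/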

So what about the Dockerfile itself? This is highly dependent on your application. The ten7/flight-deck-drupal container image, however, is a minimal, Kubernetes-native container image that can show you the details. For my own site, the Dockerfile looks a bit like this:

FROM ten7/flight-deck-web:develop
MAINTAINER socketwench@example.com

# Switch to root for the build.
USER root

# Override some Ansible defaults.
ENV ANSIBLE_ROLES_PATH /ansible/roles
ENV ANSIBLE_NOCOWS 1

# Copy the files needed by the site.
COPY --chown=apache:apache ansible /ansible
COPY --chown=apache:apache config /var/www/config
COPY --chown=apache:apache solr-conf /var/www/solr-conf
COPY --chown=apache:apache src /var/www/html

# Do all the real work.
#
# Call setcap to ensure HTTPD can run as non-root...
# ...install Ansible dependencies from Galaxy...
# ...the kubectl binary (we'll use this later)...
# ...and finally run the Ansible playbook to build the site.
#
RUN setcap cap_net_bind_service=+ep /usr/sbin/httpd && \
    ansible-galaxy install -fr /ansible/requirements.yml && \
    curl https://storage.googleapis.com/kubernetes-release/release/`curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt`/bin/linux/amd64/kubectl > /bin/kubectl && \
    chmod +x /bin/kubectl && \
    ansible-playbook -i /ansible/inventories/all.ini /ansible/build.yml

# Switch back to apache for runtime.
USER apache

# Only expose 80, as HTTPS is terminated in the ingress.
EXPOSE 80

Since our Dockerfile is just a Dockerfile, you can even test the build locally with a docker build. This allows you to verify that the Dockerfile and the resulting container are correct without worrying about whether Kaniko or Gitlab is mucking things up.
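
For example, from the project root (the image name and tag are throwaway values for local testing only):

# Build the image with the same Dockerfile and context Kaniko will use
docker build -t deninet:test .

# Run it locally and check that the site responds on http://localhost:8080
docker run --rm -p 8080:80 deninet:test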

Once we're sure the Dockerfile is correct, we can add the .dockerignore, .gitlab-ci.yml, the Dockerfile, and any layout changes to the repo. Then, we can push that up to Gitlab.com.

We can monitor the build by going to CI/CD > Jobs and clicking on the most recent item in the list. Kaniko will helpfully output any commands as they are run within the container, and we can see that output in the Gitlab CI job page. Provided the build is successful, we can then go to Packages > Container Registry and view our built container.

 

One thing that will cause us problems in the future is how the Kaniko script provided to us by Gitlab names its images. By default, the image will be named after our git repository. So, if your Gitlab.com project is socketwench/deninet.com, your image name will also be socketwench/deninet.com. Container images, however, also have a tag.

Image tags are used for two purposes. Most often, it is to provide several versions of the same application under the same container name. Going with our earlier example, this would be socketwench/deninet.com:7.5 and socketwench/deninet.com:8.0, where 7.5 and 8.0 are the respective tag names. The other purpose is to provide variants of the same image with different options, so socketwench/deninet.com:no-solr would be a variant where the tag name is no-solr.

The provided Kaniko script uses $CI_COMMIT_TAG for the image tag name. Furthermore, the only key specifies that the job runs only when a git tag is pushed. This setup works, and is far more natural for human beings, as long as good discipline is used to always deploy new code with a different git tag.

This is essential, as Kubernetes will compare any currently deployed image with the one you're trying to deploy. If the tag names are the same, k8s will assume there's no change, even if the underlying container image is different, and will not deploy the new image! There's an easy way to solve this, though. Instead of using the git tag for the Docker image tag, we can use the commit's short ID. This value is unique to each commit, since it is derived from the commit hash.

Gitlab provides this as the variable $CI_COMMIT_SHORT_SHA. So, we just need to swap the values around and change our only key to build on any activity on the master branch instead:

stages:
  - build

container:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  only:
    - master

We're half-way there! The container is built and hosted behind a private registry on Gitlab.com. The next problem is turning that into a running application on Kubernetes.

If you've seen my Return of the Clustering talk, or taken one of my k8s classes, you already know you can't "just run" a container in Kubernetes. We need to create a number of definitions in order to deploy a complete application. This can get really complicated, and there's no one way to do it. For my project, I decided to use the Flightdeck Cluster role on Ansible Galaxy. I could also use Helm charts, or even kubectl and some scripting. I maintain Flightdeck Cluster, so I'm taking the low-energy option.

Once I define the cluster in code in my preferred method, I can add it to the repo. But this is where we run into a problem.

Out of the box, Gitlab.com has a default image runner capable of running container images; we used it to run the Kaniko image that built our container. While there is a Kubernetes runner for Gitlab, its intent is to run pods -- a running container -- rather than to deploy the multiple definitions necessary for an entire multi-tier application.

In order to do that, we need a shell environment in which to run Ansible, Helm, kubectl, and so on, to apply custom definitions to our cluster. Those definitions tend to be YAML files which may or may not need some templating in order to be populated with key values. It's more than just running a few set commands.
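
At its simplest, "applying definitions" boils down to something like the following, after any templating has filled in values such as the image tag (the file names here are hypothetical, and a real multi-tier site needs many more definitions):

# Apply each object definition that makes up the application
kubectl apply -f deployment.yml
kubectl apply -f service.yml
kubectl apply -f ingress.yml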

We could allocate a Build Server again, but that invalidates all the progress we've made so far. Is there another way?

Well, we did just build a container...Maybe use that?

Normally, the container for my application runs a Drupal site. We can, however, instruct it to run a specific command instead. Since we just built it, we can add whatever code we need to the repo and augment the Dockerfile accordingly. Then, we can add an additional stage to our .gitlab-ci.yml to run the container, only this time instructing it to run our scripts to deploy the cluster instead.

Before we go on, I admit I'm not sure if this is the best idea from a security perspective. It adds tooling to the application container which is not necessary when running the application. It also leaves code describing the structure of the cluster inside the container. I do avoid adding anything like a username or API key to the container image; there's another way to pass those when deploying to the cluster.

Running the same image we just built involves creating a new deploy stage in our .gitlab-ci.yml. This way, we can enforce that the container always gets built first, then applied to the cluster. For the new stage, we can use the image runner again, only this time running our image instead of Kaniko.

stages:
  - build
  - deploy

container:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  only:
    - master

live:
  stage: deploy
  image: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  script:
    - "ansible-playbook -i /ansible/inventories/all.ini /ansible/live.yml"
  only:
    - master

The above...doesn't work. As soon as the image runner tries to pull our image from the registry, it discovers it needs a username and password in order to access it. It doesn't have them, so the build fails, frustratingly.

Like with Kaniko, we need to authenticate with the registry first. This time, however, we can't use echo to write the config.json, since the image is pulled before our script ever runs. Instead, we need to pass the credentials to the runner as it starts execution. There are a few different ways to do that, but the preferred approach is to write out the config.json and provide it as the DOCKER_AUTH_CONFIG variable to the runner.

Sadly, it's not as simple as plugging in CI_REGISTRY_USER and CI_REGISTRY_PASSWORD as we did with Kaniko. Those values only allow pushing to the registry during a build; they cannot be used for a pull. After some research, I found a page that suggested using gitlab-ci-token for the username, and CI_JOB_TOKEN for the password:

live:
  stage: deploy
  image: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  variables:
    DOCKER_AUTH_CONFIG: |
      {
        "auths": {
          "$CI_REGISTRY": {
            "username": "gitlab-ci-token",
            "password": "$CI_JOB_TOKEN"
          }
        }
      }
  script:
    - "ansible-playbook -i /ansible/inventories/all.ini /ansible/live.yml"
  only:
    - master

The above uses YAML multiline syntax for clarity. This is a lot like the CI_REGISTRY_USER and CI_REGISTRY_PASSWORD we used for Kaniko, but it works for docker pulls instead.

Curiously, I also needed to define DOCKER_AUTH_CONFIG under Settings > CI/CD > Variables with a bogus value for this to work. I'm unsure as to why.

Once we push this to the repo, CI will build a new container, and then in a second stage it will pull that same image and run the script ansible-playbook -i /ansible/inventories/all.ini /ansible/live.yml inside the container. This playbook deploys the cluster, but it needs a few key pieces of data in order to work:

  • An API key with which to access the cluster.
  • The full name and tag of the image we just built.

Fortunately, this isn't hard to pass to the container. We can define a Gitlab Secret variable for the API key under Settings > CI/CD > Variables; Gitlab will pass this automatically to the container when it runs during the build. The image name can be passed by defining it in the .gitlab-ci.yml file:

stages:
  - build
  - deploy

job_8_deploy:
  tags:
    - live
  stage: deploy
  script:
    - "ansible-galaxy install -fr ansible/requirements.yml --ignore-errors --ignore-certs"
    - "ansible-playbook -i ansible/inventories/8 ansible/8.yml"
  only:
    - '8.0'

job_release_deploy:
  tags:
    - stage
  stage: deploy
  script:
    - "ansible-galaxy install -fr ansible/requirements.yml --ignore-errors --ignore-certs"
    - "ansible-playbook -i ansible/inventories/release ansible/release.yml"
  only:
    - develop

container:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  only:
    - master

live:
  stage: deploy
  image: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  variables:
    DOCKER_AUTH_CONFIG: |
      {
        "auths": {
          "$CI_REGISTRY": {
            "username": "gitlab-ci-token",
            "password": "$CI_JOB_TOKEN"
          }
        }
      }
    CI_IMAGE_URL: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
  script:
    - "ansible-playbook -i /ansible/inventories/all.ini /ansible/live.yml"
  only:
    - master

Once this is done, our build process is basically complete:

  1. Gitlab's image runner sets up the runtime environment and starts Kaniko.
  2. Kaniko builds and pushes the image to our registry.
  3. Gitlab then pulls the newly built image, and runs the live.yml playbook.
  4. The playbook utilizes the API key and the $CI_IMAGE_URL variable to build the cluster.

The playbook utilizes the ten7.flightdeck_cluster role on Ansible Galaxy to do all the deployment. Setting that up is a post in and of itself, and may not be a preferred approach once Helm 3 is ready for production.
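
While the role itself is out of scope, the general idea of reading the values Gitlab passes into the container looks something like this minimal Ansible sketch (the variable names, and the DO_API_KEY secret in particular, are hypothetical; the real flightdeck_cluster variables differ):

# live.yml (sketch) -- read values passed in by Gitlab CI
- hosts: localhost
  connection: local
  vars:
    # Set in .gitlab-ci.yml above
    image_url: "{{ lookup('env', 'CI_IMAGE_URL') }}"
    # Set under Settings > CI/CD > Variables
    api_key: "{{ lookup('env', 'DO_API_KEY') }}"
  tasks:
    - name: Show what would be deployed
      debug:
        msg: "Deploying {{ image_url }} using the provided API key"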

Something that's important to know about CI_JOB_TOKEN is that it only works for pulls during the CI build. Afterward, it is no longer valid. "What's wrong with that?" I said... until I started working with the cluster.

My live.yml playbook relied on CI_JOB_TOKEN to specify a Kubernetes definition parameter called an imagePullSecret. This is used by k8s to retrieve the image when it needs to be deployed. This works during the build, of course, but it causes problems later.

If a pod running our custom image crashes, is scaled up, or is outright deleted, k8s will attempt to re-pull the image...

...with outdated registry credentials.

This will lead to an ImagePullBackOff state when you examine the pods with kubectl because, rightfully, our registry credentials are no longer valid. The solution is to apply k8s object definitions with registry credentials that remain valid even after the build is complete. While we could pass our Gitlab username and password, a better approach is to create a Personal Access Token in Gitlab. The Personal Access Token takes the place of our password, and even works when Two Factor Authentication (2FA) is enabled. When creating the token, be sure to grant it at least the read_registry permission.
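
If you were creating that credential by hand, it would look roughly like this; the secret name and username are placeholders, the password is the Personal Access Token, and how the secret actually gets created depends on your deployment tooling:

# Create a long-lived registry credential in the cluster
kubectl create secret docker-registry gitlab-registry \
  --docker-server=registry.gitlab.com \
  --docker-username=my-gitlab-username \
  --docker-password=MY_PERSONAL_ACCESS_TOKEN

# ...then reference it from the pod spec of the deployment:
#
#   imagePullSecrets:
#     - name: gitlab-registry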

A downside to the above is that the container used to build the cluster is the same one that runs inside the cluster later. This results in a bigger image, since we need to include the utilities necessary for interacting with Kubernetes.

An alternative would be to build two containers. The first is the application container as described by this post. The second would be a container with a set tag, such as apply. This second container would have a separate Dockerfile and be built with the sole purpose of applying definitions to the cluster. The second stage of the build would then invoke this apply container, deploying definitions to the cluster which, in turn, deploy our application container.
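
As a sketch, that alternative might look like the following in .gitlab-ci.yml, where apply/Dockerfile is a hypothetical second Dockerfile dedicated to deployment tooling (the DOCKER_AUTH_CONFIG variable from earlier is still needed on the deploy job, and is omitted here for brevity):

apply_container:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo "{\"auths\":{\"$CI_REGISTRY\":{\"username\":\"$CI_REGISTRY_USER\",\"password\":\"$CI_REGISTRY_PASSWORD\"}}}" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/apply/Dockerfile --destination $CI_REGISTRY_IMAGE:apply
  only:
    - master

live:
  stage: deploy
  image: $CI_REGISTRY_IMAGE:apply
  script:
    - "ansible-playbook -i /ansible/inventories/all.ini /ansible/live.yml"
  only:
    - master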

I avoided doing this since it does involve building a secondary container on each build. At the cost of a longer execution time, the result is arguably more secure, since less code needs to be added to the application container.

With only Gitlab.com's free resources and a DigitalOcean account, you can create a Continuous Integration system to deploy your applications. The barrier isn't so much technological as it is understanding all the various pieces and how to make them work together. Creating this kind of pipeline not only requires an understanding of Gitlab CI, but of Docker, registries, Kubernetes, and the Gitlab image runner. It's a lot of information to absorb!

All parts and materials to make this project were paid for by my supporters. If you like this post, consider becoming a supporter at:

Thank you!!!