How to Build an Automated Development Pipeline

I have led many development teams over the past years. One thing I found common among them is the need to build an automated development pipeline. You don’t necessarily need a sophisticated development pipeline; I have found that even a basic pipeline can prevent many frustrations in your daily development. If you want to develop a software product, even a small gadget, a development pipeline is essential for you. An automated development pipeline helps you continuously evaluate the sanity of the product. Nothing hurts a development team more than broken software, and you must do everything in your power to prevent it. Let me share two scenarios where you will feel this need from the bottom of your heart.

  • Scenario 1 (when you want to protect the master branch from unintentional errors) — You are working in a team and using a git repository to manage development. Every day, there are several pull requests to be merged into the master branch. Everyone does their best to write high-quality functional code, but every now and then errors are injected into the master branch. As a result, the codebase stops functioning and the team becomes frustrated. Progress becomes much slower and deadlines are missed one after the other. You wish there was a way to prevent this from happening.
  • Scenario 2 (when you want to expedite the pace of development) — You are developing a web scraper to collect data for a data science project. The data science team needs a large amount of data every day, and you cannot afford any delay in parsing data for the machine learning pipeline. Nevertheless, every now and then the HTML structure of the targeted website changes, which causes the scraper to stop functioning. You wish there was a way to automate the testing of the scraper and recognize the error as soon as possible.

In this article, I describe the development pipeline in simple words. I also describe the most important steps in building an automated development pipeline:

  • How to build and spin up a server
  • How to build a containerized solution
  • How to automatically build the containerized solution

I hope this helps you become more comfortable with building a development pipeline and stay away from unnecessary frustrations.

A development pipeline is a series of commands that automatically execute in a row to test, build, or deploy a software product. These commands run on different levels and use various tools. For example, a server is needed to build the development pipeline. The question is: how do you configure this server? You can configure it manually, but the best practice is to use Docker technology.

Using Docker technology has many advantages, such as the ability to migrate from one service to another with minimum hassle when needed. When you start building a real-world product, you will find out how important it is to be able to migrate from one service to another. That happened to me several months ago: we were asked to migrate from GCloud to OVH since the company had some other services on OVH infrastructure. It took us less than a few days to migrate everything from GCloud to OVH. Just imagine how long it would have taken if we had configured everything manually!


It takes three steps to build an automated development pipeline: (a) build and spin up a server, (b) build a containerized solution, (c) set up a series of commands in a row.

How to Build and Spin Up a Server

It takes three steps to spin up a server: (1) build a Docker image, (2) store the Docker image, (3) run the Docker image. If you don’t want an automated development pipeline, you don’t need to take this step; you can basically run everything on your own machine. However, if you want to set up a CI/CD tool, you must spin up a server for that purpose.

1. Build a Docker image — You need to spin up a server to build an automated development pipeline. There are many ways to do this. My suggestion is to build a Linux-based Docker image with all the required libraries installed. You can build a Docker image by writing a Dockerfile. A Dockerfile is a text document that contains all the commands you use to install the required libraries and packages. You can build a Docker image based on a Dockerfile using the command below.

docker build -f Dockerfile -t REPOSITORY_NAME/IMAGE_NAME:TAG .

To find out how to write a Dockerfile for a server, you can read this article: How to Create an Ubuntu Server to Build an AI Product Using Docker.
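
As a minimal sketch, a Dockerfile for such a server image could look like the snippet below. The base image and the packages are illustrative assumptions and will depend on your project.

FROM ubuntu:20.04

# Install the system packages the pipeline needs
RUN apt-get update && apt-get install -y \
    git \
    curl \
    python3 \
    python3-pip \
 && rm -rf /var/lib/apt/lists/*

# Install the Python libraries required by the tests (adjust to your project)
RUN pip3 install pytest

WORKDIR /workspace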

2. Store a Docker image — Then, you must store the Docker image in a Docker registry. A Docker registry is basically a place to store Docker images and nothing more. You can choose any service, including DockerHub or Amazon ECR. If you have no idea, I recommend using DockerHub since it is native to the Docker command-line interface and makes your life a bit easier compared to other services. There are some concerns about security and efficiency when choosing a registry, but they are not important at this stage. For example, if you have a complex pipeline, it would be better to use a registry service located next to the other cloud services in use. This makes commands such as docker pull or docker push execute faster. You can push a Docker image to a remote Docker repository using the command below.

docker push REPOSITORY_NAME/IMAGE_NAME:TAG
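
Note that you need to be authenticated with the registry before pushing. A minimal sketch, assuming DockerHub and placeholder credentials:

docker login -u YOUR_USERNAME -p YOUR_PASSWORD
docker push REPOSITORY_NAME/IMAGE_NAME:TAG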

3. Run a Docker image — You can run the Docker image on many services to spin up a server. The last two steps were about how to build and store a Docker image to be used as a server; this step is about how to spin up the server itself. Again, you don’t need a server if you don’t want an “automated” development pipeline.
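
If you just want to experiment locally rather than on a CI/CD service, you can spin up the same image on your own machine. A minimal sketch, where the image name is a placeholder:

docker run -it --rm REPOSITORY_NAME/IMAGE_NAME:TAG /bin/bash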

Let me share a code snippet used for spinning up a server on CircleCI, a well-known CI/CD tool widely used these days. The code below is a segment that defines a test job to be run on a server spun up from a Docker image. This is the syntax used by CircleCI; other CI/CD tools may differ, but the concept is the same. After the server is spun up, other steps such as checkout can be run. checkout is a special step used by CircleCI to clone the code repository onto the server that was spun up in the previous step. If you want to read more about CircleCI, I recommend you read this article: How to Learn CircleCI in Simple Words.

jobs:
  test:
    docker:
      - image: REPOSITORY_NAME/DOCKERIMAGE_NAME:TAG
        auth:
          username: $USERNAME
          password: $PASSWORD
    steps:
      - checkout
      - run:
          ...

How to Build a Containerized Solution

The most important step in building software is to make sure it works. You can see it as an “integration test” where individual modules are combined and tested as a group. So, if you are developing a data science project, you may want to build an integration test, and if you are developing a web app, you may want to build the containerized solution using Docker.

Using Docker technology, you can build a Docker image for a server and a Docker image for your solution. The challenges you confront when building Docker images for servers and for solutions are similar; however, they serve different purposes. As you read above, you must have a Dockerfile as a “how-to” prescription to build your solution. When you have a functional Dockerfile, you are ready to go to the next step. Note that these are the building blocks of the development pipeline explained in the next section.
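
As a minimal sketch, assuming your solution ships with a Dockerfile at the repository root and a hypothetical run_tests.sh script as its test entry point, building and checking the containerized solution could look like this:

docker build -f Dockerfile -t SOLUTION_IMAGE_NAME:TAG .
docker run --rm SOLUTION_IMAGE_NAME:TAG ./run_tests.sh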

How to Automatically Build the Containerized Solution

A development pipeline is a series of commands that are executed in a row. You can automatically execute a series of commands in two ways: with a Bash script or with a CI/CD tool. The latter gives you more options to configure; however, the former can solve many problems by itself. Let’s dive into what those tools are.

1. Bash Script (when you want to build a pipeline) — A Bash script is a text file containing a series of commands that run back to back when you run the script. Each row in the Bash script is a command that you would otherwise execute in the terminal to do a specific task. The order of commands written in a Bash script is exactly the same as the order in which you would run them in the terminal. The code below gives you a hint about what a Bash script is. Note that Bash scripting behaves a bit differently across Windows and macOS, which is not the focus of this article. You can build and push a Docker image using the Bash script below.

#!/bin/bash

# Image name and registry credentials (replace XXX with your own values)
export IMAGE_NAME=XXX
export USERNAME=XXX
export PASSWORD=XXX

# Log in to the registry, then build and push the image
docker login -u $USERNAME -p $PASSWORD
docker build -f Dockerfile -t $IMAGE_NAME .
docker push $IMAGE_NAME
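
Assuming the script above is saved as build_and_push.sh (a hypothetical filename), you can make it executable and run it with:

chmod +x build_and_push.sh
./build_and_push.sh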

2. CI/CD (when you want to build an “automated” pipeline) — A CI/CD (i.e., continuous integration/continuous deployment) tool is specialized software, mostly run in the cloud, that automatically executes a series of commands in a specific order when triggered. These tools are often configured using a YAML file where you define a number of “jobs” and “workflows”. A “job” is a collection of steps that are executed as a single unit, similar to the Bash script explained above. A “workflow” is a set of rules for running a collection of jobs in a specific order if needed.

I highly recommend writing a Bash script first and then starting to configure the CI/CD tool of your choice. You may ask why. First, you will most probably need all the lines of code written in the Bash script to configure the CI/CD tool. Second, setting up a CI/CD tool is a bit harder, which may create difficulties if you are not an expert. The code below is a segment of a job to be run on CircleCI. As you can see, the commands are exactly the same as in the Bash script above, but in YAML form that can be parsed by CircleCI.

...
      - run:
          name: Authentication
          command: docker login -u $USERNAME -p $PASSWORD
      - run:
          name: Build
          command: docker build -f Dockerfile -t $IMAGE_NAME .
      - run:
          name: Store
          command: docker push $IMAGE_NAME
...
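
To have such jobs run automatically, you also define a workflow that ties them together. The snippet below is a minimal sketch of a CircleCI workflow, assuming the configuration defines jobs named test and build (hypothetical names); your actual job names and triggers will differ.

workflows:
  build-and-test:
    jobs:
      - test
      - build:
          requires:
            - test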

An automated development pipeline is a complicated concept and can be implemented in various forms. In this article, I tried to share the main concepts with you. You will definitely need to learn more to build a useful pipeline for your software product. Nevertheless, what you should take away from this article is that even a small automated pipeline can help you a lot. So, I suggest you build an automated pipeline today. You will never regret it!

Article originally posted here. Reposted with permission.

ODSC Community

The Open Data Science community is passionate and diverse, and we always welcome contributions from data science professionals! All of the articles under this profile are from our community, with individual authors mentioned in the text itself.
