In modern software development, speed is crucial. The faster you can build, test, and deploy your code, the more agile your team becomes. However, even with robust CI/CD pipelines in place, Docker image builds can become a bottleneck, slowing down development and deployment. This is where Docker Image Optimization becomes essential—especially for complex applications like those built with Python, where heavy dependencies often lead to long build times and bloated images.
In this blog, we share how we optimized our Python builds on AWS CodeBuild using ECR and S3 caches to achieve 65% faster builds and smaller images. Join us as we walk through the strategies and techniques we used to supercharge our Docker builds and make our CI/CD pipeline smarter and faster.
What is a Docker Image?
A Docker image is a lightweight, standalone, and executable package that includes everything needed to run a software application. It contains the application code, libraries, system tools, and settings required to run a program inside a container.
Think of a Docker image as a blueprint for creating containers. It’s the static part of the container lifecycle, while a Docker container is the running instance of that image.
How to Make a Docker Image?
Creating a Docker image typically involves writing a Dockerfile, which is a text document that contains all the instructions that Docker needs to build your image. Here’s a simple guide:
Steps to Create a Docker Image
1. Create a Dockerfile:
This file contains a set of commands, starting from a base image, installing dependencies, copying files, and setting entry points.
Example Dockerfile:
FROM python:3.10-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
2. Build the Docker image:
Run the following command in the directory where your Dockerfile is located:
docker build -t myimage .
This will create a Docker image with the tag myimage.
3. Run the image as a container:
Once the image is built, you can run it as a container:
docker run -d myimage
This runs the container in the background (detached mode).
Where are Docker Images Stored?
Docker images are stored in a Docker registry, which is a service for storing and distributing Docker images. The most commonly used registry is Docker Hub, but you can also use private registries like Amazon ECR (Elastic Container Registry) or Google Container Registry.
When you pull an image using docker pull or push it with docker push, you’re interacting with a registry. Images on your local machine are stored in Docker’s internal storage system, typically located in /var/lib/docker (on Linux). You can list your locally stored images with:
docker images
Docker Image Optimization
As our containerized applications grew larger, our Docker build times started to become a real bottleneck. Each new CodeBuild job inside AWS was starting from scratch, reinstalling Python dependencies and rebuilding unchanged layers.
We wanted to make our CI/CD pipeline smarter, not harder, so we leveraged Docker BuildKit, Amazon ECR remote caching, and S3 caching to reduce both build time and image size drastically.
This post walks through our full setup – the Dockerfile, CodeBuild buildspec, and the performance impact we achieved – all in a production-grade Python project.
Understanding the Problem
By default, AWS CodeBuild launches a new, ephemeral build container for every pipeline run. That means:
- No previous Docker layers are available.
- Each pip install and apt-get runs from scratch.
- Even unchanged layers are rebuilt.
For large Python applications with hundreds of dependencies, that easily translates to 10–15 minutes of wasted time per build.
Step 1: Multi-Stage Dockerfile for Docker Image Optimization
We started by refactoring our Dockerfile into three stages – base, dependencies, and final – enabling layer-based reuse.
# syntax=docker/dockerfile:1.4
FROM python:3.10-slim AS base
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

FROM base AS dependencies
WORKDIR /app
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip,sharing=locked \
    pip install --upgrade pip && \
    pip install -r requirements.txt

FROM dependencies AS final
ARG ENV
ENV ENV=$ENV
COPY . .
EXPOSE 9000
CMD ["sh", "-c", "python3 chat.py --env $ENV"]
Key optimizations:
- The COPY requirements.txt step isolates dependency installation, so it only reruns when the requirements actually change.
- BuildKit's --mount=type=cache keeps pip and apt downloads cached across builds. (Note that --no-cache-dir would defeat this, since it tells pip not to use the very cache directory the mount persists.)
- Multi-stage builds strip unnecessary layers, producing a leaner final image.
Step 2: Enabling Docker BuildKit in CodeBuild
In our buildspec.yml, we enabled BuildKit and installed docker-buildx for multi-platform builds.
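Every project's buildspec differs, but the relevant env/install/pre_build portion looks roughly like this sketch (the builder name and variables are illustrative, not our exact file):

```yaml
version: 0.2

env:
  variables:
    # Enable BuildKit for any plain `docker build` invocations as well
    DOCKER_BUILDKIT: "1"

phases:
  install:
    commands:
      # Create and select a buildx builder instance for this build
      - docker buildx create --use --name ci-builder
  pre_build:
    commands:
      # Log in to ECR so cached layers can be pulled and pushed
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY_URI
```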
BuildKit brings content-based caching – it hashes every instruction and file, only rebuilding layers that change.
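To build intuition for content-based caching, think of it as memoization keyed on a digest of the instruction plus its input files: if neither changed, the cached layer is reused. A toy illustration in Python (not BuildKit's actual implementation):

```python
import hashlib

# Toy layer cache: key = digest of (instruction, input content)
cache = {}

def build_step(instruction: str, input_content: bytes):
    key = hashlib.sha256(instruction.encode() + input_content).hexdigest()
    if key in cache:
        return cache[key], True          # cache hit: layer reused
    result = f"layer({instruction})"     # pretend we actually ran the step
    cache[key] = result
    return result, False                 # cache miss: layer rebuilt

# First build: everything is a miss
_, hit1 = build_step("pip install -r requirements.txt", b"flask==3.0")
# Rebuild with identical requirements: the digest matches, layer is reused
_, hit2 = build_step("pip install -r requirements.txt", b"flask==3.0")
# Change one dependency: the digest changes, so the layer rebuilds
_, hit3 = build_step("pip install -r requirements.txt", b"flask==3.1")
print(hit1, hit2, hit3)  # False True False
```

This is why isolating COPY requirements.txt into its own step matters: the expensive pip install layer's digest only changes when that one file changes.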
Step 3: Leveraging Amazon ECR for Remote Layer Caching
Here’s the crucial piece:
--cache-from type=registry,ref=$ECR_REPOSITORY_URI:cache \
--cache-to type=registry,ref=$ECR_REPOSITORY_URI:cache,mode=max
With these flags, our Docker layers are stored in ECR and reused across builds – even when CodeBuild spins up on a new host.
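In context, the build-phase invocation looks roughly like this sketch (the $TAG variable and exact flags beyond the cache options are illustrative):

```yaml
  build:
    commands:
      - >
        docker buildx build
        --cache-from type=registry,ref=$ECR_REPOSITORY_URI:cache
        --cache-to type=registry,ref=$ECR_REPOSITORY_URI:cache,mode=max
        -t $ECR_REPOSITORY_URI:$TAG
        --push .
```

The mode=max option exports all intermediate layers to the cache image, not just the final stage's, which is what makes multi-stage reuse work.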
How it works:
- Before the build, BuildKit checks ECR for the cached image (:cache tag).
- Layers with matching checksums are pulled and reused.
- New or changed layers are pushed back to ECR for next time.
Result:
If requirements.txt didn't change, the dependency layers are reused instantly from cache.
Step 4: Directory Caching via Amazon S3
To further improve dependency caching, we used CodeBuild's S3 cache feature, which persists selected directories between builds.
This ensures that even if BuildKit cache misses, local dependency downloads are still reused between builds.
These directories are synced to an S3 bucket behind the scenes, providing persistent caching for:
- Python wheels (pip)
- Browser binaries (ms-playwright)
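These directories are declared in the cache section of buildspec.yml; a sketch (the exact paths depend on your base image and tooling, and S3 caching must also be enabled on the CodeBuild project itself) could be:

```yaml
cache:
  paths:
    # Python wheel/download cache
    - '/root/.cache/pip/**/*'
    # Playwright browser binaries
    - '/root/.cache/ms-playwright/**/*'
```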
Step 5: The Complete buildspec.yml Workflow
Here’s the summarized build sequence:
- CodeBuild starts a new container.
- Buildx is initialized (multi-arch builder).
- ECR login allows pulling the previous cache, e.g. docker buildx imagetools inspect $ECR_REPOSITORY_URI:cache.
- Buildx build runs:
  - Reads the previous cache from ECR (--cache-from)
  - Uses content hashes to skip unchanged steps
  - Writes new/updated cache back (--cache-to)
  - Pushes the final image tagged with the commit SHA ($TAG)
- Post-build: creates imagedefinitions.json for deployment.
Docker Image Optimization Architecture Diagram
Below is a visual overview of how caching works across our CI/CD pipeline:
Results: Measurable Performance Gains
In our case, total build time dropped by roughly 65% and the final image size shrank by about 35%. The improvement was especially noticeable in CI/CD pipelines where CodeBuild runs frequently – every push, merge, or PR now builds in a fraction of the time.
Conclusion
Docker Image Optimization is not just about speeding up build times – it’s about making your entire CI/CD pipeline more efficient and cost-effective. By leveraging tools like Docker BuildKit, Amazon ECR, and S3 caching, we were able to reduce build times by 65% and shrink image sizes by 35%. This not only improved developer productivity but also significantly reduced network bandwidth usage and unnecessary layer rebuilding. The combination of these strategies ensures that your builds are faster, leaner, and more reliable.
At Xcelore, we understand that every second counts when it comes to deployment. Our expertise in Cloud and DevOps services can help you streamline your CI/CD pipelines, optimize your infrastructure, and drive better performance at scale. Whether you’re looking to accelerate your Docker image builds, migrate to the cloud, or implement a robust DevOps culture, we’re here to help.
Ready to supercharge your CI/CD pipelines and cloud infrastructure?
Contact us today to learn how Xcelore’s Cloud and DevOps services can take your projects to the next level!


