You may have noticed we publish a lot of blogs here at DoltHub. We write 5 blogs a week and every employee contributes. We recently made some overdue internal changes to our blog that improve both the developer and author experience.
When we started this blog back in 2019, we decided to put the images used in our articles in our Git monorepo, which houses the code for many of our web products. This made sense for us at the time - our blog was small and having our images version controlled and everything in the same place was useful. However, once our blog grew to over 1,000 articles and 3,000 images, we started to hit some issues.
The problem
Our blog has a lot of images and we put no size constraint on the images being added to our repository. The images directory had grown to about 1GB, which caused two main problems:
- Our repository outgrew our GitHub Actions storage allotment. If we wanted to increase storage, we would need to upgrade to GitHub Enterprise, which would be costly. We added some hacky steps to our workflows to delete unnecessary tools and dependencies, but this was only going to hold us over for so long.
- Our blog build times were slow. Before we migrated our blog framework from Gatsby to Astro, our GitHub Actions blog deployment workflow took almost 40 minutes. We added some caching workflows that cut these build times to around 7 minutes, but they were cumbersome to maintain. After our migration to Astro, our deployment workflow took about 20 minutes without any caching, with most of the time spent on image processing.
The solution
Adding a cache workflow like we did for Gatsby could have solved the second problem, but it would not have addressed the root cause: putting all of our blog images in Git was not sustainable long term.
It was time to move our images out of our repository.
Aside from solving our two bigger problems above, we had some other factors to consider. Everyone at DoltHub writes blogs and we publish daily, so it was important that whatever solution we chose didn’t severely impact the blog author experience. Since we are already heavily integrated with AWS, it made the most sense to store our images in S3 with CloudFront in front, rather than figure out a completely new CDN product and system. We also didn’t want to fully lose version control of images: if an image accidentally gets deleted or changed, there should be a record.
How it works
For some background on Astro: any image stored in src is transformed, optimized, and bundled, while images in public are served as-is without any processing. Local images stored in src can be referenced by relative path in the blog article markdown (e.g. a blog in src/content/blog references an image in src/images by its path relative to the markdown file).
Our workflow maintains this author experience while moving images to S3 behind the scenes. Here’s how it works:
1. Author workflow (unchanged)
Authors continue to work the same as before: they place images in src/images (with any subdirectory structure) and reference them in markdown using relative paths.
2. Publishing images to S3
Before deploying, authors run yarn publish-images. The script scans src/images for all image files (PNG, JPG, SVG, WebP, GIF), validates each one by checking its file header, and optimizes it by resizing to a maximum width of 1600px and converting to WebP (except SVG and GIF, which are kept as-is). It then generates a content hash for each image, including the content type in the hash to ensure proper MIME types, and uploads both versions to S3: the optimized version at blogimages/{path}/{hash}.webp and the original at blogimages/{path}/original/{hash}.{ext}. Finally, it creates overlay files in src/overlays with JSON mappings like:
{
"/images/my-image.png": "https://static.dolthub.com/blogimages/my-image.png/abc123...webp"
}
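The pure parts of the publish step can be sketched in TypeScript as follows. This is not our actual script, only an illustration under stated assumptions: the function names are hypothetical, SHA-256 is an assumed hash algorithm, feeding the content type into the hash before the file bytes is one possible way to "include" it, and a single hash is assumed to name both uploads. The post does not say what key SVG and GIF files get, so the key builder only models formats that are converted to WebP. The resize and upload calls themselves are left out.

```typescript
import { createHash } from "node:crypto";

// Magic-byte prefixes for formats that start with a fixed signature.
const SIGNATURES: Array<{ type: string; bytes: number[] }> = [
  { type: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { type: "image/gif", bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
];

function hasBytes(data: Uint8Array, bytes: number[], offset = 0): boolean {
  return bytes.every((b, i) => data[offset + i] === b);
}

// Returns the detected MIME type, or null if the header matches nothing.
function detectContentType(data: Uint8Array): string | null {
  for (const sig of SIGNATURES) {
    if (hasBytes(data, sig.bytes)) return sig.type;
  }
  // WebP is a RIFF container: "RIFF" at byte 0 and "WEBP" at byte 8.
  if (hasBytes(data, [0x52, 0x49, 0x46, 0x46]) && hasBytes(data, [0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  // SVG is text, so look for an <svg tag near the start (a loose check).
  const head = Array.from(data.slice(0, 256), (b) => String.fromCharCode(b)).join("");
  return head.includes("<svg") ? "image/svg+xml" : null;
}

// Assumption: SHA-256 over the content type followed by the file bytes.
function contentHash(data: Uint8Array, contentType: string): string {
  return createHash("sha256").update(contentType).update(data).digest("hex");
}

// Width-capped size that keeps the aspect ratio and never upscales (assumed).
function targetSize(width: number, height: number, maxWidth = 1600): { width: number; height: number } {
  if (width <= maxWidth) return { width, height };
  return { width: maxWidth, height: Math.round((height * maxWidth) / width) };
}

// Key templates from the post; `relativePath` is relative to src/images.
function buildS3Keys(relativePath: string, hash: string): { optimized: string; original: string } {
  const ext = relativePath.slice(relativePath.lastIndexOf(".") + 1).toLowerCase();
  return {
    optimized: `blogimages/${relativePath}/${hash}.webp`,
    original: `blogimages/${relativePath}/original/${hash}.${ext}`,
  };
}

// One overlay entry: local key -> CDN URL of the optimized image.
function overlayEntry(relativePath: string, optimizedKey: string): [string, string] {
  return [`/images/${relativePath}`, `https://static.dolthub.com/${optimizedKey}`];
}
```

Because the key contains a hash of the content, changing an image produces a new S3 object and a new overlay entry, so the overlay diff in Git records that the image changed even though the bytes live elsewhere.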
3. Build-time path mapping
During the Astro build process, a custom plugin scans all markdown files for image references, looks up each relative image path in the corresponding overlay JSON file, and replaces the local path with the CloudFront URL. Images are then served directly from the CloudFront CDN rather than bundled in the build.
This means authors never need to think about S3 URLs - they just use local paths, and the build system handles the rest.
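To make the lookup concrete, here is a minimal string-based sketch of the path rewriting. It is not our plugin (a real Astro or remark plugin would more likely walk the markdown AST). The names `toOverlayKey` and `rewriteImagePaths` are hypothetical, and the sketch assumes overlay keys are paths relative to src, as in the "/images/my-image.png" example above. It only handles the basic image syntax without titles.

```typescript
// Overlay shape assumed from the JSON example: local key -> CDN URL.
type Overlay = Record<string, string>;

// Resolve a relative image path against the markdown file's directory and
// return a key relative to src/, e.g. "/images/my-image.png".
function toOverlayKey(markdownFile: string, imagePath: string): string {
  const parts = markdownFile.split("/").slice(0, -1); // drop the file name
  for (const segment of imagePath.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  const srcIndex = parts.indexOf("src");
  return "/" + parts.slice(srcIndex + 1).join("/");
}

// Replace local image targets with their CDN URLs; leave anything that is
// already remote or has no overlay entry untouched.
function rewriteImagePaths(markdown: string, markdownFile: string, overlay: Overlay): string {
  const imageRef = /!\[([^\]]*)\]\(([^)\s]+)\)/g;
  return markdown.replace(imageRef, (match: string, alt: string, target: string) => {
    if (/^https?:\/\//.test(target)) return match;
    const url = overlay[toOverlayKey(markdownFile, target)];
    return url ? `![${alt}](${url})` : match;
  });
}
```

Leaving unmatched references untouched is a design choice for the sketch. A real build might prefer to fail loudly when an image has no overlay entry, since that usually means the author forgot to run yarn publish-images.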
The result
The migration was successful and solved our core problems, though it came with some trade-offs.
What we gained
Performance improvements: Build time reduced from 22 minutes to 7 minutes (68% faster), we removed 1GB of images from our repository, and we no longer have GitHub Actions storage capacity concerns.
Maintained author experience: Authors still use the same workflow, placing images in src/images and referencing them with relative paths. Overlay files provide a version-controlled record of image mappings, and the build system handles all the complexity automatically.
Scalability: We’re no longer constrained by repository size limits and can keep adding images without impacting build performance. Images are optimized before upload and cached at the CDN level.
Trade-offs
Setup complexity: Every author needs AWS credentials configured (requires AWS SSO login). Our team has 14 developers (most already have AWS access for on-call) and 3 non-developers who need additional setup. Non-developers can no longer add images directly through GitHub’s web interface.
Lost conveniences: We can’t search image contents directly in GitHub, images don’t appear in GitHub pull request diffs, and we can’t preview images in the GitHub file browser.
These trade-offs haven’t caused issues so far, but they were nice-to-have features we may end up missing from the old workflow.
Conclusion
Overall, we are happy with the results of our new blog images workflow, despite the trade-offs that come along with it. Our blog has been a great way for us to tell the world about Dolt, discuss our learnings about Golang, AI, web frameworks, the startup experience, etc., and share our journey with the community. We appreciate you for being a reader!
Have you faced similar challenges with managing images or assets for a static site? Come discuss how you solved them by chatting with us in our Discord.
