Tarsplit is a utility I wrote which can split UNIX tarfiles into multiple parts while keeping files in the tarballs intact. I specifically wrote those because other ways I found of splitting up tarballs didn’t keep the individual files intact, which would not play nice with Docker.
But what does Docker have to do with tar?
While building the Docker images for my Splunk Lab project, I noticed that one of the layers was something like a Gigabyte in size! While Docker can handle large layers, the issue become one of time it takes to push or pull an image. Docker does validation on each layer as it’s received, and the validation of a 1 GB layer took 20-30 wall clock seconds. Not fun.
It occurred to me that if I could split up that layer into say, 10 layers of 100 Megabytes each, then Docker would transfer about 3 layers in parallel, and while a layer is being validated post-transfer, the next layer would start being transferred. The end result is less wall clock seconds to transfer an entire Docker image.
But there was an issue–Splunk and its applications are big. A few of the tarballs are hundreds of Megabytes in size, and I needed a way to split those tarballs into smaller tarballs without corrupting the files in them. This led to Tarsplit being written.
If you are storing files in Amazon S3, you absolutely positively should enable AWS S3 Access Logging. This will cause every single access in a bucket to be written to a logfile in another S3 bucket, and is super useful for tracking down bucket usage, especially if you have any publicly hosted content in your buckets.
But there’s a problem–AWS goes absolutely bonkers when it comes to writing logs in S3. Multiple files will be written per minute,each with as few as one event in them. It comes out looking like this:
Serverless is an app which lets you deploy applications on AWS and other cloud providers without actually spinning up virtual servers. In our case, we’ll use Serverless to create a Lambda function which executes periodically and performs rollup of logfiles.
So once you have the code, here’s how to deploy it:
cp serverless.yml.exmaple serverless.xml
vim serverless.xml # Vim is Best Editor
serverless deploy # Deploy the app. This will take some time.
I’ve written about Docker before, as I am a big fan of it. And for this post, I’m going to talk about some practical situations in which I’ve used Docker in real life, both for testing and software development!
But first, let’s recap what Docker IS and IS NOT:
Docker containers DO spin up quickly (1-2 seconds or less)
Docker containers DO have separated process tress and filesystems
Docker containers ARE NOT virtual machines
Docker containers ARE intended to be ephemeral. (short-lived)
You CAN, however, mount filesystems from the host machine into Docker, so those files can live on after the container shuts down (or is killed).
You SHOULD only run one service per Docker container.
Everybody got that? Good. Now, let’s get into some real life things I’ve used Docker for.
Experimenting in Linux
Want to test out some commands or maybe a shell script that you’re worried might be destructive? No worries, try it in a Docker container, and if you nuke the filesystem, there will be no long-term consequences.
# Start a container with Alpine Linux
$ docker run -it alpine
# Let's do something dumb
$ rm /bin/ls
$ ls -l
/bin/sh: ls: not found
# Just exit the container, restart it, and our filesystem is back!
[unifi:~/tmp ] $ docker run -it alpine
$ ls /
bin dev etc home lib media mnt proc root run sbin srv sys tmp usr var
And all of the above takes just a couple of seconds! This works with other Linux distros as well, such as CentOS and Unbuntu–just change your Docker command accordingly:
docker run -it centos
docker run -it ubuntu
Yes, that means you could run CentOS in a container under Ubuntu or vice-versa. Docker doesn’t care. 🙂