Doing Rollups of AWS S3 Server Access Logs

If you are storing files in Amazon S3, you absolutely positively should enable AWS S3 Access Logging. This will cause every single access in a bucket to be written to a logfile in another S3 bucket, and is super useful for tracking down bucket usage, especially if you have any publicly hosted content in your buckets.

But there’s a problem–AWS goes absolutely bonkers when it comes to writing logs in S3. Multiple files will be written per minute, each with as few as one event in them. It comes out looking like this:

2019-09-14 13:26:38        835 s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7
2019-09-14 13:26:46        333 s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A
2019-09-14 13:26:55        333 s3/www.pa-furry.org/2019-09-14-17-26-54-F613777CE621F257
2019-09-14 13:26:56        333 s3/www.pa-furry.org/2019-09-14-17-26-55-99D355F57F3FABA9

At that rate, you will easily wind up with tens of thousands of logfiles per day. Yikes.
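
As a quick back-of-the-envelope check, here is a small Python sketch that extrapolates a daily count from the four timestamps in the listing above. It assumes files keep arriving at the same rate all day, which real traffic will not do, and `files_per_day` is just a name made up for this example.

```python
from datetime import datetime

# Timestamps copied from the first two columns of the listing above.
stamps = [
    "2019-09-14 13:26:38",
    "2019-09-14 13:26:46",
    "2019-09-14 13:26:55",
    "2019-09-14 13:26:56",
]


def files_per_day(timestamps):
    """Extrapolate files per day, assuming the sampled rate holds constant."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    elapsed = (times[-1] - times[0]).total_seconds()
    intervals = len(times) - 1  # gaps between consecutive files
    return intervals * 86400 / elapsed


print(round(files_per_day(stamps)))
```

That is for one bucket during one short sample, so busier buckets, or several buckets logging to the same place, will push the number higher.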

Dealing With So Many Logfiles

Wouldn’t it be nice if there were a way to roll up those files so they could be condensed into fewer, bigger files?

Well, I wrote an app for that. Here’s how to get started: the first step is to clone the repo and install Serverless:

git clone git@github.com:dmuth/aws-s3-server-access-logging-rollup.git
npm install -g serverless

Serverless is an app which lets you deploy applications on AWS and other cloud providers without actually spinning up virtual servers. In our case, we’ll use Serverless to create a Lambda function which executes periodically and performs rollup of logfiles.
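
To give a rough idea of what "rollup" means here, below is a minimal Python sketch that groups logfiles by the hour in their filename and concatenates each group. It is an illustration only, not the code from the repo: `hour_of` and `roll_up` are hypothetical names, the hourly grouping is an assumption, and the logfile contents are in-memory stand-ins rather than objects read from S3.

```python
from collections import defaultdict


def hour_of(key):
    """Return the YYYY-MM-DD-HH prefix of a logfile's name."""
    filename = key.rsplit("/", 1)[-1]
    return filename[:13]  # e.g. "2019-09-14-17"


def roll_up(logs):
    """Combine many small logfiles into one blob per hour.

    logs maps an object key to that logfile's contents as bytes.
    Returns a dict mapping each hour prefix to the combined contents.
    """
    grouped = defaultdict(list)
    for key in sorted(logs):
        body = logs[key]
        if not body:
            continue  # nothing to keep from an empty file
        if not body.endswith(b"\n"):
            body += b"\n"  # keep one event per line after joining
        grouped[hour_of(key)].append(body)
    return {hour: b"".join(bodies) for hour, bodies in grouped.items()}


# Keys from the listing above; the contents are made-up stand-ins.
logs = {
    "s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7": b"first event\n",
    "s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A": b"second event\n",
    "s3/www.pa-furry.org/2019-09-14-17-26-54-F613777CE621F257": b"third event",
}
for hour, combined in roll_up(logs).items():
    print(hour, len(combined.splitlines()), "events")
```

Sorting the keys first keeps events in roughly chronological order, since the filenames start with a timestamp.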

So once you have the code, here’s how to deploy it:

cp serverless.yml.example serverless.yml
vim serverless.yml # Vim is Best Editor
serverless deploy # Deploy the app. This will take some time.

Saving and Restoring Your development/ Directory

If you’re like me, you write a fair bit of code, which means you have to interact with many Git repositories. If you’re also like me, chances are you have them in a directory called development/ or similar. It might even have some nested directories, something like this:

./allaboutcheetahs.info
./diceware
./docker/check-disk-space
./docker/health-check
./node/circuitbreaker-demo
./node/neural-network
./s3/bucket-sizes
./s3/disk-usage
./snowdrift
./ssh-to

So that’s cool, but let’s say you get a new machine and want to replicate your development/ directory structure on it. One way is to check out everything by hand, but that’s laborious and time-consuming. A second way is to keep backups (and you should absolutely do this), but aside from the challenge of restoring a single directory out of an entire archive, what if that backup doesn’t have the latest commits in it?

I can now offer a third way. I recently wrote a couple of scripts, available on GitHub, that extract the Git remote from each repo in an entire directory structure and save those remotes, along with the directories they belong in, to a file. Given the above example, it might look something like this:

./allaboutcheetahs.info git@github.com:dmuth/dmuth.github.io.git
./diceware      git@github.com:dmuth/diceware.git
./docker/check-disk-space       git@github.com:dmuth/docker-check-disk-usage.git
./docker/health-check   git@github.com:dmuth/docker-health-check.git
./node/circuitbreaker-demo      git@github.com:dmuth/another-circuit-breaker.git
./node/neural-network   git@github.com:dmuth/neural-network.git
./s3/bucket-sizes       git@gitlab.com:dmuth/s3-bucket-sizes.git
./s3/disk-usage git@github.com:dmuth/s3-disk-usage.git
./snowdrift     git@github.com:Comcast/snowdrift.git
./ssh-to        git@github.com:Comcast/ssh-to.git
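
To show how a file like that could be produced and then used on a new machine, here is a minimal Python sketch. It is an illustration rather than the actual scripts on GitHub: every function name is made up for this example, it assumes each repo has a remote called origin, and it assumes directory names contain no whitespace.

```python
import os
import subprocess


def find_repos(root):
    """Yield every directory under root that contains a .git entry."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk in a predictable order
        if ".git" in dirnames or ".git" in filenames:
            yield dirpath
            dirnames[:] = []  # don't descend into the repo itself


def get_remote(repo, name="origin"):
    """Ask Git for the URL of the named remote in the given repo."""
    result = subprocess.run(
        ["git", "-C", repo, "remote", "get-url", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def parse_manifest(text):
    """Turn 'directory<whitespace>remote' lines into (directory, remote) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            pairs.append((fields[0], fields[1]))
    return pairs


def clone_command(directory, remote):
    """Build the git command that would recreate one repo at its old path."""
    return ["git", "clone", remote, directory]


# Two lines from the example file above.
manifest = (
    "./diceware      git@github.com:dmuth/diceware.git\n"
    "./ssh-to        git@github.com:Comcast/ssh-to.git\n"
)
for directory, remote in parse_manifest(manifest):
    print(" ".join(clone_command(directory, remote)))
```

Saving would mean calling get_remote() on each path from find_repos() and writing one line per repo. Restoring would mean running each clone command, for example with subprocess.run(), from inside the new development/ directory.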