Doing Rollups of AWS S3 Server Access Logs

If you are storing files in Amazon S3, you absolutely positively should enable AWS S3 Access Logging. This will cause every single access in a bucket to be written to a logfile in another S3 bucket, and is super useful for tracking down bucket usage, especially if you have any publicly hosted content in your buckets.

But there’s a problem–AWS goes absolutely bonkers when it comes to writing logs in S3. Multiple files will be written per minute, each with as few as one event in them. It comes out looking like this:

2019-09-14 13:26:38        835 s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7
2019-09-14 13:26:46        333 s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A
2019-09-14 13:26:55        333 s3/www.pa-furry.org/2019-09-14-17-26-54-F613777CE621F257
2019-09-14 13:26:56        333 s3/www.pa-furry.org/2019-09-14-17-26-55-99D355F57F3FABA9

At that rate, you will easily wind up with 10s of thousands of logfiles per day. Yikes.

Dealing With So Many Logfiles

Wouldn’t it be nice if there was a way to perform rollup on those files so they could be condensed into fewer bigger files?

Well, I wrote an app for that. Here’s how to get started: first step is that you’re going to need to clone that repo and install Serverless:

git clone git@github.com:dmuth/aws-s3-server-access-logging-rollup.git
npm install -g serverless

Serverless is an app which lets you deploy applications on AWS and other cloud providers without actually spinning up virtual servers. In our case, we’ll use Serverless to create a Lambda function which executes periodically and performs rollup of logfiles.

So once you have the code, here’s how to deploy it:

cp serverless.yml.exmaple serverless.xml
vim serverless.xml # Vim is Best Editor
serverless deploy # Deploy the app. This will take some time.
Continue reading “Doing Rollups of AWS S3 Server Access Logs”

Hotel Opening: Anthrocon 2019 Report

One of my activities outside of the office consists of staffing furry conventions. One of those conventions is Anthrocon, a furry convention held in downtown Pittsburgh every June/July. At that particular convention, I manage the website and their social media properties.

Yesterday, we opened general hotel reservations, and that resulted in a huge rush of members booking hotel rooms. 1,000 rooms were booked in the first 15 minutes! This was completely expected, and we kept track of how things played out on social media, and also took a survey of members who booked hotel rooms to see how things went. In this post, we’re going to share what we learned based on those survey results and Twitter activity.

HOTEL BOOKINGS

First, did people who booked a hotel room get the hotel that they wanted?

Did you get the hotel you wanted?

For nearly 70% of you, the answer is yes. This makes us happy, but we would like to see the number higher—ideally 100% of our attendees would get a room in the hotel of their choice. This is something we continue to work on each year by adding new hotels and getting bigger room blocks in existing hotels.

Continue reading “Hotel Opening: Anthrocon 2019 Report”

What’s Taking Up So Much Space in AWS S3?

I’m a big fan of Amazon S3 for storage, and I like it so much that I use Odrive to sync folders from my hard drive into S3 use S3 to store copies of all of my files from Dropbox as a form of backup.  I only have about 20 GB of data that I truly care about, so that should be less than a dollar per month for hosting, right? Well…

“You are not your job or how much data you have in S3!”

Close to 250 GB billed for last month. How did that happen?

Continue reading “What’s Taking Up So Much Space in AWS S3?”

Dockerizing Discord’s Music Bot in Amazon ECS

I’m a big fan of the Discord Musicbot, and run it on some Discord servers that I admin. Wanting to run it on a server, I first created an Ansible playbook and launched a server on Digital Ocean. But after a few months, I noticed that the server was sitting over 90% idle. Surely there had to be a better way.

So I next tried Docker, and created a Dockerized version of the Musicbot. I was quite happy with how much easier it was to spin up the bot, but still didn’t want to run it on a dedicated server on Digital Ocean. Aside from having unused capacity, if that machine were to go down, I’d have to do something manually.

I thought about running it in some sort of hosted Docker cluster, and came across Amazon’s container service. So this post is about creating your own cluster in ECS and hosting a Docker container in it. I found the process slightly confusing when I did it the first time, and wanted to share my experience here.

Continue reading “Dockerizing Discord’s Music Bot in Amazon ECS”

How to Undelete Files in Amazon S3

While S3 is a great storage platform, what happens if you accidentally delete some important files? Well, S3 has a mechanism to recover deleted files, and I’d like to go into that in this post.

First, make sure you have versioning enabled on your bucket. This can be done via the API, or via the UI in the “properties” tab for your bucket. Versioning saves every change to a file (including deletions) as a separate version of said object, with the most recent version taking precedence. In fact, a deletion is also a version! It is a zero-byte version which has a “DELETE” flag set. And the essence of recovering undeleted files simply involves removing the latest version with the “DELETE” flag.

This is what that would look like in the UI:

To undelete these files, we’ll use a script I created called s3-undelete.sh, which can be found over on GitHub:

Continue reading “How to Undelete Files in Amazon S3”

The Importance of Having More Than One Backup

Many years ago, I wanted to make sure my data was secure, so I purchased a fireproof media safe from a (now defunct) company called FireCooler. I thought it would be a good idea to have a UL 125-rated safe which could keep an internal temperature of less than 125 degrees over an hour long fire. I regularly made backups to DVD and stored them in the safe.

Well, sometimes the best laid plans can go awry, and that was the case the other day when I went to put something in my safe, and found that it was flooded with water:

IMG_3372 IMG_3374

How did this happen? Did something in the safe suck in tons of moisture? Did the basement somehow flood and not cause water damage elsewhere? To this day, I am still not sure. I did not see any evidence of flooding in my storage area–nothing else was damaged.

Before I switched over to WordPress, one person pointed out that the safe my have been insulted with “water glass”, specifically from US Patent US7459190:

Outer wall composed of water glass sodium silicate solution that is 40% solids, 60% water, and having a silicon oxide:sodium oxide ratio in the range of 2:1 to 4:1, calcium chloride, and an additive chosen from calcium oxide or calcium hydroxide […] After curing, water released from the solidified insulation can migrate to and leak from pinhole defects which sometimes occur in the plastic shell.

Patent US7459190

So that’s a possibility, but I am not a chemist, so proving such a thing would be beyond me.

The takeaway here is that I had backups stored elsewhere so no actual data was lost. I recommend that everyone reading this, if they care about their data, to do the exact same thing. Here are a few resources for backups:

Happy Backups!

Notes from “Scaling on AWS for the First 10 Million Users”

Earlier tonight, I had the pleasure of attending a presentation from Chris Munns of Amazon at the offices of First Round Capital about scaling your software on AWS past the first 10 million users. I already had some experience with AWS, but I learned quite a few new things about how to leverage AWS, so I decided to write up my notes in a blog post for future reference, and as a service to other members of the Philadelphia tech community.

Without further preamble, here are my notes from the presentation:

Continue reading “Notes from “Scaling on AWS for the First 10 Million Users””