S3 – Doug's Home On The Web

September 15th, 2019

Doing Rollups of AWS S3 Server Access Logs

If you are storing files in Amazon S3, you absolutely positively should enable AWS S3 Access Logging. This will cause every single access in a bucket to be written to a logfile in another S3 bucket, and is super useful for tracking down bucket usage, especially if you have any publicly hosted content in your buckets.

But there’s a problem: AWS goes absolutely bonkers when it comes to writing these logs to S3. Multiple files are written per minute, some containing as few as one event. A bucket listing ends up looking like this:

2019-09-14 13:26:38        835 s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7
2019-09-14 13:26:46        333 s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A
2019-09-14 13:26:55        333 s3/www.pa-furry.org/2019-09-14-17-26-54-F613777CE621F257
2019-09-14 13:26:56        333 s3/www.pa-furry.org/2019-09-14-17-26-55-99D355F57F3FABA9

At that rate, you can easily wind up with tens of thousands of logfiles per day. Yikes.
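To put a rough number on that rate, here is a back-of-the-envelope sketch in Python. It assumes each key follows the naming seen in the listing above (a target prefix, then a YYYY-MM-DD-HH-MM-SS timestamp in UTC, then a hyphen-free unique string), and it extrapolates from just those four files, so the result is illustrative only. The function names are hypothetical.

```python
from datetime import datetime

# The four keys from the listing above. Each name is assumed to be
# target prefix + YYYY-MM-DD-HH-MM-SS + "-" + a unique string with no hyphens.
KEYS = [
    "s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7",
    "s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A",
    "s3/www.pa-furry.org/2019-09-14-17-26-54-F613777CE621F257",
    "s3/www.pa-furry.org/2019-09-14-17-26-55-99D355F57F3FABA9",
]


def key_timestamp(key):
    """Parse the timestamp embedded in an access log key name."""
    name = key.rsplit("/", 1)[-1]   # drop the "s3/www.pa-furry.org/" prefix
    stamp = name.rsplit("-", 1)[0]  # drop the unique suffix
    return datetime.strptime(stamp, "%Y-%m-%d-%H-%M-%S")


def files_per_day(keys):
    """Extrapolate the average gap between consecutive files to a full day."""
    times = sorted(key_timestamp(k) for k in keys)
    span = (times[-1] - times[0]).total_seconds()
    return 86400 * (len(times) - 1) / span


print(f"{files_per_day(KEYS):,.0f} files per day")  # 14,400 files per day
```

Four files in 18 seconds works out to one file every 6 seconds on average. Real traffic is bursty, so the count for any given day can be far higher or lower.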

Dealing With So Many Logfiles

Wouldn’t it be nice if there were a way to roll up those files, condensing them into fewer, bigger files?

Well, I wrote an app for that. Here’s how to get started: first, clone the repo and install Serverless:

git clone git@github.com:dmuth/aws-s3-server-access-logging-rollup.git
cd aws-s3-server-access-logging-rollup
npm install -g serverless

Serverless is an app which lets you deploy applications on AWS and other cloud providers without actually spinning up virtual servers. In our case, we’ll use Serverless to create a Lambda function which executes periodically and performs rollup of logfiles.
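At its core, a rollup just groups small files by time and concatenates them. The Python sketch below is not the code from the repo; it is a minimal illustration that assumes the log contents have already been downloaded into a dict keyed by object name, and it groups them by hour. The hourly granularity, the function names, and the "s3/example/" keys are all hypothetical. Reading from S3, writing the combined files back, and deleting the originals are left out.

```python
from collections import defaultdict


def hour_of(key):
    """Return the "YYYY-MM-DD-HH" part of an access log key name."""
    name = key.rsplit("/", 1)[-1]  # drop the target prefix
    return name[:13]


def rollup(logs):
    """Concatenate log bodies that fall in the same hour.

    logs maps object key -> file contents. Keys that share a prefix sort
    chronologically, so sorting them keeps the combined lines in order.
    """
    groups = defaultdict(list)
    for key in sorted(logs):
        body = logs[key]
        if body and not body.endswith("\n"):
            body += "\n"  # keep the last line of one file from running into the next
        groups[hour_of(key)].append(body)
    return {hour: "".join(parts) for hour, parts in groups.items()}


logs = {
    "s3/example/2019-09-14-17-59-01-BBBB": "request 2\n",
    "s3/example/2019-09-14-17-26-37-AAAA": "request 1\n",
    "s3/example/2019-09-14-18-00-03-CCCC": "request 3",
}
print(rollup(logs))
# {'2019-09-14-17': 'request 1\nrequest 2\n', '2019-09-14-18': 'request 3\n'}
```

Grouping by the timestamp in the key name, rather than by when the rollup happens to run, means a late or repeated run still puts each file in the same group. Switching to daily groups would only need a shorter slice (`name[:10]`).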

So once you have the code, here’s how to deploy it:

cp serverless.yml.example serverless.yml
vim serverless.yml # Edit your settings. Vim is Best Editor
serverless deploy # Deploy the app. This will take some time.

January 5th, 2018 (updated March 16th, 2019)

What’s Taking Up So Much Space in AWS S3?

I’m a big fan of Amazon S3 for storage, and I like it so much that ~~I use Odrive to sync folders from my hard drive into S3~~ I use S3 to store copies of all of my files from Dropbox as a form of backup. I only have about 20 GB of data that I truly care about, so hosting it should cost less than a dollar per month, right? Well…

*“You are not your job or how much data you have in S3!”*

I was billed for close to 250 GB last month. How did that happen?
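For a sense of scale, here is the storage arithmetic in Python. The rate is an assumption: $0.023 per GB-month, the published S3 Standard price for the first 50 TB in us-east-1 (check current pricing for your region and storage class). Request and data transfer charges are ignored, and the function name is hypothetical.

```python
# Assumed rate: S3 Standard, us-east-1, first 50 TB tier (USD per GB-month).
# Request and data transfer charges are not included.
RATE_PER_GB_MONTH = 0.023


def monthly_storage_cost(gb):
    """Rough monthly storage cost in USD for the given number of GB."""
    return round(gb * RATE_PER_GB_MONTH, 2)


print(monthly_storage_cost(20))   # 0.46
print(monthly_storage_cost(250))  # 5.75
```

Under that assumption, 20 GB does come in under a dollar; the surprise is the gap between the data stored and the data billed.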

December 11th, 2016 (updated July 20th, 2019)

How to Undelete Files in Amazon S3

While S3 is a great storage platform, what happens if you accidentally delete some important files? Well, S3 has a mechanism to recover deleted files, and I’d like to go into that in this post.

First, make sure you have versioning enabled on your bucket. This can be done via the API, or in the UI under the “Properties” tab for your bucket. Versioning saves every change to a file (including deletions) as a separate version of that object, with the most recent version taking precedence. In fact, a deletion is also a version! It is a zero-byte version which AWS calls a “delete marker”. So recovering a deleted file simply involves removing the latest version, which is that delete marker.
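The selection step can be sketched in Python. This is not the s3-undelete.sh script; it is a hypothetical helper that takes a dict shaped like the response from boto3’s `list_object_versions()` call, whose `DeleteMarkers` entries carry `Key`, `VersionId` and `IsLatest` fields, and returns the markers that currently hide an object. The sample data is made up for illustration.

```python
def undelete_targets(response):
    """Return (key, version_id) for each delete marker that is the latest version.

    response is shaped like boto3's s3.list_object_versions() output.
    Removing these specific versions makes the previous version current again.
    """
    return [
        (marker["Key"], marker["VersionId"])
        for marker in response.get("DeleteMarkers", [])
        if marker.get("IsLatest")
    ]


# Hypothetical sample: cat.jpg was deleted (its latest version is a delete marker),
# while notes.txt was deleted and later re-uploaded (its marker is not the latest).
sample = {
    "Versions": [
        {"Key": "photos/cat.jpg", "VersionId": "v1", "IsLatest": False},
        {"Key": "notes.txt", "VersionId": "v3", "IsLatest": True},
    ],
    "DeleteMarkers": [
        {"Key": "photos/cat.jpg", "VersionId": "d1", "IsLatest": True},
        {"Key": "notes.txt", "VersionId": "d2", "IsLatest": False},
    ],
}
print(undelete_targets(sample))  # [('photos/cat.jpg', 'd1')]
```

Against a real bucket, each pair would be passed to `s3.delete_object(Bucket=..., Key=key, VersionId=version_id)` to remove the marker. `list_object_versions` returns at most 1,000 entries per call, so larger buckets need `s3.get_paginator("list_object_versions")`.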

This is what that would look like in the UI:

To undelete these files, we’ll use a script I created called s3-undelete.sh, which can be found over on GitHub:

February 9th, 2014 (updated March 16th, 2019)

The Importance of Having More Than One Backup

Many years ago, I wanted to make sure my data was secure, so I purchased a fireproof media safe from a (now defunct) company called FireCooler. I thought it would be a good idea to have a UL 125-rated safe, which could keep its internal temperature below 125 °F during an hour-long fire. I regularly made backups to DVD and stored them in the safe.

Well, sometimes the best laid plans can go awry, and that was the case the other day when I went to put something in my safe, and found that it was flooded with water:

How did this happen? Did something in the safe suck in tons of moisture? Did the basement somehow flood without causing water damage elsewhere? To this day, I am still not sure. I did not see any evidence of flooding in my storage area; nothing else was damaged.

Before I switched over to WordPress, one person pointed out that the safe may have been insulated with “water glass”, as described in US Patent US7459190:

Outer wall composed of water glass sodium silicate solution that is 40% solids, 60% water, and having a silicon oxide:sodium oxide ratio in the range of 2:1 to 4:1, calcium chloride, and an additive chosen from calcium oxide or calcium hydroxide […] After curing, water released from the solidified insulation can migrate to and leak from pinhole defects which sometimes occur in the plastic shell.
Patent US7459190

So that’s a possibility, but I am not a chemist, so proving such a thing would be beyond me.

The takeaway here is that I had backups stored elsewhere, so no actual data was lost. If you care about your data, I recommend you do the exact same thing. Here are a few resources for backups:

Dropbox
Amazon S3
Amazon EC2 (Bonus: you can take the instance offline when you’re not backing up to it, and an instance that is not online cannot be hacked into.)
Carbonite

Happy Backups!