Doing Rollups of AWS S3 Server Access Logs

If you are storing files in Amazon S3, you absolutely positively should enable AWS S3 Access Logging. This will cause every single access in a bucket to be written to a logfile in another S3 bucket, and is super useful for tracking down bucket usage, especially if you have any publicly hosted content in your buckets.

But there’s a problem–AWS goes absolutely bonkers when it comes to writing logs in S3. Multiple files will be written per minute, each with as few as one event in them. It comes out looking like this:

2019-09-14 13:26:38        835 s3/www.pa-furry.org/2019-09-14-17-26-37-5B75705EA0D67AF7
2019-09-14 13:26:46        333 s3/www.pa-furry.org/2019-09-14-17-26-45-C8553CA61B663D7A
2019-09-14 13:26:55        333 s3/www.pa-furry.org/2019-09-14-17-26-54-F613777CE621F257
2019-09-14 13:26:56        333 s3/www.pa-furry.org/2019-09-14-17-26-55-99D355F57F3FABA9

At that rate, you will easily wind up with tens of thousands of logfiles per day. Yikes.
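If you want to check how many logfiles your own bucket is piling up, one rough way is to count them per day from a recursive listing. Here is a minimal sketch. It assumes the output format of the AWS CLI's aws s3 ls --recursive command, which is the format shown above (date, time, size, key). The function name count_logs_per_day and the bucket path in the usage comment are hypothetical.

```shell
# Count S3 server access logfiles per day from `aws s3 ls --recursive` output.
# Usage (hypothetical bucket and prefix):
#   aws s3 ls --recursive s3://my-log-bucket/s3/www.example.org/ | count_logs_per_day
count_logs_per_day() {
    # Column 1 is the date and column 4 is the object key. Lines with fewer
    # than four fields (such as "PRE" prefix lines) are skipped.
    awk 'NF >= 4 { count[$1]++ } END { for (day in count) print day, count[day] }' | sort
}
```

Note that the date column is the object's last-modified time as the CLI prints it. In the listing above that appears to be local time (13:26) rather than the UTC timestamp in the filename (17-26), so files near midnight may land on a different day than their names suggest.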

Dealing With So Many Logfiles

Wouldn’t it be nice if there were a way to roll up those files so they could be condensed into fewer, bigger files?

Well, I wrote an app for that. Here’s how to get started: first, you’re going to need to clone the repo and install Serverless:

git clone git@github.com:dmuth/aws-s3-server-access-logging-rollup.git
npm install -g serverless

Serverless is an app which lets you deploy applications on AWS and other cloud providers without actually spinning up virtual servers. In our case, we’ll use Serverless to create a Lambda function which executes periodically and performs rollup of logfiles.

So once you have the code, here’s how to deploy it:

cd aws-s3-server-access-logging-rollup
cp serverless.yml.example serverless.yml
vim serverless.yml # Vim is Best Editor
serverless deploy # Deploy the app. This will take some time.
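The deployed Lambda function does the rollup inside AWS. To illustrate the basic idea locally, here is a minimal sketch that is not the repo's code. It assumes the logfiles have already been copied into a local directory (for example with aws s3 sync), and it concatenates them into one file per hour based on the YYYY-MM-DD-HH prefix of each filename. The function name rollup_by_hour and the .log output names are hypothetical.

```shell
# Minimal local sketch of an S3 access log rollup (hypothetical, not the repo's code).
# Concatenates files named like 2019-09-14-17-26-37-5B75705EA0D67AF7 into one
# file per hour, e.g. DEST_DIR/2019-09-14-17.log.
# Usage: rollup_by_hour SRC_DIR DEST_DIR
rollup_by_hour() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1

    # Glob results come back sorted, and each name starts with a timestamp,
    # so files are appended in chronological order within each hour.
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # Only handle names that start with YYYY-MM-DD-HH-.
        case $name in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]-*) ;;
            *) continue ;;
        esac
        hour=$(printf '%s\n' "$name" | cut -c1-13)   # e.g. 2019-09-14-17
        cat "$f" >> "$dest_dir/$hour.log" || return 1
    done
}
```

Because it appends, running it twice over the same input would duplicate lines, and it leaves the original files in place. A real rollup job has to deal with both of those; this sketch does not attempt to.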

Saving and Restoring Your development/ Directory

If you’re like me, you write a fair bit of code, which means you have to interact with many Git repositories. If you’re also like me, chances are you have them in a directory called development/ or similar. It might even have some nested directories, something like this:

./allaboutcheetahs.info
./diceware
./docker/check-disk-space
./docker/health-check
./node/circuitbreaker-demo
./node/neural-network
./s3/bucket-sizes
./s3/disk-usage
./snowdrift
./ssh-to
Code you may write someday.

So that’s cool, but let’s say that you get a new machine and you want to replicate your development/ directory structure onto it. One way is to check out everything by hand, but that’s laborious and time-consuming. A second way is to keep backups–and you should absolutely do this–but aside from the challenge of restoring a single directory out of an entire archive, what if that backup doesn’t have the latest commits in it?

I can now offer a third way. I recently wrote a couple of scripts, available on GitHub, that extract the Git remote from each repo in an entire directory structure and save those remotes, along with the directories they belong in, to a file. Given the above example, the file might look something like this:

./allaboutcheetahs.info git@github.com:dmuth/dmuth.github.io.git
./diceware      git@github.com:dmuth/diceware.git
./docker/check-disk-space       git@github.com:dmuth/docker-check-disk-usage.git
./docker/health-check   git@github.com:dmuth/docker-health-check.git
./node/circuitbreaker-demo      git@github.com:dmuth/another-circuit-breaker.git
./node/neural-network   git@github.com:dmuth/neural-network.git
./s3/bucket-sizes       git@gitlab.com:dmuth/s3-bucket-sizes.git
./s3/disk-usage git@github.com:dmuth/s3-disk-usage.git
./snowdrift     git@github.com:Comcast/snowdrift.git
./ssh-to        git@github.com:Comcast/ssh-to.git
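The scripts in the repo are the real tools here. As a rough illustration of the same idea, here is a minimal POSIX shell sketch with two hypothetical functions, save_remotes and restore_remotes (neither name comes from the repo). It assumes each repo's remote is called origin, and it writes a tab-separated version of the directory and remote format shown above.

```shell
# Hypothetical sketch, not the scripts from the repo.
# save_remotes ROOT: print "relative-dir<TAB>origin-url" for every Git repo under ROOT.
save_remotes() {
    (
        cd "$1" || exit 1
        # -prune stops find from descending into each .git directory.
        find . -type d -name .git -prune -print | sort | while read -r gitdir; do
            repo=${gitdir%/.git}
            # Repos without an "origin" remote are skipped.
            url=$(git -C "$repo" config --get remote.origin.url) || continue
            printf '%s\t%s\n' "$repo" "$url"
        done
    )
}

# restore_remotes LIST DEST: clone every repo listed in LIST under DEST,
# skipping directories that already contain a Git repo.
restore_remotes() {
    tab=$(printf '\t')
    while IFS=$tab read -r repo url; do
        [ -n "$repo" ] && [ -n "$url" ] || continue
        target="$2/$repo"
        if [ -d "$target/.git" ]; then
            echo "Skipping $repo: already cloned" >&2
            continue
        fi
        mkdir -p "$(dirname "$target")" || return 1
        # Read stdin from /dev/null so git (or ssh) can't consume the list.
        git clone -q "$url" "$target" < /dev/null || return 1
    done < "$1"
}
```

Usage would be something like save_remotes ~/development > remotes.txt on the old machine, then restore_remotes remotes.txt ~/development on the new one. Cloning SSH URLs like the ones above assumes your SSH keys are already set up on the new machine.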

Splunk Lab News and Updates

Hey everyone! I’ve been hard at work on Splunk Lab these last few months, and I wanted to share what I’ve done with it.

Splunk: Knowledge is Power. Power Corrupts. Yield to Temptation.

The first thing is that I baked in several Splunk apps so that they are all available when launching the app! That list includes:

I’ve also written (or, in one case, re-written) apps using Splunk Lab as a jumping-off point. Here’s what I have so far:

  • Splunk Yelp Reviews – Lets you pull down Yelp reviews for venues and view visualizations and word clouds of positive/negative reviews in a Splunk dashboard.
  • Splunk Telegram – This app lets you run Splunk against messages from Telegram groups and generate graphs and word clouds based on the activity in them.
  • Splunk Network Health Check – Pings one or more hosts and graphs the results in Splunk so you can monitor network connectivity over time.
  • …plus a few other things that I’m not quite ready to release yet. 🙂

YouTube Channels I Like

Yeah, pretty much this.

Earlier today I had breakfast with one of my drinking buddies, and he asked me for some YouTube recommendations, because he’s bored with the stuff that he watches. I was going to DM them to him, but thought it would be more sensible for me to make a blog post and share them with others, as well.

I can’t say that this is an exhaustive or authoritative list; rather, it’s a list of channels which I have found enjoyable or at least interesting.

The list

  • HowtoBasic – An educational channel which explains how to perform many common tasks. Often with thousands of eggs. 🍳
  • Star Wars Reading Club – This channel features numerous videos, most under 7 minutes in length, which examine a specific plot point or character in the Star Wars universe, such as the Imperial officers who conspired to kill Darth Vader.
  • Eckharts Ladder – A channel which covers mostly Star Wars and Halo, with a focus on the warfare aspects of the Star Wars universe, such as the different kinds of ships and weapons.
  • DeSinc – Video game speedruns gone horribly sideways.
  • Down The Rabbit Hole – A series of videos by YouTube user Fredrik Knudsen, in which each video does a deep dive into a specific topic, complete with supporting research and documentation.
  • The Infographics Show – This channel hosts non-fiction videos between 5 and 15 minutes in length which cover some science-related aspects of history.
  • SciShow – Another science-related channel which focuses more on pure science-related topics, with most videos under 10 minutes in length.
  • Pretty Good – I’ll admit, I am not a sports guy. However, I adore the storytelling in this series of videos from Jon Bois and SB Nation. He covers everything from the 222-0 blowout in college football to the psychology behind professional gambling to Scorigami in American football.
  • Epic Rap Battles of History – Ever wanted to see Deadpool vs Boba Fett? Or Mozart vs Skrillex? Check out ERB.
  • SF Debris – He describes his channel as “serious reviews with silly commentary”. Most of the reviews are on works of science-fiction with an emphasis on the Star Trek franchise. He has even more reviews on his website.

Bonus Channel

Finally, I’ll leave you with The Instant Regret Playlist. This is a playlist of over 2,800 videos, and it is just a thing of beauty. Put it on the TV at your next party.

What do you like to watch on YouTube? Let me know in the comments!

On the Virtue of Laziness in Software Engineering

Many years ago, I read an O’Reilly book which stated that when it comes to programming, “laziness” is considered a virtue.

Be lazy, like this cheetah!

That may seem like a strange thing to utter, but hear me out. When working in software engineering, you will find yourself doing the same things over and over. It can be tedious and mind-numbing, and if it’s the sort of thing that involves multiple steps, it can increase the risk of human error. For example, one case of human error cost a company millions of dollars and ultimately tanked that company.

This means that the more things that are automated (or at least semi-automated), the better. There will be fewer manual steps to run, and fewer things that can go wrong because a step was missed or not executed properly. Additionally, because automation means the same thing is done the same way every time, you’ll get repeatable builds, which make things like troubleshooting, multi-tenancy, and disaster recovery easier to perform.


META: Final Wishes

This beer truck from Denmark could be the very thing that takes me out. Who knows?

I don’t normally like to talk about these sorts of things, but I am not only getting older, I am also mortal. So I just wanted to put out there that if something does happen to me some day, I wrote a blog post expressing my final wishes.

Now back to your regularly scheduled blogging. 🙂

Splunking Yelp Reviews

A while ago, I found myself trying to decide which of several restaurants to eat at. They were all highly rated on Yelp, but surely there were more insights I could pull from their reviews. So I decided to Splunk them!

TL;DR: If you want to get straight to the code, go to https://github.com/dmuth/splunk-yelp-reviews to get started.

Downloading the reviews

“Splunk: See your world. Maybe wish you hadn’t.”

Yelp has an API, but I am sorry to say that it is awful. It will only let you download 3 reviews for any venue. That’s it! What a crime.

So… I had to crawl Yelp venue pages to get reviews. I am not proud of this, but I was left with no other option.

Python has been my go-to language lately, so I decided to solve the problem of review acquisition with Python. I used the Requests module to fetch the HTML code, and the Beautiful Soup module to extract reviews and page links from the HTML.


Using Slack to Monitor RSS Feeds

In a previous post, I talked about using NodePing to send downtime notifications to Slack and having those alerts go to your phone. In this post, I’m going to cover a similar concept: using Slack to track one or more RSS feeds and get alerts when a new item is posted to a feed.

To start with, you’ll want to manage your Slack instance and go to the “Apps” section. If the RSS app doesn’t exist, it should be easy enough to add. Once added, it will look like this:

Click on the RSS app, and you’ll see a list of feeds, which will probably be empty:


If you scroll down, you’ll be able to add a feed, and have alerts go to whichever channel you want. For this example, I’m using the lorem-rss feed, which generates a new item every minute:


Monitoring RAM Usage on OS/X

I recently noticed that something was using up lots of RAM on my Mac, as it would periodically slow down. I had some suspects, but rather than regularly checking in Activity Monitor, I thought it would be more helpful if I had a way to monitor usage of RAM by various processes over time.

Due to previous success with my Splunk Lab app, I decided to use it as the basis for building out a RAM monitoring app. The data acquisition part, however, was trickier. The output of the UNIX ps command isn’t very structured, and I had some problems parsing that data, especially in situations where there were spaces in filenames and in the arguments to those commands.

So I wrote a replacement for ps. It turns out that Python has a module called psutil, which lets you programmatically examine the process tree on your Mac. I ended up writing an app called Better PS, and it writes highly structured data on each current process to disk, which is then ingested by Splunk.
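To see why spaces make ps output hard to parse: when fields are split on whitespace, only the last column can safely contain spaces. The hypothetical sketch below (not Better PS, which uses psutil) reads the output of ps -A -o pid=,rss=,args= with args as the last column and re-emits each process as tab-separated fields. If you also want the command name, which can itself contain spaces (application paths under /Applications often do), only one of the two columns can be last, and that is where plain-text ps output stops being reliably parseable.

```shell
# Hypothetical sketch; Better PS itself uses Python's psutil instead.
# Turns `ps -A -o pid=,rss=,args=` output into tab-separated "pid rss_kb args" lines.
# `args` must be the LAST column: `read` puts everything left over, spaces
# included, into the final variable.
# Usage: ps -A -o pid=,rss=,args= | ps_to_tsv
ps_to_tsv() {
    while read -r pid rss args; do
        # Skip header or blank lines: the first field must be a numeric PID.
        case $pid in
            ''|*[!0-9]*) continue ;;
        esac
        printf '%s\t%s\t%s\n' "$pid" "$rss" "$args"
    done
}
```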


The Joy of Using Docker

I’ve written about Docker before, as I am a big fan of it. And for this post, I’m going to talk about some practical situations in which I’ve used Docker in real life, both for testing and software development!


But first, let’s recap what Docker IS and IS NOT:

  • Docker containers spin up quickly (1-2 seconds or less)
  • Docker containers DO have separate process trees and filesystems
  • Docker containers ARE NOT virtual machines
  • Docker containers ARE intended to be ephemeral (short-lived).
  • You CAN, however, mount filesystems from the host machine into Docker, so those files can live on after the container shuts down (or is killed).
  • You SHOULD only run one service per Docker container.

Everybody got that? Good. Now, let’s get into some real life things I’ve used Docker for.

Experimenting in Linux

Want to test out some commands or maybe a shell script that you’re worried might be destructive? No worries, try it in a Docker container, and if you nuke the filesystem, there will be no long-term consequences.

#
# Start a container with Alpine Linux
#
$ docker run -it alpine

#
# Let's do something dumb
#
/ # rm /bin/ls
/ # ls -l
/bin/sh: ls: not found

#
# Exit the container and run a new one from the same image.
# The new container starts with a fresh filesystem, so ls is back!
#
/ # exit
$ docker run -it alpine
/ # ls
bin    dev    etc    home   lib    media  mnt    proc   root   run    sbin   srv    sys    tmp    usr    var

And all of the above takes just a couple of seconds! This works with other Linux distros as well, such as CentOS and Ubuntu–just change your Docker command accordingly:

docker run -it centos
docker run -it ubuntu

Yes, that means you could run CentOS in a container under Ubuntu or vice-versa. Docker doesn’t care. 🙂
