Using Splunk to Monitor Network Health

Splunk> Winning the War on Error

I’ve been using Splunk professionally for the last several years, and I’ve become a big fan of using it for my data processing needs. Splunk is very good at ingesting just about any kind of event data, optionally extracting fields at search time, and providing tools to graph that data, find trends, and see what is really happening on your platform. This is important when your platform consists of thousands of servers, as it does at my day job!

While Splunk can handle events in timestamp key=value key2=value2 format, it also has support for dozens of standardized formats such as syslog, Apache logs, etc. If your data is in a customized format, no problem! Splunk can extract that data at either index or search time! Finally, there’s the Search Processing Language, which is like SQL but for your event data. With SPL, you can run queries, generate graphs, and combine them all programmatically.
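To make the timestamp key=value format concrete, here is a minimal sketch of a shell helper that prints one event in that shape. The function name kv_event and the field names in the usage line are made up for illustration; they are not part of Splunk or of the project described below.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Splunk): print one event as
#   <timestamp> key=value key2=value2 ...
# Usage: kv_event <timestamp> key value [key value ...]
kv_event() {
  local out="$1"
  shift
  while [ "$#" -ge 2 ]; do
    case "$2" in
      # Wrap values that contain a space in double quotes so the
      # value reads as a single field.
      *" "*) out="$out $1=\"$2\"" ;;
      *)     out="$out $1=$2" ;;
    esac
    shift 2
  done
  printf '%s\n' "$out"
}
```

For example, kv_event "2024-01-01T00:00:00Z" host google.com status "timed out" prints 2024-01-01T00:00:00Z host=google.com status="timed out".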

So yeah, I’m a huge fan of Splunk. One thing I use it for outside of the office is to graph the health of my Internet connection. This is useful both when I’m at home and when I’m traveling: I just feed the output of ping into Splunk, and then I can get graphs of packet loss and network latency.
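Feeding ping into Splunk mostly comes down to turning each reply line into an event. As a rough sketch (not the actual script from my project), the helper below converts one ping reply line into a timestamped key=value event using sed. The function name ping_line_to_event and the field names host, seq, ttl, and time_ms are assumptions for illustration, and the pattern assumes the common "64 bytes from ...: icmp_seq=N ttl=N time=N ms" reply format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert one ping reply line into a
# "<timestamp> key=value ..." event. Field names are illustrative only.
# Usage: ping_line_to_event <timestamp> <ping output line>
ping_line_to_event() {
  local ts="$1" line="$2" fields
  # Pull host, sequence number, TTL, and round-trip time out of the line.
  # sed prints nothing when the line is not a reply (e.g. a timeout).
  fields="$(printf '%s\n' "$line" | sed -n -E \
    's/.*from ([^ :]+).*icmp_seq=([0-9]+) ttl=([0-9]+) time=([0-9.]+) ms.*/host=\1 seq=\2 ttl=\3 time_ms=\4/p')"
  # Return non-zero for non-reply lines so the caller can handle them.
  [ -n "$fields" ] || return 1
  printf '%s %s\n' "$ts" "$fields"
}
```

Calling ping_line_to_event "2024-01-01T00:00:00Z" "64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=12.3 ms" prints 2024-01-01T00:00:00Z host=8.8.8.8 seq=1 ttl=117 time_ms=12.3. Lines that are not replies make the function return non-zero, which a caller could treat as a lost packet.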

Let’s just jump into an example screen. Here’s what I saw when I was at a friend’s place and I uploaded a video to YouTube:

I was on their wireless network and wanted to make sure that my upload didn’t cause any network issues. So I pulled up the dashboard after I was done and was able to confirm that while I caused some latency, the packet loss was minimal.

How about another screen:

Taken from a mobile hotspot on an Amtrak train going through a tunnel on the way to Baltimore station

The above dashboard was taken while I was on Amtrak. I was on my personal hotspot and we went through a tunnel, which caused a complete loss of Internet connectivity for a couple of minutes. I watched this one unfold in real time by putting Splunk in “real-time” mode, and was able to just sit back and wait until Internet connectivity came back. I didn’t have to keep trying to refresh a webpage or anything like that.

Installing the App

First, make sure you have Docker installed on your machine. Once you do, you can stand up a container running this project with this command:

SPLUNK_START_ARGS=--accept-license \
   bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-network-health-check/master/go.sh)

Running that command will print a confirmation screen so that you can back out and change any options (such as hosts to ping), and when you’re ready, just hit <ENTER> to start the container.

Then, go to http://localhost:8000/ and log in with admin/password to view the dashboard. By default, google.com, 8.8.8.8 (Google’s DNS resolver), and 1.1.1.1 (Cloudflare’s DNS resolver) will be pinged.

To add additional hosts or set a different admin password, it is just a matter of setting the right environment variables. Documentation for that, along with the underlying Dockerfiles and scripts, can be found on the web at:

https://github.com/dmuth/splunk-network-health-check

Enjoy!