Using Splunk to Monitor Network Health

Splunk> Winning the War on Error

I’ve been using Splunk professionally over the last several years, and I’ve become a big fan of using it for my data processing needs. Splunk is very very good about ingesting just about any kind of event data, optionally extracting fields at search time, and providing tools to graph that data, find trends, and see what is really happening on your platform. This is important when your platform consists of thousands of servers, as it does at my day job!

While Splunk can handle events in timestamp key=value key2=value2 format, it also has support for dozens of standardized formats such as syslog, Apache logs, etc. If your data is in a customized format, no problem! Splunk can extract that data at either index or search time! Finally, there’s the Search Processing Language, which is like SQL but for your event data. With SPL, you can run queries, generate graphs, and combine them all programatically.

So yeah, I’m a huge fan of Splunk. One thing I use it for out of the of office is to graph the health of my Internet connection. This is useful both for when I’m at home and when I am traveling–I just feed the output of ping into Splunk and then I can get graphs of packet loss and network latency.

Let’s just jump into an example screen–here’s what I saw when I was a friend’s place and I uploaded a video to YouTube:

Continue reading “Using Splunk to Monitor Network Health”

Examining 12 Months of SEPTA API Queries

With the release of SEPTA’s new app, I’ve suddenly been flooded with questions about their API. People wanted to know how stable it was.

Well, I don’t work for SEPTA, which means I don’t have insight into their operations, but I can perform some analytics based on what I have, which is approximately 18 months of Regional Rail train data, read every minute by SEPTA Stats.

Overall Stats

This is all of the data that I have in Septa Stats currently:

  • Events Since Inception: 26,924,887 events
  • First Event: Mar 1, 2016 12:00:01 AM
  • Last Event: Nov 16, 2017 10:33:53 PM

That’s way more events than minutes in that timeframe, and the reason for that is each API query is split into a separate event for each train. So if an API call returns status for 20 trains, that gets split into 20 different events. This is done because Splunk has a much easier time working with JSON that isn’t a giant array. 🙂

Continue reading “Examining 12 Months of SEPTA API Queries”

Introducing the SEPTA Regional Rail System Dashboard!

…and 2 new API endpionts, too. But more on those later.

I’m proud to say that there is now a dashboard for the entire Regional Rail system. It is present on both the front page and the “SEPTA System Stats” page:

This new dashboard makes it straightforward to determine the status of the entire Regional Rail system at a glance.

Continue reading “Introducing the SEPTA Regional Rail System Dashboard!”