In a previous post, I wrote about using Splunk to monitor network health and connectivity. While building that project, I thought it would be nice if I could build a more generic application which could be used to perform ad hoc data analysis on pre-existing data without having to go through a complicated process each time I wanted to do some analytics.
So I built Splunk Lab! It is a Dockerized version of Splunk which, when started, will automatically ingest entire directories of logs. Furthermore, if started with the proper configuration, any dashboards or field extractions which are created will persist after the container is terminated, which means they can be used again in the future.
A typical use case for me has been to run this on my webserver to go through my logs on a particularly busy day and see what hosts or pages are generating the most traffic. I’ve also used this when a spambot starts hitting my website for invalid URLs.
So let’s just jump right in with an example:
docker run -p 8000:8000 \ -v $(pwd)/data:/data \ -v /var/logs/nginx/:/logs \ -v $(pwd)/app:/app \ -e SPLUNK_PASSWORD=password \ dmuth1/splunk-lab
This will download the container, start it up, and mount the appropriate directories. The containerized version of Splunk looks recursively for logs in /logs/, stores its data in /data/, and stores dashboards that are created in /app/. (Note that if you try to use “password” as your password, the container will refuse to start for safety reasons!)
First things first, let’s verify our data was loaded and do some field extractions!