That’s where Prometheus, Loki, and Grafana all come in. Prometheus is a time series database built for storing metrics. Loki is a log collection system which scales horizontally and is useful for collecting application logs, and Grafana is the dashboard app which is used to view metrics from either platform!
I wanted to learn more about each of these apps, and I figured the best way to do so was to build out something in Docker that let me ingest data immediately, and then to build some sample dashboards on top of that. I then open sourced it, and the entire project can be found at https://github.com/dmuth/grafana-playground
First, clone the repo and start up all of the Docker containers:
git clone https://github.com/dmuth/grafana-playground.git
docker-compose up -d
This will start up several containers, some of which will ingest data, some of which will store data.
Tarsplit is a utility I wrote which can split UNIX tarfiles into multiple parts while keeping files in the tarballs intact. I specifically wrote those because other ways I found of splitting up tarballs didn’t keep the individual files intact, which would not play nice with Docker.
But what does Docker have to do with tar?
While building the Docker images for my Splunk Lab project, I noticed that one of the layers was something like a Gigabyte in size! While Docker can handle large layers, the issue become one of time it takes to push or pull an image. Docker does validation on each layer as it’s received, and the validation of a 1 GB layer took 20-30 wall clock seconds. Not fun.
It occurred to me that if I could split up that layer into say, 10 layers of 100 Megabytes each, then Docker would transfer about 3 layers in parallel, and while a layer is being validated post-transfer, the next layer would start being transferred. The end result is less wall clock seconds to transfer an entire Docker image.
But there was an issue–Splunk and its applications are big. A few of the tarballs are hundreds of Megabytes in size, and I needed a way to split those tarballs into smaller tarballs without corrupting the files in them. This led to Tarsplit being written.
According to the docs, Eventgen is a Splunk App that lets users built real-time event “generators” so that one-off event generators don’t need to be built.
What does this mean? Let’s say you run a Splunk platform, and you want to create some new dashboards for a data source in production, but want to do this on dev. Without Eventgen, you would need to write a script to generate fake events and write them to a file which is read in by Splunk. That is a lot of work.
And we all have better things to do than write one-off code.
Why Use Eventgen?
With Eventgen, you can create a sample file with say, 1,000 events from that data source, and configure Eventgen to write a random event from that file straight to Splunk via its API, with current timestamps. The end result is that you’ll have a steady stream of realistic events flowing into Splunk with current timestamps, without the need to read from (and rotate) logfiles in the filesystem.
How Eventgen Is Used in Splunk Lab
I took approximately 1,400 lines of logs from my blog’s webserver and included them into Splunk Lab. When Eventgen is used, a random event from that file will be written into Splunk at the rate of approximately once per second. Because the events that make their way into Splunk are random, there will be a short-term fluctuation in the frequency of specific URLs, HTTP verbs, HTTP statuses, etc. This is perfect for creating dashboards that mimic what you might see in a production environment.
As much as I love using Docker, one of the frustrations I have is when I try to remove an an image which other images are based on, only to get this error:
$ docker rmi b171179240df
Error response from daemon: conflict: unable to delete b171179240df (cannot be forced) - image has dependent child images
I did some searches on Google, and most of the advice centered around the heavy-handed approach of removing all Docker images and basically starting over with a clean slate. That approach didn’t sit well with me because it doesn’t strike me as all that efficient, and also causes me to have to spend more time waiting for unrelated containers to rebuild.
That prompted me to write a script which, when provided with the ID of a container to remove, will recurse through all child containers and delete them first.
I’ve written about Docker before, as I am a big fan of it. And for this post, I’m going to talk about some practical situations in which I’ve used Docker in real life, both for testing and software development!
But first, let’s recap what Docker IS and IS NOT:
Docker containers DO spin up quickly (1-2 seconds or less)
Docker containers DO have separated process tress and filesystems
Docker containers ARE NOT virtual machines
Docker containers ARE intended to be ephemeral. (short-lived)
You CAN, however, mount filesystems from the host machine into Docker, so those files can live on after the container shuts down (or is killed).
You SHOULD only run one service per Docker container.
Everybody got that? Good. Now, let’s get into some real life things I’ve used Docker for.
Experimenting in Linux
Want to test out some commands or maybe a shell script that you’re worried might be destructive? No worries, try it in a Docker container, and if you nuke the filesystem, there will be no long-term consequences.
# Start a container with Alpine Linux
$ docker run -it alpine
# Let's do something dumb
$ rm /bin/ls
$ ls -l
/bin/sh: ls: not found
# Just exit the container, restart it, and our filesystem is back!
[unifi:~/tmp ] $ docker run -it alpine
$ ls /
bin dev etc home lib media mnt proc root run sbin srv sys tmp usr var
And all of the above takes just a couple of seconds! This works with other Linux distros as well, such as CentOS and Unbuntu–just change your Docker command accordingly:
docker run -it centos
docker run -it ubuntu
Yes, that means you could run CentOS in a container under Ubuntu or vice-versa. Docker doesn’t care. 🙂
In my case, over my Christmas vacation, I checked into a Mom and Pop hotel, or rather a motel! It was about 24 rooms all in a row, occupying a single floor. Since they were on a budget, their Internet offering consisted of what appeared to be 5 or 6 Linksys routers set up every few rooms. You’d simply connect to the closest access point and have Internet.
But there was a problem: determining which access point was closest to me! The signal strength indicator on my computer showed several of them were 3/3 bars so that wasn’t much help. I tried connecting to the first one, but had virtually no Internet connectivity.
Running that command will print up a confirmation screen so that you can back out and change any options (such as hosts to ping), and when you’re ready, just hit <ENTER> to start the container.
In the above example, I added in the TARGETS environment variable, and was sure to include 192.168.1.1, which was the IP for each router (they were all the same). Then I set Splunk “real-time mode” and periodically checked that tab as I was working. This is what I saw:
In a previous post, I wrote about using Splunk to monitor network health and connectivity. While building that project, I thought it would be nice if I could build a more generic application which could be used to perform ad hoc data analysis on pre-existing data without having to go through a complicated process each time I wanted to do some analytics.
So I built Splunk Lab! It is a Dockerized version of Splunk which, when started, will automatically ingest entire directories of logs. Furthermore, if started with the proper configuration, any dashboards or field extractions which are created will persist after the container is terminated, which means they can be used again in the future.
A typical use case for me has been to run this on my webserver to go through my logs on a particularly busy day and see what hosts or pages are generating the most traffic. I’ve also used this when a spambot starts hitting my website for invalid URLs.
This will print a confirmation screen where you can back out to modify options. By default, logs are read from logs/, config files and dashboards are stored in app/, and data that Splunk ingests is written to data/.
Once the container is running, you will be able to access it at https://localhost:8000/ with the username “admin” and the password that you specified at startup.
First things first, let’s verify our data was loaded and do some field extractions!
I’ve been using Splunk professionally over the last several years, and I’ve become a big fan of using it for my data processing needs. Splunk is very very good about ingesting just about any kind of event data, optionally extracting fields at search time, and providing tools to graph that data, find trends, and see what is really happening on your platform. This is important when your platform consists of thousands of servers, as it does at my day job!
While Splunk can handle events in timestamp key=value key2=value2 format, it also has support for dozens of standardized formats such as syslog, Apache logs, etc. If your data is in a customized format, no problem! Splunk can extract that data at either index or search time! Finally, there’s the Search Processing Language, which is like SQL but for your event data. With SPL, you can run queries, generate graphs, and combine them all programatically.
So yeah, I’m a huge fan of Splunk. One thing I use it for out of the of office is to graph the health of my Internet connection. This is useful both for when I’m at home and when I am traveling–I just feed the output of ping into Splunk and then I can get graphs of packet loss and network latency.
Let’s just jump into an example screen–here’s what I saw when I was a friend’s place and I uploaded a video to YouTube:
TL;DR If you are comfortable with Docker and Docker Compose, you can go straight to the GitHub repo and get started. For the everyone else, read on…
When I stood up this website, I wanted to do so in Docker, but I ran into an issue: the official WordPress Docker image runs Apache. Apache is a nice webserver for small amounts of traffic, but it does not scale well. As more concurrent connections come into a server running Apache, more copies of the httpd process are forked, which causes RAM usage to go up. Having RAM usage regularly go up and down is not ideal.
Fortunately, there is a better way. The Nginx webserver, combined with PHP running in FPM mode scales much better as the memory usage is more constant, which means that peak loads on the server won’t cause you to thrash the swapfile. Encryption would also be nice, so I wanted to have some SSL going as well.
I couldn’t find any existing solutions, so I built one! In this post, I’m going to walk through each piece of the puzzle.