Welcome to my website and blog! It has been around in one form or another for well over a decade. It’s changed purposes a few times, and at this point it is mostly a tech blog where I share things I’ve learned or made. Feel free to have a look around, or consider checking out some of the more popular posts I’ve made over the years:
Living 20 minutes from downtown Philadelphia, I’m a big fan of our hockey mascot, Gritty. Recently I’ve been playing around with a website called character.ai, and one of the neat things that site lets you do is create bots based on characters, real or imaginary. For example, there is one character based on Albert Einstein, and another is based on Darth Vader. So I decided I would create a character based on Gritty.
I immediately regretted that.
The AI powering that site is… frightfully good, to say the least. After seeding the character with just a handful of tweets from Gritty’s Twitter feed, the bot quickly took on a life of its own and said things that I would absolutely expect the real Gritty to say.
For example, let’s start with the no-fly list:
Next I asked Gritty about his diet, and the answers the bot gave were concerning, to say the least:
I write a lot of back end code and sometimes have to make use of HTTP endpoints. Sometimes I want to test those endpoints. I used to use httpbin.org for my endpoint testing, but over time noticed that some of the endpoints were returning HTTP 5xx errors. Some investigation reveals that the project seems to be abandoned, with open issues going all the way back to 2017. That’s not so good.
In this post I’d like to talk about FastAPI Httpbin. But before I can, I need to talk about FastAPI itself. FastAPI is a framework for building high-performance APIs in Python based on Python type hints. The really neat thing about FastAPI is that your function definition for each endpoint is the source of truth: FastAPI handles argument validation for any calls to that endpoint, and generates the appropriate Swagger documentation for that endpoint.
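To make the "function definition is the source of truth" idea concrete, here’s a minimal sketch using only the Python standard library. This is not FastAPI’s code or API: call_with_validation and the delay endpoint are hypothetical names I made up for illustration. It only shows how type hints on a function can drive the validation of raw string arguments.

```python
import inspect


def call_with_validation(func, raw_args):
    """Convert raw string arguments to the types annotated on func, then call it.

    Hypothetical helper for illustration only; FastAPI does far more than this.
    """
    converted = {}
    for name, param in inspect.signature(func).parameters.items():
        if name not in raw_args:
            # No value supplied: only acceptable if the function has a default.
            if param.default is inspect.Parameter.empty:
                raise ValueError(f"missing required argument: {name}")
            continue
        target = param.annotation
        if target is inspect.Parameter.empty:
            # No type hint, so pass the raw value through untouched.
            converted[name] = raw_args[name]
            continue
        try:
            converted[name] = target(raw_args[name])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"{name} must be of type {target.__name__}") from exc
    return func(**converted)


def delay(seconds: int, message: str = "done"):
    """A made-up endpoint: the signature alone says what input is valid."""
    return {"seconds": seconds, "message": message}
```

Calling call_with_validation(delay, {"seconds": "3"}) converts the string to an integer before the function runs, while {"seconds": "abc"} is rejected with a ValueError before the function body is ever reached.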
If you’ve been reading this blog for a while, you’ll know that I’m a big fan of Splunk, and I even went so far as to Dockerize it for use in a lab/testing environment.
Well today I want to talk about a command in Splunk which I believe is seriously underrated: makeresults.
Makeresults (documented here) lets you generate fake events for testing purposes. No indexes are queried, no disks are touched, which means that makeresults is very, very fast. And when a query runs quickly, you can run it more times, which means new queries and content get developed faster.
In this post, I’m going to walk you through a way to use makeresults to learn the difference between the streamstats and eventstats commands.
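If the difference between the two commands is hard to picture, here’s a rough Python analogy (not Splunk code). The add_stats function and its field names are made up for this sketch. The running total plays the role of streamstats, which computes its statistic over the events seen so far, and the overall total plays the role of eventstats, which computes one statistic over all events and attaches it to every event.

```python
from itertools import accumulate


def add_stats(values):
    """Attach a running total and an overall total to each value.

    running_total mimics streamstats sum(n): it grows event by event.
    overall_total mimics eventstats sum(n): it is identical on every event.
    """
    running = accumulate(values)  # cumulative sums, in event order
    overall = sum(values)         # a single sum across every event
    return [
        {"n": n, "running_total": r, "overall_total": overall}
        for n, r in zip(values, running)
    ]
```

For the values 1, 2, 3, 4 the running totals come out as 1, 3, 6, 10, while the overall total is 10 on every row.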
Google Drive is one of my favorite apps for storing and editing documents and spreadsheets. If you don’t currently use Google Drive in place of Microsoft Office, I would recommend checking it out!
That said, while it’s a useful tool, your files are being stored on somebody else’s computer, which means that if your Google account should get hacked or suspended, you will lose access to your files. Not good.
In this post, I will show you how to back up the contents of your Google Drive onto your filesystem. You will need a medium level of knowledge and some experience with the command line for this.
Installing and Configuring Rclone
First, start by downloading Rclone. Rclone is a command line app for managing, copying, and syncing files across over 40 different cloud providers. In addition to Google Drive, it has support for Dropbox, AWS S3, Microsoft OneDrive, and a whole list of cloud providers that I’ve never even heard of!
Once you have Rclone downloaded, start up its configuration wizard by typing:
That’s where Prometheus, Loki, and Grafana all come in. Prometheus is a time series database built for storing metrics. Loki is a log collection system which scales horizontally and is useful for collecting application logs, and Grafana is the dashboard app which is used to view metrics from either platform!
I wanted to learn more about each of these apps, and I figured the best way to do so was to build out something in Docker that let me ingest data immediately, and then to build some sample dashboards on top of that. I then open sourced it, and the entire project can be found at https://github.com/dmuth/grafana-playground
First, clone the repo and start up all of the Docker containers:
git clone https://github.com/dmuth/grafana-playground.git
cd grafana-playground
docker-compose up -d
This will start up several containers, some of which will ingest data, some of which will store data.
I’ve seen complaints pop up on Twitter that people are getting their accounts suspended over years old tweets that happen to contain copyrighted music. So let’s say that, like me, you have a Twitter account over 10 years old and you want to go through your old tweets so you can pull any such video before Twitter does — how do you go about doing that?
Well here’s the thing: the UNIX command line is incredibly powerful if you know how to use it. In this post, I’ll show you how to use the bash shell in Linux or Mac OS/X to find those videos so that you can remove them.
The first thing you gotta do is download your entire Twitter archive. There are instructions on how to do that here. Once you put in the request, you’ll hear back from Twitter within a day when the download is ready. Expect the file to be rather large — in my case it was over 2 Gigs. Download that file and unzip it.
At the time of this writing, all of your media will be found in the folder data/tweet_media/, so cd into that directory and see how many files there are:
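If you’d rather do that count from Python than from the shell, here’s a minimal sketch. The count_media function is a made-up name, and treating .mp4 as the extension for videos is my assumption, so check what your own archive actually contains.

```python
from collections import Counter
from pathlib import Path


def count_media(directory):
    """Count the files in a directory, grouped by lowercase file extension."""
    return Counter(
        path.suffix.lower()
        for path in Path(directory).iterdir()
        if path.is_file()
    )


# Example usage, run from the root of the unzipped archive:
#   counts = count_media("data/tweet_media")
#   print(sum(counts.values()), "files in total")
#   print(counts.get(".mp4", 0), "files ending in .mp4")
```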
In Part 1, I wrote about how to get your data out of Evernote and into Obsidian. In this post, I’m going to cover how to get the most out of Obsidian in terms of functionality.
Organizing in Obsidian
At a high level, I like to use The PARA Method, which consists of 4 high-level folders for storing your notes. Those folders are:
Projects: Projects are things you are actively researching or working on, such as this blog post. They have deliverables and they have deadlines. Notes should not exist in your project folder forever, but instead be moved into another folder.
Areas of Responsibility: The literal definition that I’ve seen elsewhere is “activity with a standard to be maintained over time”. If you’re using Obsidian for work, it might be for platforms which you own and perform occasional maintenance on, runbooks for dealing with specific issues, etc. If you’re using Obsidian for personal use, it might be for taking notes from books you read, notes about your health, your car, finances, etc.
Resources: A resource is defined as “a topic or theme of ongoing interest”. This might be things like recipes, ideas for home improvement, and the like. I will be the first to admit that sometimes the line blurs between Resources and Areas. The best advice I can offer is to try not to sweat the details here.
Archives: Stuff you’re not using anymore, such as projects you’ve finished or travel plans for trips taken. I recommend ZIPing the folders that hang out in this directory. (Obsidian won’t mind)
In this blog post, I’m going to talk about why I moved 3,000 notes from Evernote to Obsidian, and walk through the process of doing so with the help of a third-party open source tool, plus some code I wrote to drive that tool.
Breaking Up With Evernote
I used to love Evernote. I started using it back in 2012 or so, and before long I got the paid version! It met a lot of needs I had at the time, such as being able to take notes using a lightweight interface, letting me search through all of my notes, and letting me attach files and images to them. And I could even do all of those from my iPhone!
But it’s 2021, and things have changed. I don’t believe Evernote has grown as much as it could have as an app. Between the company trying unsuccessfully to launch other products, and the 2018 layoffs, Evernote is not in the best shape as a company.
But the real nail in the coffin for Evernote in my eyes was the “new” release of Evernote a few months ago. Gone was the feature that would let me export an entire note as HTML. Gone was the feature that would let me export all attachments from a note. These were features I had come to rely on when I needed to get to my data, and Evernote took them out of the product! Furthermore, there was one particularly nasty bug that had me fearing for the state of my notes:
That’s right–after the app being open for a few days, clicking on a note–any note–would show a blank screen where my content would be! The first time I saw this, I went into a near panic. I tried restarting Evernote and that fixed the problem for a few days until it came back. I filed a support request with Evernote about this, yet a few versions later this disturbing bug remains.
So I started looking around, and I found something I fell in love with: Obsidian.
Obsidian can best be described as “An IDE for Markdown documents”. Don’t know what Markdown is? No problem, it’s a very simple syntax used to mark up documents (WAY simpler than HTML) that you can learn in no time at all!
And instead of using a database to store notes, Obsidian uses the filesystem to store them. What this means is that backing up your data is as simple as zipping up the root folder, which Obsidian calls a “Vault”. And your vault can be sitting in a folder shared to Dropbox, OneDrive, or any similar service. This separation of concerns makes Obsidian less complicated because it doesn’t have to sync its own notes. That’s a win!
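Since a vault is just a folder, a backup really can be a few lines of code. Here’s a minimal sketch using Python’s standard library. The backup_vault function and the paths in the usage comment are hypothetical, so point it at wherever your own vault lives.

```python
import shutil
from pathlib import Path


def backup_vault(vault_dir, backup_base):
    """Zip an entire vault folder and return the path of the archive.

    backup_base is the archive path without the ".zip" extension and
    should live outside the vault, so the backup does not include itself.
    """
    vault = Path(vault_dir).resolve()
    archive = shutil.make_archive(
        str(backup_base),
        "zip",
        root_dir=vault.parent,  # archive paths are relative to the parent folder
        base_dir=vault.name,    # only the vault folder itself goes in the archive
    )
    return Path(archive)


# Example usage with made-up paths:
#   backup_vault("/Users/me/Notes", "/Users/me/Backups/notes-backup")
```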
Tarsplit is a utility I wrote which can split UNIX tarfiles into multiple parts while keeping files in the tarballs intact. I specifically wrote this because other ways I found of splitting up tarballs didn’t keep the individual files intact, which would not play nice with Docker.
But what does Docker have to do with tar?
While building the Docker images for my Splunk Lab project, I noticed that one of the layers was something like a Gigabyte in size! While Docker can handle large layers, the issue becomes one of the time it takes to push or pull an image. Docker does validation on each layer as it’s received, and the validation of a 1 GB layer took 20-30 wall clock seconds. Not fun.
It occurred to me that if I could split up that layer into say, 10 layers of 100 Megabytes each, then Docker would transfer about 3 layers in parallel, and while a layer is being validated post-transfer, the next layer would start being transferred. The end result is fewer wall clock seconds to transfer an entire Docker image.
But there was an issue–Splunk and its applications are big. A few of the tarballs are hundreds of Megabytes in size, and I needed a way to split those tarballs into smaller tarballs without corrupting the files in them. This led to Tarsplit being written.
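The core idea of splitting on file boundaries can be sketched with Python’s tarfile module. To be clear, this is not Tarsplit’s actual code: split_tar is a made-up function that works entirely in memory to keep the example short, whereas a real tool handling hundreds of Megabytes would write each part to disk.

```python
import io
import tarfile


def split_tar(source, max_bytes):
    """Split an open TarFile into smaller tarballs without cutting a file in half.

    Returns a list of bytes objects, one per part. A new part is started
    whenever the next member would push the current part past max_bytes,
    so every member lands whole in exactly one part.
    """
    parts = []
    buffer = io.BytesIO()
    current = tarfile.open(fileobj=buffer, mode="w")
    current_size = 0
    for member in source:
        if current_size and current_size + member.size > max_bytes:
            # Finish the current part and start a fresh one.
            current.close()
            parts.append(buffer.getvalue())
            buffer = io.BytesIO()
            current = tarfile.open(fileobj=buffer, mode="w")
            current_size = 0
        # Only regular files carry data; directories and links are header-only.
        data = source.extractfile(member) if member.isfile() else None
        current.addfile(member, data)
        current_size += member.size
    current.close()
    parts.append(buffer.getvalue())
    return parts
```

A single file larger than max_bytes still ends up in a part of its own rather than being cut, which is the whole point: each part is a valid tarball that can be extracted independently.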
According to the docs, Eventgen is a Splunk App that lets users build real-time event “generators” so that one-off event generators don’t need to be built.
What does this mean? Let’s say you run a Splunk platform, and you want to create some new dashboards for a data source in production, but want to do this on dev. Without Eventgen, you would need to write a script to generate fake events and write them to a file which is read in by Splunk. That is a lot of work.
And we all have better things to do than write one-off code.
Why Use Eventgen?
With Eventgen, you can create a sample file with say, 1,000 events from that data source, and configure Eventgen to write a random event from that file straight to Splunk via its API, with current timestamps. The end result is that you’ll have a steady stream of realistic events flowing into Splunk with current timestamps, without the need to read from (and rotate) logfiles in the filesystem.
How Eventgen Is Used in Splunk Lab
I took approximately 1,400 lines of logs from my blog’s webserver and included them into Splunk Lab. When Eventgen is used, a random event from that file will be written into Splunk at the rate of approximately once per second. Because the events that make their way into Splunk are random, there will be a short-term fluctuation in the frequency of specific URLs, HTTP verbs, HTTP statuses, etc. This is perfect for creating dashboards that mimic what you might see in a production environment.
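To illustrate the replay idea in miniature, here’s a small Python sketch. It is not Eventgen’s code or configuration: replay_event is a made-up function, and the Apache-style timestamp format is my assumption about what the sample log lines look like. It picks a random line from the sample and swaps in a current timestamp.

```python
import random
import re

# Matches an Apache-style timestamp such as [10/Oct/2020:13:55:36 +0000].
TIMESTAMP = re.compile(r"\[\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]")


def replay_event(sample_lines, now, rng=random):
    """Pick a random sample line and replace its timestamp with now.

    now must be a timezone-aware datetime so that %z produces an offset.
    """
    line = rng.choice(sample_lines)
    stamp = now.strftime("[%d/%b/%Y:%H:%M:%S %z]")
    # A function replacement avoids any special handling of the stamp text.
    return TIMESTAMP.sub(lambda _match: stamp, line, count=1)
```

Calling this in a loop with a one second sleep between calls would give roughly the once-per-second stream of randomized events described above.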