Welcome to my website and blog! It has been been around in one form or another for well over a decade. It’s changed purposes a few times, and at this point is now a mostly tech blog where I share things I’ve learned or made. Feel free to have a look around, or consider checking out some of the more popular posts I’ve made over the years:
Tarsplit is a utility I wrote which can split UNIX tarfiles into multiple parts while keeping files in the tarballs intact. I specifically wrote those because other ways I found of splitting up tarballs didn’t keep the individual files intact, which would not play nice with Docker.
But what does Docker have to do with tar?
While building the Docker images for my Splunk Lab project, I noticed that one of the layers was something like a Gigabyte in size! While Docker can handle large layers, the issue become one of time it takes to push or pull an image. Docker does validation on each layer as it’s received, and the validation of a 1 GB layer took 20-30 wall clock seconds. Not fun.
It occurred to me that if I could split up that layer into say, 10 layers of 100 Megabytes each, then Docker would transfer about 3 layers in parallel, and while a layer is being validated post-transfer, the next layer would start being transferred. The end result is less wall clock seconds to transfer an entire Docker image.
But there was an issue–Splunk and its applications are big. A few of the tarballs are hundreds of Megabytes in size, and I needed a way to split those tarballs into smaller tarballs without corrupting the files in them. This led to Tarsplit being written.
According to the docs, Eventgen is a Splunk App that lets users built real-time event “generators” so that one-off event generators don’t need to be built.
What does this mean? Let’s say you run a Splunk platform, and you want to create some new dashboards for a data source in production, but want to do this on dev. Without Eventgen, you would need to write a script to generate fake events and write them to a file which is read in by Splunk. That is a lot of work.
And we all have better things to do than write one-off code.
Why Use Eventgen?
With Eventgen, you can create a sample file with say, 1,000 events from that data source, and configure Eventgen to write a random event from that file straight to Splunk via its API, with current timestamps. The end result is that you’ll have a steady stream of realistic events flowing into Splunk with current timestamps, without the need to read from (and rotate) logfiles in the filesystem.
How Eventgen Is Used in Splunk Lab
I took approximately 1,400 lines of logs from my blog’s webserver and included them into Splunk Lab. When Eventgen is used, a random event from that file will be written into Splunk at the rate of approximately once per second. Because the events that make their way into Splunk are random, there will be a short-term fluctuation in the frequency of specific URLs, HTTP verbs, HTTP statuses, etc. This is perfect for creating dashboards that mimic what you might see in a production environment.
So, you want to start a furry convention! Great! We’ll never say no to a new event. And this is a topic I happen to know a little about, as I’ve been staffing/running conventions for 20 years now. (I’m old)
Let’s start with a few questions I’d like to throw out, the sort of questions every new convention organizer should ask themselves…
A Furry Convention Building Checklist:
Have you staffed a con before?
Are you incorporated? If so, as a 501(c)3 or 501(c)7?
Do you have a separate bank account for the organization?
Do you have a budget?
Do you have a hotel or other venue?
Do you have a signed contract with the venue?
Did you read the contract in its entirety?
Do you have liability insurance?
Do you have legal counsel?
Have you lined up people who can be senior staff/department heads for major portions of the convention, such as Programming, A/V, Dealers Room, etc.?
Have you vetted your staff?
Do you have a website?
How about social media?
Is there another con already serving the same general area?
If so, will potential attendees feel that they are forced to choose between the two cons?
Git is a very powerful revision control system used in software development, and at this point, is effectively the industry standard. One of the things that makes Git so powerful is that there are all sorts of low-level operations that developers can do in it. This includes everything from tagging releases, to having an insane number of branches, to painless merging of said branches, to rewriting history.
If you’re new to Git, your brain probably threw an exception when reaching the end of the paragraph as you thought to yourself, “Wait WHAT? Why would you want to REWRITE HISTORY?” Good question! Normally, you don’t want to mess history in a revision control system. But sometimes you may want to or even need to. A few possible reasons for rewriting history include:
Feature Branch “B” is a branch of Feature Branch “A” and you want to merge B’s changes to master, but not A’s.
Squashing all the commits in a feature branch to a single commit, after having previously pushed those commits.
A developer made a commit with something that doesn’t belong in Git, such as PII or credentials.
You want to “clean up” the commit history a little.
Whether any of the above are fully valid reasons or not is something up to you and your dev team.
That said, rebasing is not something you want to experiment with for the first time on your production repo. It’s just a bad idea. So I built a playground/lab where anyone can experiment with git rebasing in an isolated and local environment, which can be stood up in seconds. Here’s how to get started:
git clone firstname.lastname@example.org:dmuth/git-rebase-i-playground.git
I’ve been trying to keep myself amused during the COVID-19 shutdown, and as such have been trying a number of new things. One new thing I decided to look into is playing hacked NES ROMs. It turns out there are a great number of ROM hacks out there, with the website ROMhacking.net listing over 4,600 hacked ROMs as of this writing.
While looking into hacked ROMs, I quickly learned that ROM hacks fall into one of 3 main categories: “really good”, “really bad”, and “joke”. I spent some time looking around, and found one particular ROM which I found in the “really good” category: Zelda II: New Adventure of Link, created by someone named HollowShadow. So I downloaded OpenEmu, grabbed the ROM hack, bought a USB NES controller, and got to playing!
Being a big fan of Zelda 2, I had no problem playing through the vast majority of the ROM. It’s when I got to the final palace is when the game became quite difficult, and quickly got lost.
What I needed was a map, but because this was a ROM hack, there wasn’t a whole lot available to me in the way of resources.
We’re well past a month into the COVID-19 shutdown, and I noticed that fewer and fewer trains were running on Regional Rail each day. I knew that SEPTA had decreased their service due to less riders, but I wondered just how strict the service cuts were. I also wondered if more or perhaps less trains were running on time.
I started off by firing up Splunk Lab then went for a walk while it took 15 minutes or so to load the data up. I came back, and decided to see where we were:
That’s well north of 60 million data points. While I could crunch that data as-is, it would take longer for me to run my subsequent queries as well as look for trends in the data. I ended up writing a couple of scripts to summarize that data on a daily basis, so that I can get a bucketed breakdown of late trains for the entire day. We’ll get back to that bucketing later, because I first want to talk about train volumes.
I figured that with the perceived dropoff in service, I could look at how many distinct trains (identified by train number) each day. Sure enough, there was a drop off in train service levels:
Things started to get serious the week of Monday, March 16th. In fact, that was the last day of “normal” service on Regional Rail. Starting on Tuesday the 17th, the number of trains per weekday went from nearly 500 to about 362, with the weekends unaffected.
With this whole COVID-19 pandemic going on, we all have to socially isolate ourselves in our homes in order to flatten the curve of cases, and this is likely to continue for some time.
I’ve looked into ways to watch Netflix with friends online, and after trying several different “solutions”, I found one which works quite well: Netflix Party
In order to use Netflix Party, each person will need a desktop or laptop running the Google Chrome web browser. If you don’t have Chrome installed, ask your child to install it on your computer for you. If you don’t have any children, text your neighbor’s kid for installation instructions.
If you manage Linux servers over the Internet, you use SSH to connect to them. SSH lets you have a remote shell on a host over an encrypted channel so that an attacker cannot watch what you are doing over the network. In this blog post, I’m going to talk about using SSH at scale across thousands of posts.
Phase 0: Passwords
When you get started with SSH for the first time, you likely won’t have keys set up and will instead use passwords to authenticate to your servers. It will look something like this:
You use SSH to connect to the server, type in your password, and you’re good to go. That’s fine for small scale, such as managing a single server, but it doesn’t come without downsides. Specifically, you won’t be able to easily use a tool such as Ansible nor do code checkins with Git.
And that’s actually a bigger problem than it sounds, because if you make it harder to use a tool, that tool will be used far less often. This can lead to things such as configuration drift due to Ansible being run less often, or giant code pushes happening once a day if Git is being run less. And giant code pushes are a particular problem, because if other engineers have written code, you’ll have to do a merge, and if a bug presents itself, you’ll now have to think back to what you did 8 hours ago, not 8 minutes ago. Having to type in a password every single time will also slow down the rate of deployment, which in turn slows down the rate of product releases. Not good.
Seriously, don’t use SSH with a password for any reason other than as a stepping step to using keys. And that brings us to…
Perhaps you’re worried about being doxxed, perhaps you’ve received some specific threats, maybe you just want to increase your security. No matter the reason, this article is for you! Below I will list a collection of good practices to keep you and your accounts safe online. I fully expect to update this post as things change in the future.
I have tried to put things in a logical order, with some later steps depending on earlier steps, and some things that may be considered “controversial” towards the end.
This post was last updated on Jan 2, 2020.
Let’s start with passwords. I shouldn’t have to say this, but I will do so anyway: do not reuse passwords. Reusing passwords mean that if a single account provider is breached and your plaintext password is recovered, you now have additional accounts at risk of compromise. This has happened before.
I recommend using a password manager such as LastPass to keep track of your passwords. While having your passwords stored in an app that uploads them somewhere increases your risk slightly, I feel it is outweighed by using a different password for each service. For passwords themselves, you can use random characters or a system such as Diceware to create long passwords that are easier to remember. While the latter is slightly less secure, a password that can be remembered is one less password to store into a password manager.