Doug's Home On The Web

August 22nd, 2021 | August 24th, 2021

Stupid UNIX Tricks: Find Videos You Posted To Twitter

I’ve seen complaints pop up on Twitter that people are getting their accounts suspended over years-old tweets that happen to contain copyrighted music. So let’s say that, like me, you have a Twitter account over 10 years old and you want to go through your old tweets so you can pull any such video before Twitter does. How do you go about doing that?

Well here’s the thing: the UNIX command line is incredibly powerful if you know how to use it. In this post, I’ll show you how to use the bash shell in Linux or macOS to find those videos so that you can remove them.

*“UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity.”* — Dennis Ritchie

The first thing you gotta do is download your entire Twitter archive. There are instructions on how to do that here. Once you put in the request, you’ll hear back from Twitter within a day when the download is ready. Expect the file to be rather large — in my case it was over 2 Gigs. Download that file and unzip it.

At the time of this writing, all of your media will be found in the folder data/tweet_media/, so cd into that directory and see how many files there are:

ls -l |wc -l
 10085
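One caveat on that count: ls -l prints a leading "total" line, so the number above is one higher than the actual number of files. As a rough sketch of where to go from here, the helpers below break the folder down by file extension and list the likely videos. The function names are made up for illustration, and the assumption that videos end in .mp4 is mine, so check it against what is actually in your own archive.

```shell
# Count regular files per extension in a media folder (defaults to the
# current directory), most common extension first.
count_by_ext() {
    dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f \
        | sed -n 's/.*\.\([^./]*\)$/\1/p' \
        | sort | uniq -c | sort -rn
}

# List likely video files. ASSUMPTION: videos use the .mp4 extension.
list_videos() {
    dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f -name '*.mp4' | sort
}
```

From inside data/tweet_media/, running count_by_ext shows what kinds of files you have, and list_videos | wc -l tells you how many videos there are to review.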

April 11th, 2021 | April 13th, 2021

Getting the Most out of Obsidian

In Part 1, I wrote about how to get your data out of Evernote and into Obsidian. In this post, I’m going to cover how to get the most out of Obsidian in terms of functionality.

Organizing in Obsidian

At a high level, I like to use The PARA Method, which consists of 4 top-level folders for storing your notes. Those folders are:

Projects: Projects are things you are actively researching or working on, such as this blog post. They have deliverables and they have deadlines. Notes should not exist in your project folder forever, but instead be moved into another folder.

Areas of Responsibility: The literal definition that I’ve seen elsewhere is “activity with a standard to be maintained over time”. If you’re using Obsidian for work, it might be for platforms which you own and perform occasional maintenance on, runbooks for dealing with specific issues, etc. If you’re using Obsidian for personal use, it might be for taking notes from books you read, notes about your health, your car, finances, etc.

Resources: A resource is defined as “a topic or theme of ongoing interest”. This might be things like recipes, ideas for home improvement, and the like. I will be the first to admit that sometimes the line blurs between Resources and Areas. The best advice I can offer is to try not to sweat the details here.

Archives: Stuff you’re not using anymore, such as projects you’ve finished or travel plans for trips taken. I recommend zipping the folders that hang out in this directory. (Obsidian won’t mind.)
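Since an Obsidian vault is just a folder on disk, the four PARA folders can be created from the shell. This is only a minimal sketch: the function name is hypothetical, and the short folder names (such as "Areas" for Areas of Responsibility) are my choice, so rename them to taste.

```shell
# Create the four PARA folders inside a vault (defaults to the current
# directory). Folder names are illustrative, not mandated by Obsidian.
create_para_folders() {
    vault="${1:-.}"
    for folder in Projects Areas Resources Archives; do
        mkdir -p "$vault/$folder"
    done
}
```

Running create_para_folders ~/MyVault (a made-up path) is safe to repeat, because mkdir -p leaves existing folders alone.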

March 28th, 2021 | April 13th, 2021

Migrating Your Notes from Evernote to Obsidian

This post is part 1 of a multi-part series. Part 2 is here.

In this blog post, I’m going to talk about why I moved 3,000 notes from Evernote to Obsidian, and walk through the process of doing so with the help of a third-party open source tool, plus some code I wrote to drive that tool.

Breaking Up With Evernote

I used to love Evernote. I started using it back in 2012 or so, and before long I got the paid version! It met a lot of needs I had at the time, such as being able to take notes using a lightweight interface, letting me search through all of my notes, and letting me attach files and images to them. And I could even do all of those from my iPhone!

But it’s 2021, and things have changed. I don’t believe Evernote has grown as much as it could have as an app. Between the company trying unsuccessfully to launch other products and the 2018 layoffs, Evernote is not in the best shape as a company.

But the real nail in the coffin for Evernote in my eyes was the “new” release of Evernote a few months ago. Gone was the feature that would let me export an entire note as HTML. Gone was the feature that would let me export all attachments from a note. These were features I had come to rely on when I needed to get to my data, and Evernote took them out of the product! Furthermore, there was one particularly nasty bug that had me fearing for the state of my notes:

That’s right: after the app had been open for a few days, clicking on a note (any note) would show a blank screen where my content should be! The first time I saw this, I went into a near panic. I tried restarting Evernote and that fixed the problem for a few days until it came back. I filed a support request with Evernote about this, yet a few versions later this disturbing bug remains.

Introducing: Obsidian

So I started looking around, and I found something I fell in love with: Obsidian.

Obsidian can best be described as “An IDE for Markdown documents”. Don’t know what Markdown is? No problem, it’s a very simple syntax used to mark up documents (WAY simpler than HTML) that you can learn in no time at all!

And instead of using a database to store notes, Obsidian uses the filesystem to store them. What this means is that backing up your data is as simple as zipping up the root folder, which Obsidian calls a “Vault”. And your vault can be sitting in a folder shared to Dropbox, OneDrive, or any similar service. This separation of concerns makes Obsidian less complicated because it doesn’t have to sync its own notes. That’s a win!
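To show how little is involved in backing up a vault, here is a minimal sketch of a dated backup. The function name and the archive naming scheme are hypothetical. It uses zip when that command is installed and otherwise falls back to tar with gzip, which is a substitution on my part rather than anything Obsidian requires.

```shell
# Archive a vault folder into a dated file and print the archive's path.
# Uses zip if available; otherwise falls back to tar + gzip.
backup_vault() {
    vault="$1"                      # path to the vault's root folder
    dest="${2:-.}"                  # existing directory for the archive
    stamp=$(date +%Y-%m-%d)
    name=$(basename "$vault")
    parent=$(cd "$(dirname "$vault")" && pwd)
    dest_abs=$(cd "$dest" && pwd)
    if command -v zip >/dev/null 2>&1; then
        # Run from the parent so the archive holds "name/..." paths.
        (cd "$parent" && zip -qr "$dest_abs/$name-$stamp.zip" "$name")
        echo "$dest_abs/$name-$stamp.zip"
    else
        tar -czf "$dest_abs/$name-$stamp.tar.gz" -C "$parent" "$name"
        echo "$dest_abs/$name-$stamp.tar.gz"
    fi
}
```

For example, backup_vault ~/MyVault ~/Backups (both paths made up) would write something like MyVault-2021-03-28.zip into the backup folder.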

December 24th, 2020 | December 26th, 2020

Tarsplit: A Utility to Split Tarballs Into Multiple Parts

Tarsplit is a utility I wrote which can split UNIX tarfiles into multiple parts while keeping files in the tarballs intact. I specifically wrote it because the other ways I found of splitting up tarballs didn’t keep the individual files intact, which would not play nice with Docker.

But what does Docker have to do with tar?

While building the Docker images for my Splunk Lab project, I noticed that one of the layers was something like a Gigabyte in size! While Docker can handle large layers, the issue becomes one of the time it takes to push or pull an image. Docker does validation on each layer as it’s received, and the validation of a 1 GB layer took 20-30 wall clock seconds. Not fun.

It occurred to me that if I could split up that layer into, say, 10 layers of 100 Megabytes each, then Docker would transfer about 3 layers in parallel, and while a layer is being validated post-transfer, the next layer would start being transferred. The end result is fewer wall clock seconds to transfer an entire Docker image.

But there was an issue–Splunk and its applications are big. A few of the tarballs are hundreds of Megabytes in size, and I needed a way to split those tarballs into smaller tarballs without corrupting the files in them. This led to Tarsplit being written.
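To illustrate the "keep every file intact" idea, here is a simplified sketch. It is not how Tarsplit works: instead of splitting an existing tarball, it packs the files of a directory into numbered tarballs and starts a new one whenever the next file would push the current part over a size limit. The function name and the output naming are made up, and it assumes filenames without newlines and a tar that supports -T (GNU tar and bsdtar both do).

```shell
# Pack files under a directory into numbered tarballs of roughly
# max_bytes each. A file is never split across two tarballs, so a
# single file larger than max_bytes simply gets a part to itself.
split_into_tarballs() {
    src="$1"; prefix="$2"; max_bytes="$3"
    list=$(mktemp)                  # file list for the current part
    find "$src" -type f | sort | {
        part=1; size=0
        while IFS= read -r f; do
            bytes=$(wc -c < "$f" | tr -d ' ')
            # Flush the current part before it would grow past the limit.
            if [ -s "$list" ] && [ $((size + bytes)) -gt "$max_bytes" ]; then
                tar -cf "${prefix}-part-${part}.tar" -T "$list" 2>/dev/null
                part=$((part + 1)); size=0; : > "$list"
            fi
            printf '%s\n' "$f" >> "$list"
            size=$((size + bytes))
        done
        # Write whatever is left over as the final part.
        if [ -s "$list" ]; then
            tar -cf "${prefix}-part-${part}.tar" -T "$list" 2>/dev/null
        fi
    }
    rm -f "$list"
}
```

Calling split_into_tarballs ./app ./out/app 100000000 would produce out/app-part-1.tar, out/app-part-2.tar, and so on, each of which extracts on its own.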

December 21st, 2020 | December 24th, 2020

Using Eventgen in Splunk Lab

What Is Eventgen?

Because a Docker container is quicker than spinning up a VM.

According to the docs, Eventgen is a Splunk App that lets users build real-time event “generators” so that one-off event generators don’t need to be built.

What does this mean? Let’s say you run a Splunk platform, and you want to create some new dashboards for a data source in production, but want to do this on dev. Without Eventgen, you would need to write a script to generate fake events and write them to a file which is read in by Splunk. That is a lot of work.

And we all have better things to do than write one-off code.

Why Use Eventgen?

With Eventgen, you can create a sample file with say, 1,000 events from that data source, and configure Eventgen to write a random event from that file straight to Splunk via its API, with current timestamps. The end result is that you’ll have a steady stream of realistic events flowing into Splunk with current timestamps, without the need to read from (and rotate) logfiles in the filesystem.

How Eventgen Is Used in Splunk Lab

I took approximately 1,400 lines of logs from my blog’s webserver and included them in Splunk Lab. When Eventgen is used, a random event from that file will be written into Splunk at a rate of approximately one per second. Because the events that make their way into Splunk are random, there will be a short-term fluctuation in the frequency of specific URLs, HTTP verbs, HTTP statuses, etc. This is perfect for creating dashboards that mimic what you might see in a production environment.
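For a sense of what Eventgen saves you from, here is a rough sketch of the kind of one-off generator you would otherwise write by hand. It is not Eventgen and not its configuration: it only picks a random line from a sample file and prints it with the current UTC timestamp. The function name and the output format are made up for illustration.

```shell
# Print one randomly chosen line from a sample file, prefixed with the
# current UTC timestamp. Prints only the timestamp if the file is empty.
emit_random_event() {
    sample="$1"
    line=$(awk -v seed="$$$(date +%S)" '
        BEGIN { srand(seed) }
        { lines[NR] = $0 }
        END { if (NR > 0) print lines[int(rand() * NR) + 1] }
    ' "$sample")
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
}
```

Wrapping it in a loop such as while true; do emit_random_event sample.log >> fake.log; sleep 1; done gives roughly one event per second, but you would still have to get that file into Splunk and rotate it yourself, which is exactly the chore Eventgen removes.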

October 23rd, 2020 | June 18th, 2025

How To Start A Furry Convention

So, you want to start a furry convention! Great! We’ll never say no to a new event. And this is a topic I happen to know a little about, as I’ve been staffing/running conventions for 20 years now. (I’m old)

Let’s start with a few questions I’d like to throw out, the sort of questions every new convention organizer should ask themselves…

A Furry Convention Building Checklist:

Have you staffed a con before?
Are you incorporated? If so, as a 501(c)(3) or 501(c)(7)?
Do you have a separate bank account for the organization?
Do you have a budget?
Do you have a hotel or other venue?
Do you have a signed contract with the venue?
Did you read the contract in its entirety?
Do you have liability insurance?
Do you have legal counsel?
Have you lined up people who can be senior staff/department heads for major portions of the convention, such as Programming, A/V, Dealers Room, etc.?
Have you vetted your staff?
Do you have a website?
How about social media?
Is there another con already serving the same general area?
If so, will potential attendees feel that they are forced to choose between the two cons?

May 10th, 2020 | June 27th, 2020

HOWTO: Safely use “git rebase -i”

Git is a very powerful revision control system used in software development, and at this point, is effectively the industry standard. One of the things that makes Git so powerful is that there are all sorts of low-level operations that developers can do in it. This includes everything from tagging releases, to having an insane number of branches, to painless merging of said branches, to rewriting history.

“I am serious. And don’t call me Shirley.”

If you’re new to Git, your brain probably threw an exception when reaching the end of the paragraph as you thought to yourself, “Wait WHAT? Why would you want to REWRITE HISTORY?” Good question! Normally, you don’t want to mess with history in a revision control system. But sometimes you may want to or even need to. A few possible reasons for rewriting history include:

Feature Branch “B” is a branch of Feature Branch “A” and you want to merge B’s changes to master, but not A’s.
Squashing all the commits in a feature branch to a single commit, after having previously pushed those commits.
A developer made a commit with something that doesn’t belong in Git, such as PII or credentials.
You want to “clean up” the commit history a little.

Whether any of the above are fully valid reasons is up to you and your dev team.
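The first reason in that list can be sketched in a throwaway repo. This is only an illustration under my own assumptions: the branch names, file names, and identity settings are invented, and the backup branch is just one way to leave yourself a safety net before rewriting anything. The key command is git rebase --onto, which replays only the commits that are in feature-b but not in feature-a on top of master.

```shell
# Build a scratch repo: master <- feature-a <- feature-b.
demo=$(mktemp -d)
(
    cd "$demo" || exit 1
    git init -q
    git symbolic-ref HEAD refs/heads/master   # same branch name on any Git version
    git config user.email "demo@example.com"
    git config user.name "Demo"

    echo base > base.txt && git add base.txt && git commit -q -m "base"
    git checkout -q -b feature-a
    echo a > a.txt && git add a.txt && git commit -q -m "A1"
    git checkout -q -b feature-b
    echo b > b.txt && git add b.txt && git commit -q -m "B1"

    # Safety net: keep a pointer to the pre-rebase history.
    git branch backup-feature-b

    # Replay only B's own commits onto master, leaving A's behind.
    git rebase -q --onto master feature-a feature-b

    # Show what feature-b now has on top of master.
    git log --format=%s master..feature-b
)
```

If the result is not what you wanted, git reset --hard backup-feature-b while on feature-b puts the branch back where it started.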

That said, rebasing is not something you want to experiment with for the first time on your production repo. It’s just a bad idea. So I built a playground/lab where anyone can experiment with git rebasing in an isolated and local environment, which can be stood up in seconds. Here’s how to get started:

git clone git@github.com:dmuth/git-rebase-i-playground.git
cd git-rebase-i-playground
./init.sh

May 3rd, 2020 | May 3rd, 2020

Great Palace Map from Zelda II: New Adventure of Link

I’ve been trying to keep myself amused during the COVID-19 shutdown, and as such have been trying a number of new things. One new thing I decided to look into is playing hacked NES ROMs. It turns out there are a great number of ROM hacks out there, with the website ROMhacking.net listing over 4,600 hacked ROMs as of this writing.

While looking into hacked ROMs, I quickly learned that ROM hacks fall into one of 3 main categories: “really good”, “really bad”, and “joke”. I spent some time looking around, and found one particular ROM which I’d put in the “really good” category: Zelda II: New Adventure of Link, created by someone named HollowShadow. So I downloaded OpenEmu, grabbed the ROM hack, bought a USB NES controller, and got to playing!

Being a big fan of Zelda 2, I had no problem playing through the vast majority of the ROM. It was when I got to the final palace that the game became quite difficult, and I quickly got lost.

What I needed was a map, but because this was a ROM hack, there wasn’t a whole lot available to me in the way of resources.

So I decided to make a map.

April 29th, 2020 | May 2nd, 2020

Philadelphia Public Transit vs COVID-19

We’re well past a month into the COVID-19 shutdown, and I noticed that fewer and fewer trains were running on Regional Rail each day. I knew that SEPTA had decreased their service due to fewer riders, but I wondered just how strict the service cuts were. I also wondered if more or perhaps fewer trains were running on time.

Fortunately, I have several years’ worth of train data due to running SeptaStats.com, so I could answer these questions myself!

I started off by firing up Splunk Lab, then went for a walk while it took 15 minutes or so to load the data up. I came back, and decided to see where we were:

That’s well north of 60 million data points. While I could crunch that data as-is, it would take longer for me to run my subsequent queries as well as look for trends in the data. I ended up writing a couple of scripts to summarize that data on a daily basis, so that I can get a bucketed breakdown of late trains for the entire day. We’ll get back to that bucketing later, because I first want to talk about train volumes.

I figured that with the perceived drop-off in service, I could look at how many distinct trains (identified by train number) ran each day. Sure enough, there was a drop-off in train service levels:

“Symmetrical book stacking… just like the Philadelphia mass turbulence of 1947!”

Things started to get serious the week of Monday, March 16th. In fact, that was the last day of “normal” service on Regional Rail. Starting on Tuesday the 17th, the number of trains per weekday went from nearly 500 to about 362, with the weekends unaffected.
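The distinct-trains-per-day count was done in Splunk, but the idea itself is easy to sketch in the shell. The version below assumes a hypothetical CSV with one "date,train_number" pair per observation, which is not the actual SeptaStats data format, and the function name is made up. It collapses repeated sightings of the same train on the same day and then counts what is left per date.

```shell
# Count distinct train numbers per day.
# ASSUMED input: CSV lines of "YYYY-MM-DD,train_number", repeats allowed.
# Output: "YYYY-MM-DD,count", sorted by date.
distinct_trains_per_day() {
    sort -u "$1" \
        | cut -d, -f1 \
        | uniq -c \
        | awk '{ print $2 "," $1 }'
}
```

Because sort -u removes duplicate date and train pairs first, a train that reported in hundreds of times in one day still only counts once for that day.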