If you manage Linux servers over the Internet, you use SSH to connect to them. SSH lets you have a remote shell on a host over an encrypted channel so that an attacker cannot watch what you are doing over the network. In this blog post, I’m going to talk about using SSH at scale across thousands of hosts.
Phase 0: Passwords
When you get started with SSH for the first time, you likely won’t have keys set up and will instead use passwords to authenticate to your servers. It will look something like this:
You use SSH to connect to the server, type in your password, and you’re good to go. That’s fine for small scale, such as managing a single server, but it doesn’t come without downsides. Specifically, you won’t be able to easily use a tool such as Ansible or do code check-ins with Git.
And that’s actually a bigger problem than it sounds, because if you make it harder to use a tool, that tool will be used far less often. This can lead to things such as configuration drift due to Ansible being run less often, or giant code pushes happening once a day if Git is being run less. And giant code pushes are a particular problem, because if other engineers have written code, you’ll have to do a merge, and if a bug presents itself, you’ll now have to think back to what you did 8 hours ago, not 8 minutes ago. Having to type in a password every single time will also slow down the rate of deployment, which in turn slows down the rate of product releases. Not good.
Seriously, don’t use SSH with a password for any reason other than as a stepping stone to using keys. And that brings us to…
Perhaps you’re worried about being doxxed, perhaps you’ve received some specific threats, maybe you just want to increase your security. No matter the reason, this article is for you! Below I will list a collection of good practices to keep you and your accounts safe online. I fully expect to update this post as things change in the future.
I have tried to put things in a logical order, with some later steps depending on earlier steps, and some things that may be considered “controversial” towards the end.
This post was last updated on Jan 2, 2020.
Let’s start with passwords. I shouldn’t have to say this, but I will do so anyway: do not reuse passwords. Reusing passwords means that if a single account provider is breached and your plaintext password is recovered, you now have additional accounts at risk of compromise. This has happened before.
I recommend using a password manager such as LastPass to keep track of your passwords. While having your passwords stored in an app that uploads them somewhere increases your risk slightly, I feel it is outweighed by using a different password for each service. For passwords themselves, you can use random characters or a system such as Diceware to create long passwords that are easier to remember. While the latter is slightly less secure, a password that can be remembered is one less password to store in a password manager.
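To make the Diceware idea concrete, here is a minimal Python sketch of picking words at random and estimating how strong the result is. The tiny word list and the function names are placeholders I made up for illustration; a real Diceware list has 7,776 words, which works out to roughly 12.9 bits of entropy per word.

```python
import math
import secrets

# Hypothetical stand-in word list for illustration only.
# A real Diceware list contains 7,776 words.
SAMPLE_WORDS = [
    "anchor", "breeze", "cactus", "dynamo",
    "ember", "falcon", "glacier", "harbor",
]


def make_passphrase(words, count=6, sep=" "):
    """Pick `count` words uniformly at random using a cryptographic RNG."""
    return sep.join(secrets.choice(words) for _ in range(count))


def entropy_bits(wordlist_size, count):
    """Entropy in bits of `count` independent picks from a list of this size."""
    return count * math.log2(wordlist_size)


if __name__ == "__main__":
    print(make_passphrase(SAMPLE_WORDS))
    # Strength of a six-word phrase drawn from a full-size Diceware list.
    print(round(entropy_bits(7776, 6), 1), "bits")
```

The important detail is using `secrets` rather than `random`, since the latter is not meant for security-sensitive choices.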
As much as I love using Docker, one of the frustrations I have is when I try to remove an image which other images are based on, only to get this error:
$ docker rmi b171179240df
Error response from daemon: conflict: unable to delete b171179240df (cannot be forced) - image has dependent child images
I did some searches on Google, and most of the advice centered around the heavy-handed approach of removing all Docker images and basically starting over with a clean slate. That approach didn’t sit well with me because it doesn’t strike me as all that efficient, and also causes me to have to spend more time waiting for unrelated containers to rebuild.
That prompted me to write a script which, when provided with the ID of an image to remove, will recurse through all child images and delete them first.
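The recursion itself is simple enough to sketch in Python. This is not the script mentioned above, just an illustration of the idea under some assumptions: the function names are mine, and I am assuming the `Parent` field reported by `docker image inspect` is populated, which is not always the case (for example with some BuildKit-built images).

```python
import subprocess
import sys


def deletion_order(target, parent_of):
    """Return image IDs to delete, children first, ending with `target`.

    `parent_of` maps each image ID to its parent image ID ("" if none).
    """
    children = {}
    for image, parent in parent_of.items():
        children.setdefault(parent, []).append(image)

    order = []

    def visit(image):
        # Depth-first: every descendant is listed before the image itself.
        for child in sorted(children.get(image, [])):
            visit(child)
        order.append(image)

    visit(target)
    return order


def load_parents():
    """Build the image -> parent map by asking the Docker CLI."""
    ids = subprocess.run(
        ["docker", "images", "-a", "-q", "--no-trunc"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    parents = {}
    for image_id in set(ids):
        out = subprocess.run(
            ["docker", "image", "inspect", "--format", "{{.Parent}}", image_id],
            capture_output=True, text=True, check=True,
        )
        parents[image_id] = out.stdout.strip()
    return parents


if __name__ == "__main__":
    # Expects the full (untruncated) image ID as the only argument.
    for image_id in deletion_order(sys.argv[1], load_parents()):
        subprocess.run(["docker", "rmi", image_id], check=True)
```

Keeping the ordering logic separate from the Docker calls means it can be checked against a plain dictionary without touching any real images.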
If you are storing files in Amazon S3, you absolutely positively should enable AWS S3 Access Logging. This will cause every single access in a bucket to be written to a logfile in another S3 bucket, and is super useful for tracking down bucket usage, especially if you have any publicly hosted content in your buckets.
But there’s a problem–AWS goes absolutely bonkers when it comes to writing logs in S3. Multiple files will be written per minute, each with as few as one event in them. It comes out looking like this:
Serverless is an app which lets you deploy applications on AWS and other cloud providers without actually spinning up virtual servers. In our case, we’ll use Serverless to create a Lambda function which executes periodically and performs rollup of logfiles.
So once you have the code, here’s how to deploy it:
cp serverless.yml.example serverless.yml
vim serverless.yml # Vim is Best Editor
serverless deploy # Deploy the app. This will take some time.
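For a sense of what "rollup" means here, this is a rough Python sketch of the grouping step, not the deployed Lambda code. It assumes the standard S3 access log key layout of prefix plus `YYYY-mm-DD-HH-MM-SS-UniqueString`, and the function names and the choice of hourly buckets are my own for illustration. Reading from and writing to S3 is left out entirely.

```python
import re
from collections import defaultdict

# Matches the timestamp at the end of an S3 access log key and captures
# the date-plus-hour portion, e.g. "2020-01-02-03".
KEY_RE = re.compile(r"(\d{4}-\d{2}-\d{2}-\d{2})-\d{2}-\d{2}-[0-9A-Za-z]+$")


def group_by_hour(keys):
    """Group log object keys by the hour embedded in their names."""
    groups = defaultdict(list)
    for key in keys:
        match = KEY_RE.search(key)
        if match:  # Silently skip anything that is not a log file.
            groups[match.group(1)].append(key)
    return {hour: sorted(members) for hour, members in groups.items()}


def merge_bodies(bodies):
    """Concatenate log file contents, making sure each ends with a newline."""
    return "".join(b if b.endswith("\n") else b + "\n" for b in bodies if b)


if __name__ == "__main__":
    sample = [
        "logs/2020-01-02-03-04-05-ABCDEF0123456789",
        "logs/2020-01-02-03-59-00-0123456789ABCDEF",
        "logs/2020-01-02-04-00-01-AAAA000011112222",
    ]
    for hour, members in sorted(group_by_hour(sample).items()):
        print(hour, len(members))
```

Each group would then be written out as a single object, turning dozens of tiny files per hour into one.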
If you’re like me, you write a fair bit of code, which means you have to interact with many Git repositories. If you’re also like me, chances are you have them in a directory called development/ or similar. It might even have some nested directories, something like this:
So that’s cool, but let’s say that you get a new machine and you want to replicate your development/ directory structure onto it? One way is to check out everything by hand, but that’s laborious and time consuming. A second way is to keep backups–and you should absolutely do this–but aside from the challenges of restoring a single directory out of an entire archive, what if that backup doesn’t have the latest commits in it?
I can now offer a third way. I recently wrote a couple of scripts available on GitHub that can be used to extract the Git remote from each repo in an entire directory structure, and save those remotes and the directories they belong in to a file. Given the above example, it might look something like this:
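The extraction step can be approximated in a few lines of Python. To be clear, this is a sketch of the general technique and not the scripts from GitHub: the function names, the tab-separated output, and the assumption that the remote is called `origin` are all mine.

```python
import os
import subprocess
import sys


def find_git_repos(root):
    """Return paths, relative to `root`, of directories containing `.git`."""
    repos = []
    for dirpath, dirnames, filenames in os.walk(root):
        # `.git` is normally a directory, but is a file in worktrees/submodules.
        if ".git" in dirnames or ".git" in filenames:
            repos.append(os.path.relpath(dirpath, root))
            dirnames[:] = []  # Do not descend into a repository.
    return sorted(repos)


def get_remote(repo_path, name="origin"):
    """Return the URL of the named remote, or None if it is not configured."""
    result = subprocess.run(
        ["git", "-C", repo_path, "remote", "get-url", name],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for repo in find_git_repos(root):
        remote = get_remote(os.path.join(root, repo))
        if remote:
            # One line per repository: directory, a tab, then the remote URL.
            print(f"{repo}\t{remote}")
```

Restoring on a new machine is then a matter of reading that file back and running `git clone` for each line into the recorded directory.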
Earlier today I had breakfast with one of my drinking buddies, and he asked me for some YouTube recommendations, because he’s bored with the stuff that he watches. I was going to DM them to him, but thought it would be more sensible for me to make a blog post and share them with others, as well.
I can’t say that this is an exhaustive or authoritative list, rather it’s a list of channels which I have found enjoyable or at least interesting.
HowtoBasic – An educational channel which explains how to perform many common tasks. Often with thousands of eggs. 🍳
Eckharts Ladder – A channel which covers mostly Star Wars and Halo, with an emphasis on the warfare aspects of the Star Wars universe, such as the different kinds of ships and weapons.
DeSinc – Video game speedruns gone horribly sideways.
Down The Rabbit Hole – A series of videos by YouTube user Fredrik Knudsen. Each video does a deep dive into a specific topic, complete with supporting research and documentation.
The Infographics Show – This channel hosts non-fiction videos between 5 and 15 minutes in length which cover some science-related aspects of history.
SciShow – Another science-related channel which focuses more on pure science-related topics, with most videos under 10 minutes in length.
Pretty Good – I’ll admit, I am not a sports guy. However, I adore the storytelling in this series of videos from Jon Bois and SB Nation. He covers everything from the 222-0 blowout in college football to the psychology behind professional gambling to Scorigami in American football.
SF Debris – He describes his channel as “serious reviews with silly commentary”. Most of the reviews are on works of science-fiction with an emphasis on the Star Trek franchise. He has even more reviews on his website.
Finally, I’ll leave you with The Instant Regret Playlist. This is a playlist of over 2,800 videos, and it is just a thing of beauty. Put it on the TV at your next party.
What do you like to watch on YouTube? Let me know in the comments!
That may seem like a strange thing to utter, but hear me out. When working in software engineering, you will find yourself doing the same thing over and over. It can be tedious and mind-numbing, and if it’s the sort of thing that involves multiple steps, can increase the risk of human error. For example, one case of human error cost a company millions of dollars and ultimately tanked that company.
This means that the more things that are automated or at least semi-automated, the better. There will be fewer manual steps to run, and fewer things that can go wrong because a step was missed or not executed properly. At the same time, because automation means the same thing is done the same way over and over, you’ll get repeatable builds which make things like troubleshooting, multi-tenancy, and disaster recovery easier to perform.
I don’t normally like to talk about these sorts of things, but I am not only getting older, I am also mortal. So I just wanted to put out there that if something does happen to me some day, I wrote a blog post expressing my final wishes.
A while ago, I found myself trying to make a decision on which of several restaurants to eat at. They were all highly rated on Yelp, but surely there might be more insights I could pull from their reviews. So I decided to Splunk them!
Yelp has an API but, I am sorry to say, it is awful. It will only let you download 3 reviews for any venue. That’s it! What a crime.
So… I had to crawl Yelp venue pages to get reviews. I am not proud of this, but I was left with no other option.
Python has been my go-to language lately, so I decided to solve the problem of review acquisition with Python. I used the Requests module to fetch the HTML code, and the Beautiful Soup module to extract reviews and page links from the HTML.