What’s Taking Up So Much Space in AWS S3?

I’m a big fan of Amazon S3 for storage, and I like it so much that I use Odrive to sync folders from my hard drive into S3, storing copies of all of my files from Dropbox as a form of backup.  I only have about 20 GB of data that I truly care about, so that should be less than a dollar per month for hosting, right? Well…

“You are not your job or how much data you have in S3!”

Close to 250 GB billed for last month. How did that happen?
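
If you want to check your own buckets, here’s a quick sketch using the AWS CLI (the bucket name is a placeholder):

# Total size of the current objects, i.e. what "aws s3 ls" can see:
aws s3 ls s3://my-bucket --recursive --summarize | tail -2

# Count every stored version, including old/noncurrent ones:
aws s3api list-object-versions --bucket my-bucket --query 'length(Versions)'

If the second number is far larger than your current object count, old versions are a likely suspect.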

Continue reading “What’s Taking Up So Much Space in AWS S3?”

Dockerizing Discord’s Music Bot in Amazon ECS

I’m a big fan of the Discord Musicbot, and run it on some Discord servers that I admin. Wanting to run it on a server, I first created an Ansible playbook and launched a server on Digital Ocean. But after a few months, I noticed that the server was sitting over 90% idle. Surely there had to be a better way.

So I next tried Docker, and created a Dockerized version of the Musicbot. I was quite happy with how much easier it was to spin up the bot, but still didn’t want to run it on a dedicated server on Digital Ocean. Aside from the unused capacity, if that machine went down, I’d have to intervene manually to bring the bot back up.

I thought about running it in some sort of hosted Docker cluster, and came across Amazon’s EC2 Container Service (ECS). So this post is about creating your own cluster in ECS and hosting a Docker container in it. I found the process slightly confusing the first time through, and wanted to share my experience here.
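
As a taste of the Dockerized workflow, here’s a minimal sketch (the image name and token variable are hypothetical; the real Dockerfile is covered in the full post):

# Build the bot's image from the project's Dockerfile, then run it detached;
# --restart=always tells Docker to bring the container back up if it dies
docker build -t musicbot .
docker run -d --restart=always -e BOT_TOKEN=your-token-here musicbot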

Continue reading “Dockerizing Discord’s Music Bot in Amazon ECS”

How to Undelete Files in Amazon S3

While S3 is a great storage platform, what happens if you accidentally delete some important files? Well, S3 has a mechanism to recover deleted files, and I’d like to go into that in this post.

First, make sure you have versioning enabled on your bucket. This can be done via the API, or via the UI in the “Properties” tab for your bucket. Versioning saves every change to an object (including deletions) as a separate version, with the most recent version taking precedence. In fact, a deletion is itself a version: a zero-byte “delete marker” that becomes the latest version of the object. Recovering a deleted file, then, simply involves removing that delete marker.
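
Before we get to the script, here’s a minimal sketch of the underlying API calls using the AWS CLI (bucket and key names are placeholders):

# Enable versioning (can also be done in the UI):
aws s3api put-bucket-versioning --bucket my-bucket \
    --versioning-configuration Status=Enabled

# Find the delete marker that is currently the latest version of an object:
aws s3api list-object-versions --bucket my-bucket --prefix path/to/file \
    --query 'DeleteMarkers[?IsLatest==`true`].[Key,VersionId]' --output text

# Removing that delete marker is what "undeletes" the file:
aws s3api delete-object --bucket my-bucket --key path/to/file \
    --version-id VERSION_ID_FROM_ABOVE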

This is what that would look like in the UI:

To undelete these files, we’ll use a script I created called s3-undelete.sh, which can be found over on GitHub:

Continue reading “How to Undelete Files in Amazon S3”

ssh-to: Easily manage dozens or hundreds of machines with SSH

Hey software engineers! Do you manage servers? Lots of servers? Hate copying and pasting hostnames and IP addresses? Need a way to execute a command on each of a group of servers that you manage?

I developed an app which can help with those things, and my employer has graciously given me permission to open source it.

First, here’s the link:

https://github.com/comcast/ssh-to

And here’s how to download a copy:

git clone https://github.com/Comcast/ssh-to.git
Continue reading “ssh-to: Easily manage dozens or hundreds of machines with SSH”

Two New Open Source Projects

At my day job, I get to write a bit of code. I’m fortunate that my employer is pretty cool about letting us open source what we write, so I’m happy to announce that two of my projects have been open sourced!

The first project is an app I wrote in PHP which compares an arbitrary number of .ini files on a logical basis. What this means is that if you have .ini files with similar contents, but the stanzas and key/value pairs are all mixed up, this utility will read in all of the .ini files that you specify, put the stanzas and their keys and values into well-defined data structures, perform comparisons, and let you know what the differences are, if any. In production, we used this to compare configuration files for Splunk from several different installations that we wanted to consolidate. Given that we had dozens of files, some with hundreds of lines, this utility saved us hours of effort and eliminated the possibility of human error. It can be found at:

https://github.com/Comcast/compare-ini-files
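
The project itself is written in PHP, but the core idea, normalizing each file into (stanza, key, value) tuples and then comparing those, can be crudely approximated in shell (a sketch for illustration, not the real tool):

# Prefix every key/value line with its stanza, then sort, so that two
# logically identical files produce identical output
normalize() { awk '/^\[/{s=$0} /=/{print s, $0}' "$1" | sort; }
diff <(normalize a.ini) <(normalize b.ini)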

Continue reading “Two New Open Source Projects”

Notes from February 2015 Philly DevOps Meetup: Security Practices for DevOps Teams

As a service to the Philly tech community (and because folks asked), I took notes at tonight’s presentation, called “Security Practices for DevOps Teams”. It was presented by Chris Merrick, VP of Engineering at RJMetrics.

Security is a “cursed role”

  • …in the sense that if you’re doing a really good job as a security engineer, no one knows you exist.
    • It isn’t sexy
    • It’s hard to quantify
    • It’s never done

As DevOps engineers, we are all de facto security engineers

Some tips to avoid ending up like this [Picture of a dismembered C3PO]

Sense. This picture makes none. 


Security Principles

  • Obscurity is not Security
    • “A secret endpoint on your website is not security”
    • “Don’t rely on randomness to secure things”
  • Least Privilege
    • Do not give more privileges than are needed
  • Weakest Link
    • If you talk to an insecure system, you’re at risk
  • Inevitability
    • Assume you will eventually have a security incident, and plan for it

Security Types

  • Physical
    • Stealing laptops
    • Breaking into datacenters
  • Application
    • Any vector that comes through an application you developed
      • XSS
  • Network*
  • Systems*
    • Applications you didn’t write
  • Human
    • Phishing, social engineering

Server Auth

  • Reminder:
    • Authentication is who you are
    • Authorization is what you can access
  • Don’t access production directly
    • Good news: this is our job anyways
  • Don’t spread private keys around
    • Don’t put it in your Dropbox
    • Don’t let it leave the machine you generated it on
    • Use SSH agent forwarding (see the sketch after this list)
      • ssh-add
      • ssh -A you@remote
      • ssh-add -l
  • Don’t use shared accounts
    • Especially root
  • Be able to revoke access quickly
    • Time yourself. Go.
  • We use Amazon OpsWorks to help us achieve these goals
    • Chef+AWS, with some neat tricks: simple autoscaling, application deployment, and SSH user management
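
Here’s what that agent-forwarding flow looks like end to end (hostname and key path are placeholders):

# On your workstation: load your private key into the local agent
ssh-add ~/.ssh/id_rsa

# Connect with agent forwarding enabled (-A); the key itself never leaves your machine
ssh -A you@bastion.example.com

# On the remote host: confirm the forwarded agent can see your key
ssh-add -l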

Logging

  • “Logs are your lifeline”
  • When you get into a high pressure security investigation, you start with your logs
  • Capture all authentication events, privilege escalations, and state changes.
    • From your OS and all running applications
  • Make sure you can trust your logs
    • Remember – they’re your lifeline
  • Have a retention policy
    • We keep 30 days “hot”, 90 days “cold”
  • Logging – ELK
    • We use ELK for hot log searching
    • Kibana graphs your logs and lets you monitor your application in real time

Deployment

  • Keep unencrypted secrets out of code
    • Otherwise, a MongoLab exploit becomes your exploit
  • Don’t keep old code around
  • Make deployment and rollback easy
    • More good news: this is our job anyways
    • When dealing with a security issue, the last thing we need is a “hard last step” in order to get the fix out
  • IAM
    • Don’t use your root account, ever.
      • Set a long password and lock it away
  • Set a strong password policy and require MFA
  • Don’t create API keys where API access isn’t needed
    • Same goes for a console password
  • Use Managed Policies
    • To make management easier
  • Use Roles to grant access to other systems
    • No need to deploy keys, auto-rotates
  • IAM Policy Pro Tips
    • Don’t use explicit DENY policies
      • Keep in mind that everything is denied by default
    • Don’t assume your custom policy is correct just because it saves – the interface only confirms the JSON is valid
    • Use the policy simulator (see the sketch after this list)
  • Know Thy Enemy
    • People are out there scanning for AWS keys – treat your AWS secret key like a private SSH key
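
To make the policy simulator tip concrete, here is a hedged sketch using the AWS CLI (the policy and ARNs are made up for illustration):

# A minimal least-privilege policy: allow reads from one bucket, nothing else
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::my-bucket/*"
  }]
}
EOF

# Saving a policy only proves the JSON parses; the simulator shows what it allows
aws iam simulate-custom-policy \
    --policy-input-list file://policy.json \
    --action-names s3:GetObject s3:DeleteObject \
    --resource-arns 'arn:aws:s3:::my-bucket/*' \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' --output text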

Bonus Tips

  • Set up a security page
  • Sign up for the US-CERT email list, and the security notification list for your OS
  • Other resources
    • OWASP – owasp.org
    • SecLists.org
    • Common Weakness Enumeration – cwe.mitre.org

Conclusion

  • What else should we be doing to keep our work secure? [Picture of C3PO in a field full of flowers]

Git 101: How to Handle Merge Conflicts

In the last post, I talked about how to create a Git repository and upload it to GitHub. In this post, I’m going to talk about how to resolve Git conflicts.

Setting Up Our Environment

First, we’re going to create a Git repository for the user Doug. Since I already covered that in the last post, I’m going to breeze through those steps below:

$ mkdir doug
$ cd doug
$ git init
Initialized empty Git repository in /path/to/doug/.git/
$ touch README.md
$ git add README.md 
$ git commit -m "First commit"
[master (root-commit) d86a7e2] First commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 README.md

At this point, we have a repository created for the user Doug. Now I’m going to clone that repository for the user Andrew:
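
Here’s a quick sketch of that clone step (using a local path for simplicity; on a real project this would be a remote URL):

$ git clone /path/to/doug andrew
Cloning into 'andrew'...
done.
$ cd andrew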

Continue reading “Git 101: How to Handle Merge Conflicts”

Git 101: Creating a Git Repo and Uploading It To GitHub

In this post, I’m going to discuss how to create a Git repo and upload (or “push”) it to GitHub, a popular service for hosting Git repositories.

What is revision control and why do I need it?

Revision control is a system that tracks changes to files. In programming, those files usually contain program code, but documents and text files can also be tracked. Using revision control gives you the following benefits:

  • You will know what was changed, when it was changed, and who changed it.
  • Multiple people can collaborate on a project without fear of overwriting each other’s changes.
  • Protection against accidentally deleting a critical file (revision history is usually read-only).

In Git, we store revisions in “repositories”, or “repos” for short. As of this writing, the #1 service for hosting Git repositories is GitHub. They offer free hosting for Git repositories.
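
As a preview of where we’re headed, pushing an existing local repo up to GitHub boils down to two commands (the remote URL is a placeholder for your own repo):

$ git remote add origin git@github.com:yourname/myproject.git
$ git push -u origin master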

Continue reading “Git 101: Creating a Git Repo and Uploading It To GitHub”

Notes from “Scaling on AWS for the First 10 Million Users”

Earlier tonight, I had the pleasure of attending a presentation from Chris Munns of Amazon at the offices of First Round Capital about scaling your software on AWS past the first 10 million users. I already had some experience with AWS, but I learned quite a few new things about how to leverage AWS, so I decided to write up my notes in a blog post for future reference, and as a service to other members of the Philadelphia tech community.

Without further preamble, here are my notes from the presentation:

Continue reading “Notes from “Scaling on AWS for the First 10 Million Users””

Multi Core CPU Performance in GoLang

I’ve been playing around with Google’s programming language known simply as Go lately, and have learned quite a bit about concurrency, parallelism, and writing code to effectively use multiple CPU cores in the process.

An Overview of Go

Pictured: Not A Serial Killer

If you are used to programming at the systems level, Go is effectively a replacement for C, but also has higher level functionality.

  • It is strongly typed
  • It can be compiled
  • It has its own unit testing framework
  • It has its own benchmarking framework
  • Doesn’t support classes, but does support structs and attaching methods to those structs
  • Doesn’t expose threads directly, but instead provides goroutines plus a construct called “channels” for communication between different parts of a Go program. More on channels shortly.
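
As a quick shell-level illustration of the multi-core angle: the GOMAXPROCS environment variable caps how many cores the Go runtime will use, so re-running a benchmark with different values is an easy way to see scaling (the package path is hypothetical):

GOMAXPROCS=1 go test -bench=. ./mypackage
GOMAXPROCS=4 go test -bench=. ./mypackage
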
Continue reading “Multi Core CPU Performance in GoLang”