data analytics

April 26th, 2019July 20th, 2019

Splunking Yelp Reviews

Awhile ago, I found myself trying to make a decision on which of several restaurants to eat at. They were all highly rated in Yelp, but surely there might be more insights I could pull from their reviews. So I decided to Splunk them!

TL;DR If you want to get straight to the code, go to https://github.com/dmuth/splunk-yelp-reviewsto get started.

Downloading the reviews

“Splunk: See your world. Maybe wish you hadn’t.”

Yelp has an API but, I am sorry to say that it is awful. It will only let you download 3 reviews for any venue. That’s it! What a crime.

So… I had to crawl Yelp venue pages to get reviews. I am not proud of this, but I was left with no other other option.

Python has been my go-to language lately, so I decided to solve the problem of review acquisition with Python. I used the Requests module to fetch the HTML code, and the Beautiful Soup module to extract reviews and page links from the HTML.

February 2nd, 2019March 16th, 2019

Hotel Opening: Anthrocon 2019 Report

One of my activities outside of the office consists of staffing furry conventions. One of those conventions is Anthrocon, a furry convention held in downtown Pittsburgh every June/July. At that particular convention, I manage the website and their social media properties.

Yesterday, we opened general hotel reservations, and that resulted in a huge rush of members booking hotel rooms. 1,000 rooms were booked in the first 15 minutes! This was completely expected, and we kept track of how things played out on social media, and also took a survey of members who booked hotel rooms to see how things went. In this post, we’re going to share what we learned based on those survey results and Twitter activity.

HOTEL BOOKINGS

First, did people who booked a hotel room get the hotel that they wanted?

For nearly 70% of you, the answer is yes. This makes us happy, but we would like to see the number higher—ideally 100% of our attendees would get a room in the hotel of their choice. This is something we continue to work on each year by adding new hotels and getting bigger room blocks in existing hotels.

January 4th, 2019July 20th, 2019

Introducing: Splunk Lab!

Splunk> Australian for grep.

In a previous post, I wrote about using Splunk to monitor network health and connectivity. While building that project, I thought it would be nice if I could build a more generic application which could be used to perform ad hoc data analysis on pre-existing data without having to go through a complicated process each time I wanted to do some analytics.

So I built Splunk Lab! It is a Dockerized version of Splunk which, when started, will automatically ingest entire directories of logs. Furthermore, if started with the proper configuration, any dashboards or field extractions which are created will persist after the container is terminated, which means they can be used again in the future.

A typical use case for me has been to run this on my webserver to go through my logs on a particularly busy day and see what hosts or pages are generating the most traffic. I’ve also used this when a spambot starts hitting my website for invalid URLs.

So let’s just jump right in with an example:

SPLUNK_START_ARGS=--accept-license \
   bash <(curl -s https://raw.githubusercontent.com/dmuth/splunk-lab/master/go.sh)

This will print a confirmation screen where you can back out to modify options. By default, logs are read from logs/, config files and dashboards are stored in app/, and data that Splunk ingests is written to data/.

Once the container is running, you will be able to access it at https://localhost:8000/ with the username “admin” and the password that you specified at startup.

First things first, let’s verify our data was loaded and do some field extractions!

Main Index
Field Extractions