A few weeks ago I chanced upon a really interesting post at Data Twirling about analyzing NHL hockey data. It looks like the NHL was nice enough to publish all of this season’s game data in JSON. Even better, the data is recorded at the granularity of individual hits, shots, goals, and penalties, each with a time and x-y coordinates. Brock was kind enough to put his R code up on GitHub, so it was pretty straightforward to figure out how to pull down the data:
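```clojure
(ns nhl.data)

;; Each game's URL is head + 4-digit game number + tail.
(def head "http://live.nhl.com/GameData/20102011/201002")
(def tail "/PlayByPlay.json")

;; Zero-pad the game number to four digits.
(defn get-my-number [i]
  (cond (< i 10)   (str "000" i)
        (< i 100)  (str "00" i)
        (< i 1000) (str "0" i)
        :else      (str i)))

;; Download one game and save it locally (the data/ directory must already exist).
(defn get-my-file [i]
  (let [inp (slurp (str head (get-my-number i) tail))]
    (spit (str "data/file-" i ".json") inp)))

;; Download games a through b, inclusive.
(defn get-data [a b]
  (doseq [i (range a (+ b 1))]
    (get-my-file i)))
```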
Yep, I’m quite new to Lisp, and I’m sure this could’ve been done a lot more cleanly. I’ve also seen some folks put a pause into their slurper to be nice to the server, but I’m still a bit unclear on what the appropriate manners are… a 5-second break between requests? Got me.
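For comparison, here is a tidier sketch of the same download loop. It is my own rewrite under a few assumptions, not Brock’s code or any official NHL client: it pads game numbers with `format` instead of a hand-rolled `cond`, creates the `data` directory if it is missing, and sleeps between requests. The 5-second default is just the guess from above, not a documented server policy, and the names `game-url`, `fetch-game!` and `fetch-games!` are made up for illustration.

```clojure
(ns nhl.fetch
  (:require [clojure.java.io :as io]))

;; Base URL and suffix, copied from the snippet above.
(def base-url "http://live.nhl.com/GameData/20102011/201002")
(def suffix "/PlayByPlay.json")

(defn game-url
  "Builds the play-by-play URL for game number i, zero-padding i to
  four digits with format (e.g. 7 -> \"0007\")."
  [i]
  (str base-url (format "%04d" i) suffix))

(defn fetch-game!
  "Downloads one game's JSON and writes it to data/file-<i>.json,
  creating the data directory first if it does not exist."
  [i]
  (let [out (io/file "data" (str "file-" i ".json"))]
    (io/make-parents out)
    (spit out (slurp (game-url i)))))

(defn fetch-games!
  "Fetches games a through b inclusive, sleeping pause-ms milliseconds
  between requests. The 5000 ms default is an assumed 'polite' pause,
  not a documented server requirement."
  ([a b] (fetch-games! a b 5000))
  ([a b pause-ms]
   (doseq [i (range a (inc b))]
     (fetch-game! i)
     ;; No need to wait after the last request.
     (when (< i b)
       (Thread/sleep (long pause-ms))))))
```

Calling `(fetch-games! 1 20)` would then pull the first twenty games, pausing between each one.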
Anyhoo, a few minutes later I’m the proud owner of 55 MB of JSON that compresses down real nice to a 5 MB zip file. The plan now is to explore how to define metrics that measure how much of an ‘Enforcer’ a player is. I’ve been reading Brian Ellis’s Basic Concepts of Measurement lately, and this seems like a good chance to test out his ideas on what it means to measure something. Should be fun!