A few weeks ago I chanced upon a really interesting post at Data Twirling about analyzing NHL hockey data. It looks like the NHL was nice enough to publish all of this season's game data as JSON. Even better, the data is recorded at the granularity of individual hits, shots, goals, and penalties, each with a time and x-y coordinates. Brock was kind enough to put his R code up on GitHub, so it was pretty straightforward to figure out how to pull down the data:

(ns nhl.data)

(def head "http://live.nhl.com/GameData/20102011/201002")
(def tail "/PlayByPlay.json")

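(defn get-my-number
  "Zero-pads game number i to four digits, e.g. 7 -> \"0007\"."
  [i]
  (cond
    (< i 10)   (str "000" i)
    (< i 100)  (str "00" i)
    (< i 1000) (str "0" i)
    :else      (str i)))

(defn get-my-file
  "Downloads the play-by-play JSON for game i into data/file-<i>.json.
  The data directory must already exist."
  [i]
  (let [inp (slurp (str head (get-my-number i) tail))]
    (spit (str "data/file-" i ".json") inp)))

(defn get-data
  "Downloads games a through b, inclusive."
  [a b]
  (doseq [i (range a (+ b 1))]
    (get-my-file i)))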

Yep, I'm quite new to Lisp, and I'm sure this could've been done a lot more cleanly. I've also seen some folk put a pause into their slurper in order to be nice to the server, but I'm still a bit unclear on what the appropriate manners are… a 5-second break between requests? Got me.
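For what it's worth, here is a rough sketch of how a tidier, more polite version might look. It is only an illustration, not a tested replacement: the namespace and the function names (game-id, game-url, fetch-game!, fetch-games!) are made up for this example, and the pause length is left as a parameter because I don't know what the server actually considers reasonable. It swaps the cond for format's "%04d" zero-padding, creates the data directory if it's missing, and sleeps between requests.

```clojure
(ns nhl.polite
  (:require [clojure.java.io :as io]))

;; Same URL pieces as in the original snippet.
(def base-url "http://live.nhl.com/GameData/20102011/201002")
(def url-suffix "/PlayByPlay.json")

(defn game-id
  "Zero-pads i to four digits using format instead of a cond."
  [i]
  (format "%04d" i))

(defn game-url
  "Builds the play-by-play URL for game number i."
  [i]
  (str base-url (game-id i) url-suffix))

(defn fetch-game!
  "Downloads game i into data/file-<i>.json, creating data/ if needed."
  [i]
  (let [out (io/file "data" (str "file-" i ".json"))]
    (io/make-parents out)
    (spit out (slurp (game-url i)))))

(defn fetch-games!
  "Downloads games a through b (inclusive), sleeping pause-ms
  milliseconds after each request. The right pause is a guess."
  [a b pause-ms]
  (doseq [i (range a (inc b))]
    (fetch-game! i)
    (Thread/sleep (long pause-ms))))
```

Something like (fetch-games! 1 10 5000) would then grab the first ten games with a 5-second gap between requests.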

Anyhoo, a few minutes later and I'm the proud owner of 55 MB of JSON that compresses down real nice to a 5 MB zip file. The plan now is to explore how to define metrics that measure how much of an 'Enforcer' a player is. I've been reading Brian Ellis's Basic Concepts of Measurement lately, and this seems like a good chance to test out his ideas on what it means to measure something. Should be fun!
