This post’ll be a short one tonight, since I just had to watch Adventureland (and thank god I did – that’s a great movie).

The analysis of enforcement doesn’t seem well-suited to a statistical approach, and I think it’s because there’s no pattern or distribution to be explained or predicted.  There aren’t any awards for enforcement, and top-ten lists of enforcers aren’t sufficiently … important? … to warrant prediction.

However, predicting which cities in the NHL will host the games with the most physical action is undeniably of public benefit.  As the League has cracked down on fighting and physical play over the years, it’s become increasingly important to find out where the good physical games will be played.

A proper scale of game physicality deserves some thought, but for now we’ll just tally the hits across the games between each home/away pairing of teams:

prefix nhl: <>
# Emit one HitsSummary node per home/away pairing.
construct {
  _:z a nhl:HitsSummary . _:z nhl:participant ?x .
  _:z nhl:participant ?y . _:z nhl:value ?v }
  {  # Count every Hit play in the games where ?x hosted ?y.
     select ?x ?y (count(?h) as ?v)
     {  ?g nhl:hometeam ?x . ?g nhl:awayteam ?y .
        ?g nhl:play ?h . ?h a nhl:Hit . }
     group by ?x ?y }
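
For anyone who’d rather see that grouping outside of SPARQL, here’s a rough Python sketch of the same tally. The tuple layout, the tally_hits name, and the team labels are all made up for illustration; they aren’t part of the NHL dataset.

```python
from collections import Counter


def tally_hits(plays):
    """Count hits for each (home, away) pairing.

    `plays` is an iterable of (home_team, away_team, play_type) tuples,
    a hypothetical flattened form of the game/play triples.
    """
    counts = Counter()
    for home, away, play_type in plays:
        # Only plays typed as hits contribute, like `?h a nhl:Hit`.
        if play_type == "Hit":
            counts[(home, away)] += 1
    return counts


# Placeholder rows, not real game data.
sample_plays = [
    ("TeamA", "TeamB", "Hit"),
    ("TeamA", "TeamB", "Shot"),
    ("TeamA", "TeamB", "Hit"),
    ("TeamB", "TeamA", "Hit"),
]
hits = tally_hits(sample_plays)
```

Note that the pairing is ordered: TeamA hosting TeamB is counted separately from TeamB hosting TeamA, which is what grouping by home and away team does in the query.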

I’d like to take these results and tie them to the geo-coordinates of each team’s home city, so that I can generate a map depicting the relative physicality of each city that season.

To do this, I can use DBpedia’s endpoint to grab each team’s city’s lat/long.  On inspection, it seems that the DBpedia data provides the relevant geo-coordinates in degree-minute-second form.  I’d much rather have them in decimal degrees (so that I can associate a single value for each of latitude and longitude to a team), so a little arithmetic is in order.  Fortunately, SPARQL 1.1 allows arithmetic expressions in the select clause.  Here’s the longitude query (with the latitude query being similar):

prefix rdfs: <>
prefix dbo: <>
prefix dbp: <>
prefix nhl: <>
construct { ?team nhl:longitude ?v . ?team nhl:name ?name }
  {  select ?team ?name
            ( -1 * (?d + (((?m * 60) + ?s) / 3600.0)) as ?v)
     {  ?team a dbo:HockeyTeam . ?team rdfs:label ?name .
        ?team dbp:city ?cityname . ?city rdfs:label ?cityname .
        ?city dbp:longd ?d; dbp:longm ?m; dbp:longs ?s .
        filter ( lang(?name) = 'en') }}

The DBpedia server is a little persnickety about memory, so it complains if I try to ask for lat and long in the same query.

You might be wondering about that -1 at the front of the arithmetic.  Longitude is positive east of the Prime Meridian and negative west of it.  Since all the NHL teams are west of the Meridian, the decimal longitude is negative.  If the DBpedia endpoint were compliant with the very latest SPARQL 1.1 spec, I could have used an IF operator to interpret the East/West part of the coordinate.  However, it seems that feature isn’t implemented yet in DBpedia’s server, so this’ll have to do.
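
Just to spell out the arithmetic, here’s a minimal Python sketch of the same conversion, with the East/West handling that the hard-coded -1 stands in for. The function name and the single-letter direction argument are my own assumptions; nothing here queries DBpedia.

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degree-minute-second coordinates to signed decimal degrees.

    `direction` is assumed to be one of "N", "S", "E" or "W".
    """
    direction = direction.upper()
    if direction not in ("N", "S", "E", "W"):
        raise ValueError("direction must be one of N, S, E, W")
    # Same arithmetic as the query: fold minutes and seconds into degrees.
    magnitude = degrees + (minutes * 60 + seconds) / 3600.0
    # West longitudes and south latitudes are negative by convention.
    if direction in ("W", "S"):
        return -magnitude
    return magnitude
```

Calling dms_to_decimal(79, 30, 0, "W") gives -79.5, which is the kind of value the query’s -1 produces for a city west of the Meridian.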

All that’s left is to query for each game, its home and away teams, and join those results with the relevant coordinates and hits-summary.  That query might take a few minutes for a whole season, but it’s a static result and thus safe to save off as an RDF file for quick retrieval later.
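
If the join ended up happening client-side instead of in a query, it might look something like this Python sketch. The dictionary shapes, the join_games name, and every value below are placeholders I’ve assumed for illustration, not real results.

```python
def join_games(games, coords, hits):
    """Attach the home city's coordinates and the hit tally to each game.

    games:  iterable of (game_id, home_team, away_team) tuples
    coords: dict mapping team -> (latitude, longitude) in decimal degrees
    hits:   dict mapping (home_team, away_team) -> hit count
    """
    rows = []
    for game_id, home, away in games:
        # The game is played in the home team's city.
        lat, lon = coords[home]
        rows.append({
            "game": game_id,
            "home": home,
            "away": away,
            "lat": lat,
            "lon": lon,
            # Pairings with no recorded hits default to zero.
            "hits": hits.get((home, away), 0),
        })
    return rows


# Placeholder inputs, not real coordinates or tallies.
games = [("g1", "TeamA", "TeamB"), ("g2", "TeamB", "TeamA")]
coords = {"TeamA": (45.0, -75.5), "TeamB": (40.25, -80.0)}
hits = {("TeamA", "TeamB"): 2}
rows = join_games(games, coords, hits)
```

Since the output is static for a finished season, the resulting rows could be written out once and reloaded later, in the same spirit as saving the query result as an RDF file.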

What does this have to do with enforcement?  Well, with next season’s schedule and the current rosters, we might be able to predict this physicality distribution based on the enforcement of the players on each team for each game.  Perhaps teams with higher enforcement correlate with more physical games.  Or, maybe a game tends to have higher physicality when one team has much more enforcement than the other.  And with historical data from past seasons, there should be some opportunity for verification.  And maybe even neat maps.