My last post had some definitions to frame this analysis of enforcement (or, I suppose, the measurement of enfoo), so I may as well go ahead and formalize them.

Enforcement

prefix nhl: <http://www.nhl.com/>
construct { _:x a nhl:Enforcement . _:x nhl:game ?g .
            _:x nhl:actor ?p1 . _:x nhl:value ?value }
where {
  select ?g ?p1 (count(?e) as ?value)
  where {
    ?g nhl:play ?e . ?g nhl:play ?e2 .
    ?e nhl:agent1 ?p1 . ?e nhl:agent2 ?p2 .
    ?e2 nhl:agent1 ?p2 . ?e2 a nhl:ViolentPenalty .
    ?e a nhl:EnforcerAction . ?e2 nhl:influences ?e }
  group by ?g ?p1 }

The enforcement facts tally up the number of retaliations performed by a player during a game.  Defining an enforcement fact on a game-by-game basis (as opposed to period-by-period or minute-by-minute) is somewhat arbitrary.  It is convenient, though, since it’s easy to argue that hockey is most naturally partitioned by games.  If another temporal basis made sense, this is the first definition that would need adjusting.
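As a rough illustration of the grouping step only, here is a minimal Python sketch. It assumes the inner pattern has already been matched and has produced one row per retaliation; the game and player names are made up for the example and do not come from the dataset.

```python
from collections import Counter

# Hypothetical rows, one per match of the inner query pattern:
# (game, retaliating player). All names here are invented.
matches = [
    ("game1", "playerA"),
    ("game1", "playerA"),
    ("game1", "playerB"),
    ("game2", "playerA"),
]

# Rough equivalent of:
#   select ?g ?p1 (count(?e) as ?value) ... group by ?g ?p1
enforcement = Counter(matches)

for (game, player), value in sorted(enforcement.items()):
    print(game, player, value)
```

Switching to a different temporal basis would amount to changing the first element of each key, for example from a game to a (game, period) pair.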

I should note that there’s been a slight departure from the previous post’s definition of Enforcer Action.  Previously, I counted only a few kinds of penalties as enforcer actions (fighting, cross-checking, roughing, and unsportsmanlike conduct).  I’ve now included hits as well, since it made sense as I was typing out the Construct queries.  After all, enforcers that get away with a good retaliatory hit are doing their job better, right?

influences

[rule1: (?e1 nhl:influences ?e2) <-
    (?e1 nhl:localtime ?t1), (?e2 nhl:localtime ?t2),
     le(?t1, ?t2), diff-minute(?t1, ?t2, ?diff),
     lessThan(?diff, 2)  ]

I defined the relation influences to tie events together when one takes place less than two minutes before another.  I couldn’t think of much else in the dataset to represent when one event is a significant cause of another, and a two-minute window should limit the number of retaliations counted against a player.  Of course, it’s still an arbitrary time limit, and it’d be interesting to explore other ways of identifying causal influence.
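The time-window test in the rule can be sketched in Python as follows. This is only an approximation: it assumes diff-minute yields the elapsed time between the two timestamps, and the timestamps themselves are hypothetical. If diff-minute actually works in whole minutes, the boundary would behave slightly differently.

```python
from datetime import datetime, timedelta

# Assumed window, mirroring lessThan(?diff, 2) in the rule.
WINDOW = timedelta(minutes=2)

def influences(t1: datetime, t2: datetime) -> bool:
    """True when the event at t1 is no later than the event at t2
    and the gap between them is under two minutes."""
    return t1 <= t2 and (t2 - t1) < WINDOW

# Hypothetical timestamps for a penalty and two later hits.
penalty = datetime(2011, 1, 1, 19, 30, 0)
hit_soon = datetime(2011, 1, 1, 19, 31, 30)
hit_late = datetime(2011, 1, 1, 19, 33, 0)

print(influences(penalty, hit_soon))
print(influences(penalty, hit_late))
```

Read literally, the condition also holds when both timestamps are equal, so simultaneous events influence each other under this sketch.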

The rest of the details can be found here on GitHub.  As an aside, one thing I’ve learned is to keep repository names short and sweet; this repository’s name is just one huge hyphenated mess.

If you take a gander at the analysis.clj file, you’ll see that I changed up the manner in which the models are built up, favoring a map-reduce strategy.  I played around with pmap to see if I could get a bump out of parallelization, but ran into threading problems with the reasoner.  I’m definitely not a parallelization kind of guy, so I stuck with what worked for me 🙂

There are two scales defined in the analysis file:

barce

prefix nhl: <http://www.nhl.com/>
construct { ?player nhl:barce ?value }
where {
  select ?player (sum(?enfActs) as ?value)
  where {
    ?x a nhl:Enforcement . ?x nhl:actor ?player .
    ?x nhl:value ?enfActs  }
  group by ?player }

barce-opp

prefix nhl: <http://www.nhl.com/>
construct { ?player nhl:barce-opp ?value }
where {
  select ?player (sum(?enfActs/ ?totChances) as ?value)
  where {
    ?x a nhl:Enforcement . ?x nhl:actor ?player .
    ?x nhl:value ?enfActs . ?x nhl:game ?game .
    ?y a nhl:ViolentPenaltyTally . ?y nhl:game ?game .
    ?y nhl:team ?otherTeam . ?y nhl:value ?totChances .
    ?z a nhl:GameRoster . ?z nhl:game ?game .
    ?z nhl:player ?player . ?z nhl:team ?team .
    filter( ?team != ?otherTeam) }
  group by ?player }

The barce scale is pretty straightforward – count up all the enforcement facts for a player over all the games in the model you’re analyzing.  The advantage is in its simplicity, both for query performance and for clarity – it’s fairly easy to explain the meaning of barce measurements to your friends.

The barce-opportunity scale is a bit more complicated.  In earlier posts, I referred to a normalization of the scale, and had this in mind.  Problem is, this isn’t normalization at all.  Rather, for any particular game and player, the barce-opportunity of that player in that game is the fraction of the times the other team did something dodgy to which the enforcer retaliated.  So if the opposing team provoked the enforcer’s team four times, and the enforcer retaliated twice, his barce-opportunity for that game is one-half.
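The single-game arithmetic can be written out as a small Python sketch. The function name and the guard against a game with no provocations are assumptions made for the illustration; they are not taken from the analysis code.

```python
from fractions import Fraction

def barce_opp_game(retaliations: int, provocations: int) -> Fraction:
    """Share of the opposing team's provocations that the player answered."""
    if provocations == 0:
        # Assumption: with nothing to retaliate against, the ratio is undefined.
        raise ValueError("no provocations in this game")
    return Fraction(retaliations, provocations)

# The example from the text: four provocations, two retaliations.
example = barce_opp_game(2, 4)
print(example)
```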

For an overall barce-opportunity measure across games, I just added them up.  The problem is that measurements on the scale can no longer be understood as percentages (since you can’t just add percentages willy-nilly).  I suppose I could average them to keep the values between zero and one, which would have the advantage of a clear interpretation for measurements across games:

prefix nhl: <http://www.nhl.com/>
construct { ?player nhl:barce-opp ?value }
where {
  select ?player (avg(?enfActs/ ?totChances) as ?value)
  where {
    ?x a nhl:Enforcement . ?x nhl:actor ?player .
    ?x nhl:value ?enfActs . ?x nhl:game ?game .
    ?y a nhl:ViolentPenaltyTally . ?y nhl:game ?game .
    ?y nhl:team ?otherTeam . ?y nhl:value ?totChances .
    ?z a nhl:GameRoster . ?z nhl:game ?game .
    ?z nhl:player ?player . ?z nhl:team ?team .
    filter( ?team != ?otherTeam) }
  group by ?player }
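To make the difference between the two aggregates concrete, here is a small Python sketch with invented per-game numbers for a single player. It assumes each per-game ratio lies between zero and one, as described above.

```python
from statistics import mean

# Hypothetical per-game (retaliations, provocations) pairs for one player.
games = [(2, 4), (1, 1), (1, 4)]

# Per-game barce-opportunity values.
ratios = [r / p for r, p in games]

barce_opp_sum = sum(ratios)   # what the sum(...) query computes
barce_opp_avg = mean(ratios)  # what the avg(...) query computes

print(barce_opp_sum, barce_opp_avg)
```

The sum grows with the number of games and can exceed one, while the average stays within the range of the per-game values. Reading the queries literally, a game only contributes a ratio if the player has an Enforcement fact for it, so games with no retaliations would not pull the average down.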

I’ll have to think on that a bit more.  For now, I’m fairly content with the definitions, and will next see what kinds of charts and patterns fall out of all this.
