In my last post I suggested a graph pattern, or detection rule, to measure the enforcement that a hockey player is bringing to his team.  I’m going to modify the rule slightly, to track the Wikipedia definition more closely:

prefix nhl: <http://www.nhl.com/>
construct { _:x a nhl:Enforcement . _:x nhl:game ?g .
            _:x nhl:actor ?p1 . _:x nhl:value ?value }
where {
  select ?g ?p1 (sum(?minutes) as ?value)
  where {
    ?g nhl:play ?e . ?g nhl:play ?e2 .
    ?e a nhl:EnforcerPenalty . ?e nhl:agent1 ?p1 . ?e nhl:agent2 ?p2 .
    ?e a ?class . ?class nhl:penaltyMinutes ?minutes .
    ?e2 nhl:agent1 ?p2 . ?e2 a nhl:ViolentPenalty .  }
  group by ?g ?p1}

While this captures the spirit of the Wikipedia definition, it does depart from the apparent semantics by substituting ‘enforcer penalties’ for hits.  That is, a player is acting like an enforcer when he gets penalized for certain activities in retaliation for violent penalties against his team.  So, an alternative detection rule might be a better fit:

prefix nhl: <http://www.nhl.com/>
construct { _:x a nhl:Enforcement . _:x nhl:game ?g .
            _:x nhl:actor ?p1 . _:x nhl:value ?value }
where {
  select ?g ?p1 (count(?e) as ?value)
  where {
    ?g nhl:play ?e . ?g nhl:play ?e2 .
    ?e a nhl:Hit. ?e nhl:agent1 ?p1 . ?e nhl:agent2 ?p2 .
    ?e2 nhl:agent1 ?p2 . ?e2 a nhl:ViolentPenalty .  }
  group by ?g ?p1 }

Neither rule reflects an arguably crucial facet of the definition, namely the temporal ordering of events: the enforcer has to retaliate after a violent penalty (and most likely, within a certain time interval).  I’m going to leave the temporal ordering aside for now, just to compare the results of the detection rules thus far.
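
As a rough sketch of what the temporal constraint could look like, here is the hits-based rule in plain Python rather than SPARQL.  The tuple layout, the game-clock seconds, and the 300-second window are all assumptions made for illustration; none of them come from the NHL data or the ontology in this post.

```python
from collections import Counter

# Each play is (game, seconds_into_game, kind, agent1, agent2).
# This layout and the sample plays are made up for illustration.
PLAYS = [
    ("g1", 100, "ViolentPenalty", "B", "X"),
    ("g1", 160, "Hit", "A", "B"),  # 60s after B's penalty: counts
    ("g1", 50, "Hit", "A", "B"),   # before the penalty: ignored
    ("g1", 900, "Hit", "A", "B"),  # 800s after the penalty: outside the window
    ("g1", 130, "Hit", "C", "D"),  # D took no violent penalty: ignored
]


def enforcement_by_hits(plays, window=300):
    """Count hits on a player that land within `window` seconds after
    that player's violent penalty in the same game."""
    counts = Counter()
    for game, t_hit, kind, hitter, target in plays:
        if kind != "Hit":
            continue
        retaliates = any(
            g == game
            and k == "ViolentPenalty"
            and offender == target
            and 0 < t_hit - t_pen <= window
            for g, t_pen, k, offender, _ in plays
        )
        if retaliates:
            counts[(game, hitter)] += 1
    return dict(counts)


print(enforcement_by_hits(PLAYS))
```

One design difference worth noting: `any` counts each hit at most once, whereas the join in the rules above produces one row per matching violent penalty, so the same hit can be counted several times there.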

First, some ontology needs to be written to define what is meant by an enforcer penalty and a violent penalty.  I could consult a subject matter expert (i.e. Wikipedia).  However, I’ll settle for an arbitrary definition of these classes, as I’m still rather early in the analysis and can return to this later.  Here’s the ontology so far (in Turtle syntax, and omitting the prefix declarations):

nhl:EnforcerPenalty rdfs:subClassOf nhl:Penalty .
  nhl:CrossChecking    rdfs:subClassOf nhl:EnforcerPenalty .
  nhl:Fight            rdfs:subClassOf nhl:EnforcerPenalty .
  nhl:FightingMaj      rdfs:subClassOf nhl:EnforcerPenalty .
  nhl:Roughing         rdfs:subClassOf nhl:EnforcerPenalty .
  nhl:Unsportsmanlike  rdfs:subClassOf nhl:EnforcerPenalty .

nhl:ViolentPenalty rdfs:subClassOf nhl:Penalty .
  nhl:Charging         rdfs:subClassOf nhl:ViolentPenalty .
  nhl:HighSticking     rdfs:subClassOf nhl:ViolentPenalty .
  nhl:Roughing         rdfs:subClassOf nhl:ViolentPenalty .
  nhl:Slashing         rdfs:subClassOf nhl:ViolentPenalty .
  nhl:Unsportsmanlike  rdfs:subClassOf nhl:ViolentPenalty .

nhl:Charging         nhl:penaltyMinutes 2 .
nhl:CrossChecking    nhl:penaltyMinutes 2 .
nhl:Fight            nhl:penaltyMinutes 5 .
nhl:FightingMaj      nhl:penaltyMinutes 10 .
nhl:HighSticking     nhl:penaltyMinutes 2 .
nhl:Roughing         nhl:penaltyMinutes 2 .
nhl:Slashing         nhl:penaltyMinutes 2 .
nhl:Unsportsmanlike  nhl:penaltyMinutes 2 .

I’m leaving out a few penalties, and definitely want to re-consider the use of these definitions in the detection rules – there’s a bit too much arbitrariness to these definitions.  Nonetheless, let’s wrap these up and see what kind of results we get.  The following queries will be used to define the target dataset:
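
One way to see the arbitrariness is to mirror the class hierarchy above in plain Python sets and check where the two definitions overlap.  This is only a sanity-check sketch outside the RDF tooling; the set and dictionary names are mine, while the members and minute values are copied from the Turtle above.

```python
# Subclasses of nhl:EnforcerPenalty and nhl:ViolentPenalty, as listed above.
ENFORCER = {"CrossChecking", "Fight", "FightingMaj", "Roughing", "Unsportsmanlike"}
VIOLENT = {"Charging", "HighSticking", "Roughing", "Slashing", "Unsportsmanlike"}

# nhl:penaltyMinutes values, as listed above.
MINUTES = {
    "Charging": 2, "CrossChecking": 2, "Fight": 5, "FightingMaj": 10,
    "HighSticking": 2, "Roughing": 2, "Slashing": 2, "Unsportsmanlike": 2,
}

# Penalties that count both as provocation and as retaliation.
in_both = sorted(ENFORCER & VIOLENT)

# Penalty classes that would silently drop out of the sum for lack of minutes.
missing_minutes = sorted((ENFORCER | VIOLENT) - MINUTES.keys())

print(in_both)
print(missing_minutes)
```

Roughing and Unsportsmanlike sit in both classes, so the same kind of penalty can act as either the provocation or the retaliation in the first rule.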

prefix nhl: <http://www.nhl.com/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name (?enfMinutes / ?totMinutes as ?value)
where { ?x a nhl:Enforcement . ?x nhl:actor ?player .
        ?x nhl:value ?enfMinutes . ?x nhl:game ?game . 
        ?game nhl:totalPenaltyMinutes ?totMinutes .
        ?player rdfs:label ?name }
order by desc(?value)

prefix nhl: <http://www.nhl.com/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name (?enfHits/ ?totHits as ?value)
where { ?x a nhl:Enforcement . ?x nhl:actor ?player . 
        ?x nhl:value ?enfHits . ?x nhl:game ?game . 
        ?game nhl:totalHits ?totHits . ?player rdfs:label ?name }
order by desc(?value)

These queries attempt to normalize the enforcement in each game by dividing the enforcement score (the sum of the enforcer penalty minutes in the first query, the number of hits in the second) by the corresponding game total (total penalty minutes or total hits).  In the interest of time, I’m only running this over the first 10 games of the season.  The REPL activity using seabass and this code is:
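
The normalization itself is just a ratio per player per game.  Here is a minimal Python sketch of that arithmetic, with made-up rows standing in for the query bindings; the player names and numbers are placeholders, not results.  It also guards against a game total of zero, which the queries above do not handle explicitly.

```python
def normalize(score, game_total):
    """Enforcement score as a share of the game total; None if the total is 0."""
    if game_total == 0:
        return None
    return score / game_total


# (name, enforcement score, game total): placeholder values only.
rows = [("Player A", 4, 20), ("Player B", 2, 8), ("Player C", 3, 0)]

# Drop rows with no usable total, then sort like `order by desc(?value)`.
ranked = sorted(
    ((name, normalize(score, total)) for name, score, total in rows if total != 0),
    key=lambda pair: pair[1],
    reverse=True,
)

print(ranked)
```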

(view (bounce enforcement1 m1))
(view (bounce enforcement2 m2))

The RDF models are m1 and m2, corresponding to the two different definitions of enforcement.  Each enforcement query is a SPARQL select statement ‘bounced’ against its model using the seabass bounce function (and using Incanter’s view function to pop up some fancy Swing windows).  I’ll hold off on an analysis until I refine those detection rules, but putting those results side-by-side shows that there is some overlap, but the top five results on the two lists are disjoint.  So it looks like there’s a significant semantic divergence between those two detection rules.  I wonder if adding in the temporal aspect to those rules would line them up better.
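
The side-by-side comparison can be made a little more mechanical.  The sketch below, in Python rather than Clojure, takes two ranked lists of player names and reports the overall overlap and the top-n overlap.  The names are placeholders chosen to mimic the shape of what I saw (some overlap, disjoint top five); they are not the actual results.

```python
def overlap(ranking_a, ranking_b, n=None):
    """Players appearing in both rankings, optionally restricted to the top n."""
    a = ranking_a if n is None else ranking_a[:n]
    b = ranking_b if n is None else ranking_b[:n]
    return set(a) & set(b)


# Placeholder rankings, best first.
by_penalties = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
by_hits = ["p6", "p7", "p8", "p9", "p10", "p1", "p2"]

print(sorted(overlap(by_penalties, by_hits)))       # overall overlap
print(sorted(overlap(by_penalties, by_hits, n=5)))  # top-five overlap
```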
