I just added some built-ins for Jena rules in seabass that let me compute how many seconds, minutes, etc. apart two times, dates, or datetimes are. That lets me write a new definition of Enforcement that tracks the Wikipedia definition a bit more closely:

[rule1: (?e1 nhl:influences ?e2) <-
        (?e1 nhl:localtime ?t1), (?e2 nhl:localtime ?t2),
        le(?t1, ?t2), diff-minute(?t1, ?t2, ?diff),
        lessThan(?diff, 5) ]

nhl:Penalty rdfs:subClassOf nhl:EnforcerAction .
nhl:Hit     rdfs:subClassOf nhl:EnforcerAction .

construct { _:x a nhl:Enforcement . _:x nhl:game ?g .
            _:x nhl:actor ?p1 . _:x nhl:value ?value }
where {
  select ?g ?p1 (count(?e) as ?value)
  where {
    ?g nhl:play ?e . ?g nhl:play ?e2 .
    ?e nhl:agent1 ?p1 . ?e nhl:agent2 ?p2 .
    ?e2 nhl:agent1 ?p2 . ?e2 a nhl:ViolentPenalty .
    ?e a nhl:EnforcerAction . ?e2 nhl:influences ?e }
  group by ?g ?p1 }
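
For anyone who doesn't read Jena rules, here is a rough Python restatement of the test that rule1 applies to a pair of event times. It is only a sketch: the function name, the `window_minutes` parameter, and the sample timestamps are made up for illustration, and it assumes diff-minute reports elapsed time in whole minutes (rounded down), which I haven't spelled out above.

```python
from datetime import datetime, timedelta


def influences(t1: datetime, t2: datetime, window_minutes: int = 5) -> bool:
    """Return True if an event at t1 'influences' an event at t2.

    Mirrors rule1: t1 must be no later than t2 (the le check), and the
    gap must be under window_minutes (the diff-minute / lessThan check).
    Assumption: diff-minute rounds down to whole minutes, so "less than 5"
    is the same as "strictly under five minutes of elapsed time".
    """
    return t1 <= t2 and (t2 - t1) < timedelta(minutes=window_minutes)


# Made-up timestamps, not taken from the NHL data.
penalty = datetime(2010, 10, 7, 19, 12, 0)
hit = datetime(2010, 10, 7, 19, 15, 30)

print(influences(penalty, hit))  # True: 3.5 minutes later
print(influences(hit, penalty))  # False: wrong order
```

Note that two events with the same timestamp count as influencing each other here, because the rule uses le rather than a strict comparison.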

Essentially, I stipulated that an earlier event influences a later event if the two take place within five minutes of each other. Best rule ever? Nope. Good enough? Maybe! So we have an incident of enforcement in a game when a Violent Penalty by a player influences an Enforcer Action against that player. Let’s see how this compares to the previous two definitions of enforcement:

The normalization is all off, largely because it’s getting late. I should in fact normalize this score by the number of opportunities for enforcement, which is really the number of Violent Penalties against the enforcer’s team.
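
To make the intended normalization concrete, here is a minimal Python sketch. Everything in it is hypothetical: the game and player labels, the team codes, and the tuple layouts are invented for illustration and are not how the data is actually stored. The idea is just to divide each player's enforcement count in a game by the number of Violent Penalties committed against that player's team in the same game.

```python
from collections import Counter

# Hypothetical toy data. Each entry is (game, team on the receiving end
# of a Violent Penalty), i.e. one opportunity for that team to enforce.
violent_penalties = [("g1", "BOS"), ("g1", "BOS"), ("g1", "MTL"), ("g2", "BOS")]

# Hypothetical enforcement counts keyed by (game, player, player's team),
# standing in for the nhl:Enforcement facts the query constructs.
enforcements = {("g1", "player_a", "BOS"): 1, ("g1", "player_b", "MTL"): 1}


def normalized_scores(enforcements, violent_penalties):
    """Divide each enforcement count by the opportunities for enforcement."""
    opportunities = Counter(violent_penalties)
    scores = {}
    for (game, player, team), count in enforcements.items():
        chances = opportunities.get((game, team), 0)
        # With no opportunities there is nothing to normalize against.
        scores[(game, player)] = count / chances if chances else 0.0
    return scores


scores = normalized_scores(enforcements, violent_penalties)
print(scores)  # {('g1', 'player_a'): 0.5, ('g1', 'player_b'): 1.0}
```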

This dataset is drawn from the first 20 games of the 1230-game season, so it certainly isn’t a representative sample. The bad news is that 20 games of data is just about as much as my laptop will crunch before I get impatient. The good news is that I can re-jigger the analysis process to pull enforcement facts out of the season in 10-game increments, sort of like a map-reduce, I guess. Should be fun!
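
Here is a rough Python sketch of what that batching could look like. It is a plan, not something I have run against the real data: `extract` is a hypothetical stand-in for "load these games, run the rules, run the construct query, and return enforcement counts per player", and `fake_extract` exists only so the example runs on its own.

```python
from collections import Counter
from typing import Callable, Iterator, List


def batches(game_ids: List[int], size: int = 10) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` game ids."""
    for start in range(0, len(game_ids), size):
        yield game_ids[start:start + size]


def enforcement_totals(game_ids: List[int],
                       extract: Callable[[List[int]], Counter],
                       size: int = 10) -> Counter:
    """Run `extract` on each batch (the map step) and sum the counts (the reduce step)."""
    totals: Counter = Counter()
    for batch in batches(game_ids, size):
        totals.update(extract(batch))  # Counter.update adds counts together
    return totals


def fake_extract(batch: List[int]) -> Counter:
    """Placeholder: pretend one player enforces once per game."""
    return Counter({"player_a": len(batch)})


season = list(range(1, 1231))  # 1230 games
print(len(list(batches(season))))  # 123
print(enforcement_totals(season, fake_extract)["player_a"])  # 1230
```

Splitting on whole games should be safe for this particular query, since it only pairs plays that belong to the same game.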
