I was all ready to grab some geo-data and see where I could find a good physical game of hockey.  I’d browsed DBpedia and seen how to connect NHL teams, the cities they play in, and the lat-long coordinates of each.  I was so optimistic.

Then it turned out that DBpedia wanted me to do a special dance.  I’ve been quite happy with the features of the upcoming SPARQL 1.1 spec.  Since ARQ stays on top of the spec, I’ve managed to forget what SPARQL 1.0 was missing.  Well, ‘if’ clauses for one, but I managed to design around that in my last post.  A real sticking point, though, was the inability to wrap a construct query around a select query, like so:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbo: <http://dbpedia.org/ontology/>
prefix dbp: <http://dbpedia.org/property/>
prefix nhl: <http://www.nhl.com/>
construct { ?team nhl:latitude ?v . ?team nhl:name ?name }
{ select ?team ?name ( 1 * (?d + (((?m * 60) + ?s) / 3600.0)) as ?v)
  { ?team a dbo:HockeyTeam . ?team rdfs:label ?name .
    ?team dbp:city ?cityname . ?city rdfs:label ?cityname .
    ?city dbp:latd ?d; dbp:latm ?m; dbp:lats ?s .
    filter ( lang(?name) = 'en') } }

The reason this is critical is that you can’t put those arithmetic expressions into a construct clause, since its template only takes triple patterns.  And since I plan on working with the resulting data using SPARQL, plain select queries aren’t going to do it.
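
For reference, the arithmetic in that select expression is the usual degrees/minutes/seconds to decimal degrees conversion.  Here is a minimal Clojure sketch of the same calculation; the name dms->decimal is made up for illustration, the sample values are invented, and the leading `1 *` from the query is left out since it doesn’t change the value:

```clojure
;; Hypothetical helper (not part of the original code): converts
;; degrees/minutes/seconds to decimal degrees, mirroring the select
;; expression (?d + (((?m * 60) + ?s) / 3600.0)).
(defn dms->decimal
  [d m s]
  (+ d (/ (+ (* m 60) s) 3600.0)))

;; Made-up input: 40 degrees, 30 minutes, 0 seconds.
(dms->decimal 40 30 0) ; => 40.5
```

Dividing by 3600.0 rather than 3600 keeps the result a floating-point number, just as the query does.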

Thus, we need to break down the steps a bit more finely.  First, I’ll pull out the basic triples I intend to work with:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbo: <http://dbpedia.org/ontology/>
prefix dbp: <http://dbpedia.org/property/>
prefix nhl: <http://www.nhl.com/>
construct { ?team a dbo:HockeyTeam . ?team rdfs:label ?name .
                ?team dbp:city ?cityname . ?city rdfs:label ?cityname .
                ?city dbp:latd ?d; dbp:latm ?m; dbp:lats ?s . }
   {   ?team a dbo:HockeyTeam . ?team rdfs:label ?name .
        ?team dbp:city ?cityname . ?city rdfs:label ?cityname .
        ?city dbp:latd ?d; dbp:latm ?m; dbp:lats ?s .
        filter ( lang(?name) = 'en') }

And crap – the data doesn’t fit very well.  Looks like the city names associated with hockey teams don’t cleanly match up to the cities we’re looking for in DBpedia.  Time for a second refactor…

After a few minutes of staring at the ceiling, I realized that I could use Google’s geocoding service to do my bidding.  Since their daily limit is 2500 requests, my measly 50ish cities would be well under.  So first, I grab just the info I need out of DBpedia – hockey teams and the cities they’re associated with:

 prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 prefix dbo: <http://dbpedia.org/ontology/>
 prefix dbp: <http://dbpedia.org/property/>
 prefix nhl: <http://www.nhl.com/>
 construct { ?team a dbo:HockeyTeam . ?team rdfs:label ?name .
                 ?team dbp:city ?cityname . ?city rdfs:label ?cityname . }
      { ?team a dbo:HockeyTeam . ?team rdfs:label ?name .
         ?team dbp:city ?cityname . ?city rdfs:label ?cityname .
         filter ( lang(?name) = 'en') }

And use this query with a bit of Clojure to pull out my geocoding facts, saving them off as an N-Triples file:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dbo: <http://dbpedia.org/ontology/>
prefix dbp: <http://dbpedia.org/property/>
prefix nhl: <http://www.nhl.com/>
select distinct ?team ?name ?city ?cityname
{ ?team a dbo:HockeyTeam . ?team rdfs:label ?name .
  ?team dbp:city ?cityname . ?city rdfs:label ?cityname .
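  filter ( lang(?name) = 'en') }

;; `string` and `json` are namespace aliases, assumed here to be clojure.string
;; and a JSON library that provides read-json (such as clojure.data.json).
;; Geocodes one result row and returns its latitude and longitude
;; as a pair of N-Triples lines.
(defn get-geo-fact [row]
  (let [;; crude encoding: only spaces are replaced, so any other
        ;; special characters in a city name are sent as-is
        n   (string/replace (:cityname row) " " "+")
        x   (json/read-json
              (slurp (str "http://maps.googleapis.com/maps/api/geocode/json?address="
                          n
                          "&sensor=false")))
        ;; the first result's location is a map with :lat and :lng keys
        g   (:location (:geometry (first (:results x))))
        lat (str "<" (:city row) ">"
                 " <http://www.nhl.com/latitude> "
                 (:lat g) " .")
        lon (str "<" (:city row) ">"
                 " <http://www.nhl.com/longitude> "
                 (:lng g) " .")]
    [lat lon]))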

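;; Runs the team-city query through bounce and writes every geocoded
;; fact to an N-Triples file, one triple per line.
(defn make-geo-facts []
  (let [a (bounce team-city dbp)
        f "files/geo-facts.nt"]
    (spit f (string/join "\n" (flatten (map get-geo-fact (:rows a)))))))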

The results are created with the following two calls at the REPL:

(stash (pull team-city-constr dbp) "files/team-city.nt")
(make-geo-facts)
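
Two caveats about get-geo-fact are worth a sketch.  It writes the coordinates as bare numbers, while the N-Triples grammar only allows quoted literals, so a strict parser may reject the file.  It also only swaps spaces for "+" in the city name.  Below is one way both could be tightened up.  The helper names encode-address and double-triple are hypothetical, the example.org URI is a placeholder, and typing the coordinates as xsd:double is an assumption on my part:

```clojure
;; Both helpers are hypothetical names, not part of the original code.

(defn encode-address
  "URL-encodes a city name for use as a query parameter."
  [city-name]
  (java.net.URLEncoder/encode city-name "UTF-8"))

(defn double-triple
  "Builds one N-Triples line whose object is an xsd:double literal."
  [subject predicate value]
  (str "<" subject "> <" predicate "> \""
       value
       "\"^^<http://www.w3.org/2001/XMLSchema#double> ."))

;; Placeholder inputs, for illustration only.
(encode-address "New York")
(double-triple "http://example.org/city" "http://www.nhl.com/latitude" 40.5)
```

The encoder still turns spaces into "+", so the request looks the same for simple names, but it also percent-encodes anything else that isn’t URL-safe.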

Now that I have geo-data, I can finish with hits-per-game as a rough cut at a physicality scale for games, and see where the action was this season.  I wonder if the Islanders and Rangers still go at it.