RubyRdfDatabaseSetup
See also: [RDFRdb]
How to get SQL backends set up for RubyRDF.
This is being used (for example) to experiment with RestaurantRecommendation in RDF/XML. Some examples below are from that work.
PostgreSQL setup
We have code for Postgre`SQL and My`SQL database access from Ruby.
OK, assuming PostgreSQL and a unix-alike environment.
We have a freshly installed (apt-get install postgresql) PostgreSQL on a Debian box.
create a user
danbri@snowball2:~$ sudo su - postgres postgres@snowball2:~$ createuser danbri Shall the new user be allowed to create databases? (y/n) y Shall the new user be allowed to create more new users? (y/n) y CREATE USER
Let's make the new database (name should be arbitrary, though some tests assume 'rdfweb1'):
danbri@snowball2:~$ createdb rdfweb1 CREATE DATABASE
setup tables
There are various ways we can run the initdb-pg script.
Here's we create and initialise a couple of scratch databases used for testing:
danbri@fireball:~/s-rubyrdf/pack/tests$ createdb rubyrdf1; GET http://www.w3.org/2001/12/rubyrdf/db/initdb-pg.sql | psql rubyrdf1 CREATE DATABASE ERROR: table "triples" does not exist ERROR: table "resources" does not exist CREATE CREATE CREATE CREATE CREATE CREATE CREATE CREATE danbri@fireball:~/s-rubyrdf/pack/tests$ createdb scutter1; GET http://www.w3.org/2001/12/rubyrdf/db/initdb-pg.sql | psql scutter1 CREATE DATABASE ERROR: table "triples" does not exist ERROR: table "resources" does not exist CREATE CREATE CREATE CREATE CREATE CREATE CREATE CREATE
(the error messages are normal; ignore...)
Or use Curl instead (eg. for Mac OS X where it comes as standard):
curl http://www.w3.org/2001/12/rubyrdf/db/initdb-pg.sql | psql rdfweb1
We should now be set up for RDF storage and query.
delete howto
Deleting :
danbri@fireball:$ psql rdfweb1 Welcome to psql, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit rdfweb1=> delete from resources; DELETE 25237 rdfweb1=> delete from triples; DELETE 31785 rdfweb1=>
'simple' triples table for debug
This query makes a complicated rdf store look nice and simple:
rdfweb1=> select r1.value as p, r2.value as s, r3.value as o, r1.keyhash as pc ode, r2.keyhash as scode, r3.keyhash as ocode from resources r1, resources r2,
resources r3, triples t where r1.keyhash=t.predicate and r2.keyhash=t.subject
and r3.keyhash=t.object;
We can make a view table using it:
rdfweb1=> create view simple AS select r1.value as p, r2.value as s, r3.value a s o, r1.keyhash as pcode, r2.keyhash as scode, r3.keyhash as ocode from resour ces r1, resources r2, resources r3, triples t where r1.keyhash=t.predicate and r2.keyhash=t.subject and r3.keyhash=t.object; CREATE rdfweb1=> \d simple View "simple" Column | Type | Modifiers --------+-------------------+----------- p | character varying | s | character varying | o | character varying | pcode | integer | scode | integer | ocode | integer | View definition: SELECT r1.value AS p, r2.value AS s, r3.value AS o, r1.keyhash AS pcode, r2.keyhash AS scode, r3.keyhash AS ocode FROM resources r1, resources r2, resources r3, triples t WHERE (((r1.keyhash = t.predicate) AND (r2.keyhash = t.subject)) AND (r3.keyhash = t.object));
Maybe this should go in the default schema?
useful queries
Here are some things I'm finding useful. Note that one can also do a lot of work through the pure RDF interfaces, but when things are going wonky or you want extra expressivity, it is handy to be able to talk to the database as SQL. Projecting it into a simple-minded single table of predicate/subject/object makes it much easier to deal with than trying to compose queries against its actual multi-table representation, too.
Find all the predicates used:
All the predicates:
rdfweb1=> select distinct p from simple; p -------------------------------------------------- http://chefmoz.org/rdf/elements/1.0/City http://chefmoz.org/rdf/elements/1.0/Country http://chefmoz.org/rdf/elements/1.0/Neighborhood http://chefmoz.org/rdf/elements/1.0/Zip http://purl.org/dc/elements/1.1/contributor http://purl.org/dc/elements/1.1/date http://purl.org/dc/elements/1.1/publisher http://purl.org/dc/elements/1.1/rights http://purl.org/rss/1.0/description http://purl.org/rss/1.0/items http://purl.org/rss/1.0/link http://purl.org/rss/1.0/modules/wiki/interwiki http://purl.org/rss/1.0/title http://www.w3.org/1999/02/22-rdf-syntax-ns#_1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type (15 rows)
All uses of some particular namespace:
select p,o from simple where p LIKE 'http://chefmoz.org/rdf/elements/1.0/%'; p | o --------------------------------------------------+---------------- http://chefmoz.org/rdf/elements/1.0/City | London http://chefmoz.org/rdf/elements/1.0/City | London http://chefmoz.org/rdf/elements/1.0/City | London http://chefmoz.org/rdf/elements/1.0/City | London http://chefmoz.org/rdf/elements/1.0/Country | United Kingdom http://chefmoz.org/rdf/elements/1.0/Country | United Kingdom http://chefmoz.org/rdf/elements/1.0/Country | United Kingdom http://chefmoz.org/rdf/elements/1.0/Country | United Kingdom http://chefmoz.org/rdf/elements/1.0/Neighborhood | City of London http://chefmoz.org/rdf/elements/1.0/Neighborhood | Clerkenwell http://chefmoz.org/rdf/elements/1.0/Neighborhood | Southwark http://chefmoz.org/rdf/elements/1.0/Zip | EC3V 0DR http://chefmoz.org/rdf/elements/1.0/Zip | W1K 2RP http://chefmoz.org/rdf/elements/1.0/Zip | WC1X 8JR http://chefmoz.org/rdf/elements/1.0/Zip | SE1 9EF (15 rows)
How many distinctly tagged graphs do we have? (note that libby/Inkling uses this field in ths SQL structure differently to DanBri/RubyRdf)
scutter1=> select count(distinct assertid) from triples; count ------- 694 (1 row) scutter1=> select distinct assertid from triples; }} Here we had a bunch of annoying rss:title properties we want to ignore... {{{ rdfweb1=> select p,o from simple where p LIKE 'http://purl.org/rss/1.0/title' a nd NOT s = 'http://the.earth.li/~kake/cgi-bin/cgi-wiki/TODO-link'; p | o -------------------------------+-------------------------- http://purl.org/rss/1.0/title | Anchor Bankside, SE1 9EF http://purl.org/rss/1.0/title | Crosse Keys, EC3V 0DR http://purl.org/rss/1.0/title | Audley, W1K 2RP http://purl.org/rss/1.0/title | Calthorpe Arms, WC1X 8JR (4 rows)
Offtopic: the ntriples we loaded...
(or tried to, what's up? need better debug tools!)
I grepped out the titles we got from the parser. Seems they didn't all get loaded.
<http://purl.org/rss/1.0/title> "Anchor Bankside, SE1 9EF" . <http://purl.org/rss/1.0/title> "Calthorpe Arms, WC1X 8JR" . <http://purl.org/rss/1.0/title> "Audley, W1K 2RP" . <http://purl.org/rss/1.0/title> "Crosse Keys, EC3V 0DR" . <http://purl.org/rss/1.0/title> "Cittie Of Yorke, WC1V 6BN" . <http://purl.org/rss/1.0/title> "Counting House, EC3V 3PD" .
I'm beginning to suspect the missing two records got garbled due to the wierd SHA1 hashes we're using to map from text strings (literals and uris) to numbers.
A bit more fruitless diagnostic fiddling:
rdfweb1=> select * from resources where value like 'Cittie%'; keyhash | value ----------+--------------------------- 53432275 | Cittie Of Yorke, WC1V 6BN (1 row) rdfweb1=> select * from simple where ocode = '53432275'; p | s | o | pcode | scode | ocode ---+---+---+-------+-------+------- (0 rows) rdfweb1=> select * from simple where pcode = '53432275'; p | s | o | pcode | scode | ocode ---+---+---+-------+-------+------- (0 rows) rdfweb1=> select * from simple where scode = '53432275'; p | s | o | pcode | scode | ocode ---+---+---+-------+-------+------- (0 rows) rdfweb1=> select * from triples where object='53432275'; subject | predicate | object | assertid | personid | isresource ------------+------------+----------+------------------------------------------- --------------+----------+------------ 1046910663 | 2095767658 | 53432275 | uri=http://example.com/grubst:Cittie_Of_Yo rke,_WC1V_6BN | | f (1 row) rdfweb1=> select * from resources where keyhash='1046910663'; keyhash | value ---------+------- (0 rows) rdfweb1=> select * from resources where keyhash='2095767658'; keyhash | value ------------+------------------------------- 2095767658 | http://purl.org/rss/1.0/title (1 row)
DEBUG: insert into resources values ( '-721365290', 'Counting House, EC3V 3PD' ) ; danbri@fireball:~/s-rubyrdf/pack/tests/net$ grep '\-721365290' xx DEBUG: insert into triples values ('-1529755978', '2095767658', '-721365290', ' uri=http://example.com/grubst:Counting_House,_EC3V_3PD','' ,'f');
Wishlist
- Libby had some useful stats queries.
consistency checker
- Find me any numbers from 'triples' table that don't have a lookup in 'resources' (consistency concerns...). And vice-versa.
A partial check:
SELECT triples.subject FROM triples LEFT OUTER JOIN resources ON triples.subject=resources.keyhash WHERE resources.keyhash IS NULL SELECT triples.predicate FROM triples LEFT OUTER JOIN resources ON triples.predicate=resources.keyhash WHERE resources.keyhash IS NULL SELECT triples.object FROM triples LEFT OUTER JOIN resources ON triples.predicate=resources.keyhash WHERE resources.keyhash IS NULL
The 1st of these found some suspects:
subject ------------- -1529755978 1046910663
Aha. I think...
Two different values. What's up?!
danbri@fireball:~/s-rubyrdf/pack/tests/net$ ./tc_sqlgen.rb |grep 'http://the.ea rth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Cittie_Of_Yorke%2C_WC1V_6BN' RAW: Setting http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Cittie_Of_Yorke %2C_WC1V_6BN -> -2001591609 RAW: Setting http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Cittie_Of_Yorke %2C_WC1V_6BN -> 1046910663 RAW: Setting http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Cittie_Of_Yorke %2C_WC1V_6BN -> -2001591609 KEY: http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Cittie_Of_Yorke%2C_WC1V _6BN --> '-2001591609' ADDING: insert into resources values ( '-2001591609', 'http://the.earth.li/~kake /cgi-bin/cgi-wiki/wiki.cgi?Cittie_Of_Yorke%2C_WC1V_6BN' );
OK seem to have fixed it now.
(I'm looking foward to using the RDFQueryTestCases to get rid of some of this uncertainty...)
10 hits, 10 pubs. seems OK...