15:44:50 Meeting: RDB2RDF Working Group Teleconference
15:44:50 Date: 24 May 2011
16:06:22 Agenda: http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011May/0149.html
16:07:44 scribe: cygri
16:08:03 present+ Percy
16:08:29 Topic: Admin
16:08:34 scribenick: cygri
16:08:49 PROPOSAL: Accept the minutes of last meeting http://www.w3.org/2011/05/17-rdb2rdf-minutes.html
16:09:15 (no objections)
16:09:23 RESOLUTION: Accept the minutes of last meeting http://www.w3.org/2011/05/17-rdb2rdf-minutes.html
16:09:43 Topic: Dealing with RDB NULL values (ISSUE-41 and ISSUE-42)
16:09:49 ISSUE-41?
16:09:49 ISSUE-41 -- Define how rr:column, rr:template, etc. handle NULL column values -- open
16:09:49 http://www.w3.org/2001/sw/rdb2rdf/track/issues/41
16:09:52 ISSUE-42?
16:09:52 ISSUE-42 -- How is the direct mapping suppose to handle NULL values wrt Information Preserving -- open
16:09:52 http://www.w3.org/2001/sw/rdb2rdf/track/issues/42
16:10:11 Ashok: are we ready to decide on these or should we discuss some more?
16:10:38 EFranconi has joined #RDB2RDF
I think at least on ISSUE-42 there was a concise proposal from david
16:10:50 ... ISSUE-41 might be more complicated
16:11:49 cygri: proposal from david was here: http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#R2RML
16:11:56 Marcelo has joined #rdb2rdf
16:12:17 1) by default R2RML will suppress triples when the subject, predicate, or object columns are NULL (this applies to any of the columns used in template expressions as well as direct column references)
16:12:24 2) if the application needs other handling for NULL values then a SQLQuery can be defined in the mapping to convert NULL values to some other application specific value
16:13:08 minor wording nit: i think you can s/by default//
16:13:26 +??P2
16:13:31 zakim, who is here?
16:13:31 On the phone I see dmcneil, boris, mhausenblas, privera (muted), Ivan, juansequeda, alexdeleon, EricP, Ashok_Malhotra, Souri, Marcelo, Seema, ??P2
16:13:33 mhausenblas has cygri, nunolopes
16:13:34 On IRC I see Marcelo, EFranconi, nunolopes, Souri, Seema, boris, mhausenblas, privera, dmcneil, Ashok, cygri, alexdeleon, Zakim, RRSAgent, ivan, iv_an_ru__, LeeF, trackbot, ericP
16:13:44 that is, r2rml will never encode a NULL value in an RDF graph
16:13:47 zakim, ??P2 is EFranconi
16:13:47 +EFranconi; got it
16:18:35 (discussion on R2RML vs direct mapping)
16:19:09 select name, NVL(salary,0) from employees;
16:19:48 EFranconi: does R2RML have schema information?
16:20:00 Ashok: it's not well spelled out. i argue yes because it's in the queries
16:20:13 q+
16:20:40 ivan: whoever writes R2RML for a specific DB has to know about the schema of that DB
16:21:10 ack iv_an_ru__
16:21:10 ... so in this case, the author of the mapping is in total control
16:21:13 ack ivan
16:21:20 Ashok: I agree
16:21:45 Souri: the mapping author knows about the schema, and specifies how to map it to RDF
16:21:47 So, is this a sort of a programming language?
16:22:09 got it
16:23:04 Ashok: so question comes up, does the translation spell out an RDF schema? i'd say yes
16:23:22 q?
16:23:23 q?
16:24:26 +1 to R2RML author is in full control (knows the DB schema and the target RDF schema)
16:25:00 EFranconi: so we can use R2RML to generate a constant for the nulls?
16:25:02 Ashok: yes
16:25:23 ... you can do anything you like
16:25:24 EFranconi: that's fine with me
16:25:27 +
16:25:27 1
16:26:06 ... I'm ok with that as resolution to ISSUE-41
16:26:13 +1
16:27:05 +q
16:27:50 IS NULL
16:28:14 dmcneil: is there something in the SQL standard to test for null?
16:28:17 e.g. WHERE foo.bar IS NULL
16:28:19 ericP: yes, IS NULL
16:28:22 what about => rr:graphTemplate "http://example.com/graph/{job}/{etype}"
16:28:25 +1 to the proposed resolution of Issue 41
16:28:54 PROPOSAL: resolve ISSUE-41 per dmcneil's proposal in http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#R2RML
16:29:38 -privera
16:30:00 Souri: if a graph template is null, where does the triple go to?
16:30:24 +??P4
16:30:40 Zakim, ??P4 is privera
16:30:40 +privera; got it
16:30:47 Zakim, mute me
16:30:47 privera should now be muted
16:30:55 cygri: let's treat it as a quad, so if graph goes null then we don't create a triple at all
16:31:08 dmcneil: that's my first reaction too
16:31:15 zakim, who is on phone?
16:31:15 I don't understand your question, Ashok.
16:31:49 If one or more of the columns used in a rr:graphTemplate, then corresponding triples will not be generated either.
16:32:28 If one or more of the columns used in a rr:graphTemplate is NULL, then corresponding triples will not be generated either.
16:33:00 PROPOSAL: resolve ISSUE-41 per member:dmcneil's proposal in http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#R2RML
16:33:51 as extended by Souri : If one or more of the columns used in a rr:graphTemplate is NULL, then corresponding triples will not be generated either.
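The two-part proposal for ISSUE-41, together with the graph-template extension, can be sketched roughly as below. This is only an illustration under stated assumptions, not R2RML syntax or any processor's behaviour: the helpers `expand` and `map_rows`, the `employees` data, the `ex:salary` predicate and the example.com templates are all hypothetical. SQLite stands in for the database, so the standard `COALESCE` is used in place of the Oracle-style `NVL` pasted in the log.

```python
import re
import sqlite3

def expand(template, row):
    """Fill {column} placeholders; return None if any referenced column is NULL."""
    columns = re.findall(r"\{(\w+)\}", template)
    if any(row[c] is None for c in columns):
        return None
    return template.format(**{c: row[c] for c in columns})

def map_rows(rows, subject_tpl, predicate, object_col, graph_tpl=None):
    """Yield (graph, subject, predicate, object); emit nothing if any part is NULL."""
    for row in rows:
        subject = expand(subject_tpl, row)
        obj = row[object_col]
        graph = expand(graph_tpl, row) if graph_tpl else "default"
        if subject is None or obj is None or graph is None:
            continue  # point 1 of the proposal: suppress the triple
        yield (graph, subject, predicate, obj)

# Hypothetical sample data: bob has no salary, carol has no job.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER, job TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("alice", 100, "dev"), ("bob", None, "ops"), ("carol", 80, None)])

plain = conn.execute("SELECT name, salary, job FROM employees").fetchall()
# Point 2 of the proposal: the mapping author rewrites NULLs in the query itself.
patched = conn.execute(
    "SELECT name, COALESCE(salary, 0) AS salary, job FROM employees").fetchall()

SUBJ = "http://example.com/emp/{name}"
GRAPH = "http://example.com/graph/{job}"
default_quads = list(map_rows(plain, SUBJ, "ex:salary", "salary"))
coalesced_quads = list(map_rows(patched, SUBJ, "ex:salary", "salary"))
graph_quads = list(map_rows(plain, SUBJ, "ex:salary", "salary", GRAPH))
```

With the plain query, bob's row produces no salary triple; with the rewritten query it produces one with the value 0; with the graph template, carol's row is dropped as well because `job` is NULL.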
16:34:08 +1
16:34:09 +1
16:34:10 +1
16:34:12 +1
16:34:14 +1
16:34:14 +1
16:34:17 +1
16:34:19 +1
16:34:24 +1
16:34:41 RESOLUTION: resolve ISSUE-41 per dmcneil's proposal in http://www.w3.org/2001/sw/rdb2rdf/wiki/RDBNullValues#R2RML incl. Souri's extension
16:34:55 Topic: ISSUE-42, NULL in direct mapping
16:35:01 +1
16:35:10 Ashok: why is 42 different?
16:35:26 ericP: because no opportunity to inject a SQL query to modify the mapping
16:35:34 ... although you can create a view
16:35:59 ... but this null-handling view is not in some external configuration
16:36:56 Ashok: from email my impression is that most of the WG said, don't generate a triple
16:37:12 ... then there was back and forth ... i'm not sure where enrico stands now
16:37:30 EFranconi: the semantics for null is defined in sql
16:38:09 ... my goal is that this WG generates something that respects the semantics of queries
16:38:22 ... the WG proposed to drop nulls without showing how to preserve query semantics
16:38:41 ... i'm happy if the WG can show how to preserve semantics
16:39:00 ... i think it's easier to show this if the nulls are expressed as constants in the RDF
16:39:10 ... easier or more convincing
16:39:22 q+ to ask for use cases against which we can measure the preservation of semantics
16:39:35 Ashok: to make these special constants work, we'd have to change SPARQL, right?
16:39:38 ... and we can't do that
16:40:01 +q
16:40:03 ... that's why we're a bit stuck
16:40:10 ... so might be better not to generate triples
16:40:24 EFranconi: i think that's not the point
16:40:48 ... no matter how we map them, naive queries will not preserve the semantics of nulls
16:41:13 ... in all cases, we will have to prescribe a way how to write queries to keep the null semantics and get the right answers
16:41:24 ... the RDF graph will never be correct with respect to null
16:41:29 q?
16:41:46 ericP: we have two different query languages here
16:41:52 ... so we can't just copy and paste
16:41:57 ... have to look at user expectations
16:42:16 ... querying for nulls in SQL is in two cases especially
16:42:46 ack dmcneil
16:42:46 ... one, i query for missing information
16:42:47 -q
16:42:50 ack ericP
16:42:50 ericP, you wanted to ask for use cases against which we can measure the preservation of semantics
16:42:51 +q
16:43:02 -EFranconi
16:43:17 ... i'd like to have test queries that we can use to measure if we keep semantics
16:43:48 ... enrico, do you have specific use cases for queries that one can do in SQL but not in SPARQL
16:44:30 Marcelo: we talked a lot about null semantics in sql
16:44:36 ... but that's not well defined
16:44:36 q+
16:44:37 Sorry my phone crashed
16:44:54 ... so it's hard to talk about null semantics if the semantics of null is not well defined
16:45:06 ... we have to agree on the null semantics
16:46:12 +??P2
16:46:15 I'm back
16:46:22 zakim, ??P2 is EFranconi
16:46:22 +EFranconi; got it
16:46:57 zakim, who is on the phone?
16:46:57 On the phone I see dmcneil, boris, mhausenblas, Ivan, juansequeda, alexdeleon, EricP, Ashok_Malhotra, Souri, Marcelo, Seema, privera (muted), EFranconi
16:47:01 mhausenblas has cygri, nunolopes
16:47:56 q+
16:48:14 ack next
16:48:49 cygri: it's hard to show correctness if null semantics is not well defined. so we might have to lower our expectations w.r.t. showing correctness of the mapping
16:48:56 The semantics of NULLs in SQL is *well* defined
16:48:57 q-
16:49:22 EFranconi: semantics of null in SQL is well defined in the spec
16:49:35 ... what's not known is the model-theoretic semantics
16:49:48 ... we know the behaviour
16:50:41 +q
16:50:57 ... (scribe fails to keep up)
16:51:38 ... behaviour can be reproduced in SPARQL up to a certain expressivity
16:51:49 q+
16:52:18 q+
16:52:32 ... the null-ignoring mapping makes the sql-to-sparql translation of some simple queries quite hard
16:53:02 ack next
16:53:08 ... bottom line: by imitating what sql does, we can re-construct the behaviour up to a certain expressivity
16:53:35 ack next
16:53:41 Marcelo: i don't agree that the semantics is well-defined
16:54:01 ... proving the correctness requires well-defined semantics
16:54:07 q+
16:54:17 ... hard e.g. with aggregates
16:54:42 ... translation will be very hard with sparql 1.1
16:55:19 cygri: Two different notions of what is a correct translation
16:55:54 ... query answering notion of correctness
16:56:23 ... another way of defining correctness based on the meaning of a graph
16:57:07 ... model theory and semantics ... model theory not well defined wrt NULLs
16:57:43 cygri: Ignoring model-theoretic approach is not good
16:58:29 Souri: from practical point of view ... let's consider a db with some sparse tables
16:58:29 ... we have concrete test cases on using RDF triples in OWL and RDF
16:59:04 ack next
16:59:13 q?
16:59:34 ... if we can capture the schema, and the present data, but leave out the null triples, then we can still reconstruct the old database
16:59:42 +q
16:59:50 ... and if we can guarantee that, then we will be able to translate queries
17:00:30 ... so if the direct mapping always maps the schema too, and skips nulls, then we should be ok
17:00:52 Souri: DM should always generate the schema
17:01:06 ... then we know where the missing values are
17:01:18 EFranconi: i maintain that nulls are well-defined, by their behaviour
17:01:40 ... but i agree we want to avoid complexity
17:02:03 ... so that's why i would limit expressivity
17:02:49 My proposal: maintain data equivalence (allowing converting either way, without loss of info) => this can be done by DM 1) always generating schema triples and 2) skipping generation of triples for NULL values
17:02:51 ... Souri is right, if we have schema we still have the same information
17:03:09 ... the information is there, so we can define correct queries
17:03:30 ... but it will be more complex, and i still have to see a concrete proposal
17:03:59 ... and is the result compositional?
17:04:17 ?x rdf:type :EMP . ?p rdfs:range :EMP . OPTIONAL (?x ?p ?val)
17:04:31 correction:=> ?x rdf:type :EMP . ?p rdfs:domain :EMP . OPTIONAL (?x ?p ?val)
17:05:01 correction again (syntax) => ?x rdf:type :EMP . ?p rdfs:domain :EMP . 
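Souri's proposal for the direct mapping (always emit schema triples, skip triples for NULL values) can be illustrated with a small round trip. The sketch below is an assumption-laden toy, not the direct mapping itself: the functions `direct_map` and `reconstruct`, the IRI shapes such as `:EMP#salary` and `:EMP/1`, and the sample rows are hypothetical. It only shows that a column declared in the schema but absent from a row's data triples can be read back as NULL, which mirrors the `rdfs:domain` plus OPTIONAL pattern pasted at the end of the log.

```python
def direct_map(table, columns, rows):
    """Emit schema triples for every column and data triples for non-NULL cells only."""
    triples = {(f":{table}#{c}", "rdfs:domain", f":{table}") for c in columns}
    for key, row in rows.items():
        subject = f":{table}/{key}"
        triples.add((subject, "rdf:type", f":{table}"))
        for c in columns:
            if row[c] is not None:  # NULL cells produce no triple
                triples.add((subject, f":{table}#{c}", row[c]))
    return triples

def reconstruct(table, triples):
    """Rebuild the rows; a schema column with no data triple is read back as NULL."""
    columns = sorted(s.split("#")[1] for s, p, o in triples
                     if p == "rdfs:domain" and o == f":{table}")
    subjects = [s for s, p, o in triples if p == "rdf:type" and o == f":{table}"]
    values = {(s, p): o for s, p, o in triples}
    prefix = f":{table}/"
    return {s[len(prefix):]: {c: values.get((s, f":{table}#{c}")) for c in columns}
            for s in subjects}

# Hypothetical table keyed by primary key; bob's salary is NULL.
rows = {"1": {"name": "alice", "salary": 100},
        "2": {"name": "bob", "salary": None}}
triples = direct_map("EMP", ["name", "salary"], rows)
restored = reconstruct("EMP", triples)
```

Here `restored` equals `rows` even though no triple was generated for bob's salary. Whether queries over such a graph preserve SQL's NULL behaviour is the separate question EFranconi raises, and this sketch does not address it.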