Copyright © 2003 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
In this paper, I discuss which functions the security of the Semantic Web maps to, and I investigate the use cases of those functions.
This paper is still being written. I would like to improve it through discussion with many other people.
But what about the second definition? For this definition, I imagine something like "a security person in an airport". "Security" can mean not only protecting against malicious attacks from outside, but also "a technology that keeps the whole system working well".
In this paper, I discuss those two kinds of security and consider what kinds of functions will realize them in the Semantic Web world.
As described in the Semantic Web architecture [Tim00] by Tim Berners-Lee, the Semantic Web is defined as a layered architecture in which the RDF layer sits above the XML layer. Moreover, [XMLandRDF] also states that the RDF model is built on top of XML.
I will describe my view of those definitions of the Semantic Web architecture in another paper. In this paper, I discuss the security of the Semantic Web on the assumption that the RDF layer is placed above the XML layer, as described in the documents above.
Encryption is a function that encrypts information. To put it more precisely, however, "to encrypt" is not in itself the goal of encryption. The goal is "to keep some important information secret" by encrypting it. Thus, encryption is only one step in keeping information secret. I call this larger function "secrecy". We can consider "encryption" to be one of the functions that make up the larger function "secrecy".
For example, if the information that "Nobu is studying at MIT" is secret, keeping it secret is not the same as "encrypting Nobu's MIT student ID". The fact that "Nobu is studying at MIT" is what must be kept secret, and the rule that "we should encrypt Nobu's MIT student ID" is derived from it as the result of "deduction".
Then what is "secrecy" in the first place, and which functions does it consist of? I think "secrecy" is, as the term says, a function that "keeps information secret". I think "secrecy" consists of the following functions in addition to XML Encryption, the encryption function.
For example, "information that is basically secret but can be disclosed in an emergency", "absolutely secret information", or "information that can be disclosed if conditions change to a certain state".
Thus, I am going to introduce the concept of a "secrecy level". As the term says, it means:
In practical systems, this "secrecy level" concept is realized by application functions or by using different cipher keys for different levels of secrecy. But fundamentally, the "secrecy level" is information that belongs to the information itself. That is, it is "metadata", and it should be realized in the RDF layer, which is the layer "to describe metadata".
| Layer | Role |
|---|---|
| XML layer | Executes the actual encryption/decryption. |
| RDF layer | Describes only the policy of encryption/decryption in the XML layer. |
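The division of roles in the table can be sketched in Python as follows. This is only an illustration under stated assumptions: the property name `ex:secrecyLevel`, the level names, the default level for unlabelled resources, and the simplified `<EncryptedData>` wrapper are all hypothetical, and the `placeholder_cipher` function is a stand-in that is not real encryption.

```python
# Hypothetical vocabulary: the property name and the level names are assumptions.
SECRECY_LEVEL = "ex:secrecyLevel"

# RDF layer: statements about resources, including their secrecy level (metadata).
rdf_triples = [
    ("ex:nobuStudentId", "ex:belongsTo", "ex:Nobu"),
    ("ex:nobuStudentId", SECRECY_LEVEL, "secret"),
    ("ex:mitHomepage", SECRECY_LEVEL, "public"),
]


def secrecy_level(resource, triples, default="public"):
    """Look up the secrecy level recorded for a resource in the RDF layer."""
    for subject, predicate, obj in triples:
        if subject == resource and predicate == SECRECY_LEVEL:
            return obj
    return default  # Assumption: unlabelled resources are treated as public.


def must_encrypt(resource, triples):
    """RDF-layer policy decision: anything that is not public gets encrypted."""
    return secrecy_level(resource, triples) != "public"


def serialize(resource, payload, triples, encrypt):
    """XML layer: performs the encryption, but only when the policy asks for it."""
    if must_encrypt(resource, triples):
        return "<EncryptedData>" + encrypt(payload) + "</EncryptedData>"
    return "<Data>" + payload + "</Data>"


def placeholder_cipher(text):
    """Stand-in for a real cipher. Reversing a string is NOT encryption."""
    return text[::-1]


print(serialize("ex:nobuStudentId", "12345", rdf_triples, placeholder_cipher))
print(serialize("ex:mitHomepage", "hello", rdf_triples, placeholder_cipher))
```

The point of the sketch is that `serialize` never decides by itself what is secret; it only executes what the RDF-layer metadata prescribes.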
However, if we encrypt information at the XML level, the information that "this information is encrypted" is XML-level information. So after we decrypt the encrypted XML, the information that "this information was encrypted" can be lost.
This is like a confidential document that does not carry the mark "confidential" itself, where the mark is printed only on the envelope that contains it. In this case, once we take the document out of the envelope, it looks the same as a normal document. This will cause problems. Confidential information is confidential whether or not it is in a confidential envelope. So in the real world, the mark "confidential" is printed on the confidential document itself.
How should we think about such cases in the Semantic Web world? The information that some RDF descriptions are confidential should itself be described as RDF information in the RDF layer. The confidential information is the whole, including the statement that it is confidential, and removing that statement from it should count as an "alteration".
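One way to make the removal of a confidentiality statement detectable as an alteration is to compute an integrity digest over the whole description, label included. The following Python sketch assumes a hypothetical `ex:secrecyLevel` property and uses a plain SHA-256 digest over naively sorted triples as a stand-in for a real signature and canonicalization scheme such as XML Signature.

```python
import hashlib


def digest(triples):
    """Digest over all triples, so the secrecy label is part of what is protected.

    Assumption: sorting and joining the triples is a naive canonical form,
    used here only for illustration.
    """
    canonical = "\n".join(" ".join(triple) for triple in sorted(triples))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical confidential description: content plus its own "confidential" mark.
labelled = [
    ("ex:report", "ex:content", "quarterly-figures"),
    ("ex:report", "ex:secrecyLevel", "confidential"),
]
reference = digest(labelled)

# Stripping the mark changes the digest, so it is detected as an alteration.
stripped = [t for t in labelled if t[1] != "ex:secrecyLevel"]
print(digest(stripped) == reference)
```

Because the mark travels inside the protected data, it plays the role of the "confidential" stamp printed on the document rather than on the envelope.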
I think this inheritance of the secrecy level will basically be an "AND" operation. Thus, the secrecy level of the deduced RDF will be the same as the highest secrecy level among the RDF used for the deduction.
For example, the secrecy level of an RDF description deduced from "top-secret" and "common sense" information should basically be "top-secret", so long as "top-secret" information is used to deduce it.
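The basic inheritance rule can be sketched as taking the maximum over an ordered list of levels. In this Python sketch the ordering and the intermediate level "confidential" are assumptions made for illustration; only "common sense" and "top-secret" come from the example above.

```python
# Assumed ordering, from least secret to most secret.
SECRECY_ORDER = ["common sense", "confidential", "top-secret"]


def derived_secrecy(premise_levels):
    """Return the highest secrecy level among the premises of a deduction.

    Raises ValueError for an empty list or for a level outside SECRECY_ORDER.
    """
    return max(premise_levels, key=SECRECY_ORDER.index)


print(derived_secrecy(["top-secret", "common sense"]))
```

A more complex deduction process would replace `max` with dedicated logic, but the default stays the same: one highly secret premise is enough to make the conclusion highly secret.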
However, as the word "basically" above suggests, there can be not only a simple AND operation but also more complex deduction processes. I consider those cases in the following subsections.
Thus, we have to introduce some logic for deducing the secrecy level. In some cases, we might have to introduce specific logic. Consequently, describing the secrecy level in the RDF layer and processing the deduction in the Logic layer is very useful.
So we have to build countermeasures against such inductive reasoning into the description of the secrecy level. For example, we can define the following secrecy levels:
As with encryption, "to authenticate" is not in itself the final goal of authentication. The aim of authentication is "keeping the trust" by authenticating the information, and the authentication function is one step of "keeping the trust". Thus, we can consider "authentication" to be one of the functions that make up the larger function "trust".
For example, if the information "Nobu is studying at MIT" is trustworthy information, "signing Nobu's student ID" may not be enough to guarantee it. The trustworthy information is the fact that "Nobu is studying at MIT", and from it we can deduce the rule that "we can sign Nobu's student ID".
As described above, the relation between "authentication" and "trust" is very similar to the relation between "encryption" and "secrecy". In the following sections, I consider the relation between authentication and trust in the same way as that between encryption and secrecy.
As with the secrecy level, in practical systems this trust level is realized by application functions or by using different signatures for different levels of trust.
But the trust level is "metadata" that belongs to the information itself, and it should be realized in the RDF layer.
| Layer | Role |
|---|---|
| XML layer | Executes the actual authentication, such as signing and verification. |
| RDF layer | Describes only the policy of authentication in the XML layer. |
I think the reasons why we describe the trust level in the RDF layer are as follows:
I think this inheritance of the trust level will basically be an "OR" operation. Thus, the trust level of the deduced RDF will be the same as the lowest trust level among the RDF used for the deduction.
For example, the trust level of an RDF description deduced from "certain information" and "uncertain information" should basically be "uncertain information", so long as "uncertain information" is used to deduce it.
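The basic inheritance rule for trust mirrors the one for secrecy, with the minimum taken instead of the maximum. This Python sketch assumes a two-level ordering built from the labels in the example above; a real vocabulary may well have more levels.

```python
# Assumed ordering, from least trusted to most trusted.
TRUST_ORDER = ["uncertain information", "certain information"]


def derived_trust(premise_levels):
    """Return the lowest trust level among the premises of a deduction.

    Raises ValueError for an empty list or for a level outside TRUST_ORDER.
    """
    return min(premise_levels, key=TRUST_ORDER.index)


print(derived_trust(["certain information", "uncertain information"]))
```

In other words, a conclusion is only as trustworthy as its least trustworthy premise.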
However, as the word "basically" suggests, there can be more complex deduction processes than a simple OR operation. I describe this in the following subsection.
The Web is already in the "position" of a kind of huge distributed database. Nevertheless, its "records" are basically character information in HTML or XML, and they do not have the "functions" expected of a "database". There are some excellent search engines such as Google, but their technologies are basically based on keyword search, like "grep". This is unavoidable as long as the target records are simple character information, and it is also obvious that such search has limits.
Thus, to improve the search function drastically, we have to give more "information" to the target. RDF is suited to giving such information, "semantics", to the contents, and RDF was born from this concept.
The concept of "contradiction" exists only when the language can describe "meaning". Thus, I think security that copes with contradiction is a concept particular to the Semantic Web, which describes semantics.
When we perform "deduction" or "reasoning", the applied functions of RDF, this becomes a large problem. "Deduction" assumes non-contradictory information, and even a small amount of contradictory information keeps the deduction from working well. Thus, coping with contradictory RDF contents is a large problem for RDF security. And to begin coping with it, we need the function "to detect contradiction".
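As a minimal sketch of what a function "to detect contradiction" might look like, the following Python code flags subjects that carry more than one value for a property that is supposed to be single-valued. The notion of a declared set of single-valued ("functional") properties, and all the `ex:` names, are assumptions made for illustration; real contradiction detection over RDF would need a richer logic than this.

```python
from collections import defaultdict


def find_contradictions(triples, functional_properties):
    """Return {(subject, property): [values]} wherever a single-valued
    property has been given more than one distinct value."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in functional_properties:
            values[(subject, predicate)].add(obj)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


# Hypothetical RDF contents gathered from different sources.
triples = [
    ("ex:doc1", "ex:publishedIn", "2002"),
    ("ex:doc1", "ex:publishedIn", "2003"),
    ("ex:doc1", "ex:keyword", "security"),
    ("ex:doc1", "ex:keyword", "rdf"),
]

print(find_contradictions(triples, {"ex:publishedIn"}))
```

Detection of this kind only reports that two statements conflict; deciding which one to keep is a separate question.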
As I wrote in the previous section, contradictory RDF descriptions fall into two patterns: "(a) contradiction by malice" and "(b) contradiction by collision of opinion". In case (a), one of them is obviously wrong, and we can proceed with the deduction by ignoring it. But in case (b), neither of them is wrong. The standard for judgement is not "correct or wrong" but "agree or disagree", and the result will differ for each person who wrote the RDF contents and each person who makes deductions with them.
Thus, for the trust of contradictory RDF contents, there are the following two types:
In any case, we cannot avoid the existence of contradictory RDF, and we need a function that copes with such contradictions. I call this function, which keeps the RDF and its deduction rational, "rationality".
Figure 2.1: The architecture of the security functions in the Semantic Web
[XMLandRDF] Semaview Inc., "XML and RDF Illustrated Ver3.0" (2003)
[XML Encryption] W3C Recommendation, "XML Encryption Syntax and Processing" (10 December 2002)
[XML Signature] W3C Recommendation, "XML Signature Syntax and Processing" (12 February 2003)
[Crypto00] T. Berners-Lee, "Cryptography example: delegated authentication" (October 2000)
[Cool98] T. Berners-Lee, "Cool URIs don't change" (1998)