In this paper, I discuss which functions the security of the Semantic Web maps to, and I investigate use cases for those functions.

This paper is still being written. I would like to improve it through discussion with many other people.

1. What is the security of the Semantic Web?

1.1. The definition of security

What is "security"? The American Heritage Dictionary of the English Language, Fourth Edition, defines the term "security" as follows:

Freedom from risk or danger; safety.
Freedom from doubt, anxiety, or fear; confidence.
...

The first definition is the usual image we have of the term "security" and is already familiar to us. In the IT field, many technologies such as authorization, encryption, and firewalls are already in practical use to protect against malicious attacks from outside.

But what about the second definition? For this definition, I imagine something like "a security officer in an airport". In this sense, "security" means not only protection against malicious attacks from outside but also "a technology that keeps the whole system working well".

In this paper, I discuss both kinds of security and consider which functions will realize them in the Semantic Web world.

1.2. The layered architecture of the Semantic Web

Before discussing the security of the Semantic Web, I would like to clarify the architecture of the Semantic Web.

As described in the Semantic Web architecture [Tim00] by Tim Berners-Lee, the Semantic Web is defined as a layered architecture in which the RDF layer sits above the XML layer. Moreover, [XMLLandRDF] also states that the RDF model is built on top of XML.

I will describe my view of those definitions of the Semantic Web architecture in another paper. In this paper, I discuss the security of the Semantic Web on the assumption that the RDF layer is placed above the XML layer, as in the documents cited above.

2. The first definition of security in the Semantic Web

What does "the first definition of security" described above mean in the Semantic Web? I consider two functions, "encryption" and "authentication", as the typical functions of the first definition of security.

2.1. Encryption in the Semantic Web - "Secrecy"

2.1.1. The relationship between encryption and secrecy

Encryption is one of the typical functions of the first definition of security, and its specification is given in [XML Encryption]. But where does it fit in the security of the Semantic Web?

Encryption is a function that encrypts information. More precisely, however, "to encrypt" is not itself the goal. The goal is "to keep some important information secret" by encrypting it. Thus, encryption is only one step in keeping information secret. I call this larger function "secrecy". We can regard "encryption" as one of the functions that make up the larger function "secrecy".

For example, if the information that "Nobu is studying at MIT" is secret, keeping it secret is not the same as "encrypting Nobu's MIT student ID". The fact that "Nobu is studying at MIT" is what must be kept secret, and the rule that "we should encrypt Nobu's MIT student ID" is derived from it as the result of "deduction".

Then what exactly is "secrecy", and what functions does it consist of? "Secrecy" is, as the term says, a function that "keeps information secret". I think it consists of the functions described below, in addition to XML Encryption, the encryption function itself.

2.1.2. The concept of Secrecy level

In a general encryption function such as XML Encryption, a ciphertext has only two states: with a specified key, it either "can be decrypted" or "cannot". But I think real secrets have more finely graded states.

For example: "information that is basically secret but can be disclosed in an emergency", "absolutely secret information", or "information that can be disclosed once certain conditions are met".

Thus, I introduce the concept of a "secrecy level". As the term says, it means:

"A measure of how strongly the information must be kept secret"

In practical systems, this "secrecy level" is realized by application functions or by using different cipher keys for different levels of secrecy. But fundamentally, the "secrecy level" is information that belongs to the information itself. That is, it is "metadata", and it should be realized in the RDF layer, which is the layer for "describing metadata".
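The idea that the secrecy level is metadata about a statement, rather than a property of its serialized form, can be sketched as follows. This is only an illustration under stated assumptions: statements are modelled as plain Python objects instead of real RDF, and the property name `ex:secrecyLevel`, the level names, and the helper functions are hypothetical, not part of any existing vocabulary.

```python
from dataclasses import dataclass

# Hypothetical property name; it does not come from any standard vocabulary.
SECRECY_LEVEL = "ex:secrecyLevel"


@dataclass(frozen=True)
class Statement:
    """A subject-predicate-object statement, standing in for an RDF triple."""
    subject: str
    predicate: str
    obj: str


def annotate(statement, level):
    """Build a metadata triple whose subject is the statement itself."""
    return (statement, SECRECY_LEVEL, level)


def secrecy_of(statement, metadata, default="public"):
    """Look up the secrecy level attached to a statement, if any."""
    for subject, predicate, value in metadata:
        if subject == statement and predicate == SECRECY_LEVEL:
            return value
    return default


fact = Statement("ex:Nobu", "ex:studiesAt", "ex:MIT")
metadata = [annotate(fact, "secret")]
```

Because the level is attached to the statement and not to any particular serialization, it remains available however the statement is later written out or read back.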

2.1.3. The tasks of XML and RDF for Secrecy

How should we divide the tasks of secrecy between the XML layer and the RDF layer? I think the division is as follows:

XML layer: performs the actual encryption and decryption.
RDF layer: describes only the policy for the encryption and decryption performed in the XML layer.

Thus, the XML layer guarantees that information is actually encrypted, using XML Encryption, and the RDF layer attaches the encryption policy to the target information by methods such as Annotation.
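The division of tasks above might look like the following minimal sketch. It assumes a policy table that plays the role of the RDF layer and a pluggable `encrypt` callable that plays the role of the XML layer. All names here are hypothetical, and `toy_encrypt` is only a stand-in: it is not XML Encryption and provides no secrecy.

```python
from typing import Callable, Dict


def serialize(items: Dict[str, str],
              policy: Dict[str, str],
              encrypt: Callable[[str], str]) -> Dict[str, str]:
    """Encrypt only the items whose policy asks for it.

    The policy decides WHAT must be protected; the encrypt callable
    decides HOW, so either one can change without touching the other.
    """
    out = {}
    for key, value in items.items():
        if policy.get(key, "public") != "public":
            out[key] = encrypt(value)
        else:
            out[key] = value
    return out


def toy_encrypt(text: str) -> str:
    """Stand-in for a real cipher: reversing a string is NOT encryption."""
    return text[::-1]


items = {"student_id": "12345", "name": "Nobu"}
policy = {"student_id": "secret"}  # the "RDF layer" side of the split
result = serialize(items, policy, toy_encrypt)
```

Replacing `toy_encrypt` with a real encryption routine would not require any change to the policy table, which is the point of the separation.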

2.1.4. Reasons for and merits of describing the secrecy level in the RDF layer

The reason why we describe the secrecy level in the RDF layer

As described above, a secrecy level should be attached not to the serialized character data but to the "information" itself.

However, if we encrypt information at the XML level, the fact that "this information is encrypted" is itself XML-level information. So after the encrypted XML is decrypted, the fact that "this information was encrypted" can be lost.

This is like a confidential document that does not carry the mark "confidential" itself, the mark being printed only on the envelope that contains it. Once the document is taken out of the envelope, it looks the same as an ordinary document, and this causes problems. Confidential information is confidential whether or not it is in a confidential envelope. So in the real world, the mark "confidential" is printed on the confidential document itself.

How should we think about such cases in the Semantic Web world? The fact that some RDF descriptions are confidential should itself be described as RDF information in the RDF layer. The confidential information is the whole, including the statement that it is confidential, and removing that statement from it should be regarded as an "alteration".

The merits to describe the secrecy level in the RDF layer

Describing the secrecy level in the RDF layer has the following merits.

By describing the encryption policy, that is, "the semantics of the encryption", we give the encryption function not only the ability to conceal information but also "the meaning of encrypting".
We can separate the actual encryption method, such as the encryption algorithm or encryption strength, from the metadata stating that "the information should be encrypted". Therefore, the secrecy function becomes independent of the encryption method.

2.1.5. The inheritance of the secrecy level

Information deduced from secret information should also be secret. Thus, in the RDF deduction process, the secrecy level should also be derived, by "inheritance".

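I think this inheritance of the secrecy level will basically be an "AND" operation: the deduced RDF can be disclosed only if every RDF description used for the deduction can be disclosed. Thus, the secrecy level of the deduced RDF will be the same as the highest secrecy level among the RDF used for the deduction.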

For example, the secrecy level of an RDF description deduced from "top-secret" information and "common sense" should basically be "top-secret", since "top-secret" information was used to deduce it.
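The basic inheritance rule can be written down in a few lines. This is a minimal sketch, assuming that secrecy levels form a simple total order. The three level names are hypothetical, and treating "common sense" as the lowest level is my assumption for the illustration.

```python
from enum import IntEnum


class Secrecy(IntEnum):
    """Hypothetical ordered secrecy levels; a larger value is more secret."""
    PUBLIC = 0        # e.g. "common sense"
    CONFIDENTIAL = 1
    TOP_SECRET = 2


def inherited_secrecy(premises):
    """A deduced statement is as secret as its most secret premise.

    premises must be a non-empty iterable of Secrecy values.
    """
    return max(premises)


level = inherited_secrecy([Secrecy.TOP_SECRET, Secrecy.PUBLIC])
```

Here `level` is `Secrecy.TOP_SECRET`, matching the "top-secret" plus "common sense" example.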

However, as the word "basically" suggests, the deduction process can be more complex than a simple AND operation. I consider such cases in the following subsection.

2.1.6. Examples of the Secrecy level

I take up some examples of actual secrecy levels.

Conditional Secrecy level

Let's assume an RDF description deduced from the following information:

Information that can be disclosed from next week
Information that can be disclosed to the team members only

In this case, we cannot derive the secrecy level of the deduced RDF by simply taking the highest level. Instead, we can disclose the deduced information "if next week has arrived and the requester is a team member".

Thus, we have to introduce some logic for deducing the secrecy level, and in some cases this logic may have to be specific to the application. Consequently, describing the secrecy level in the RDF layer and processing the deduction in the Logic layer is very useful.
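One way to picture such logic is to treat each secrecy level as a disclosure condition and to require all inherited conditions to hold at once. The sketch below assumes this reading. The `Request` type, the helper names, the release date, and the team members are all hypothetical values chosen for the illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Request:
    """Who is asking for the information, and on which day."""
    user: str
    on: date


Condition = Callable[[Request], bool]


def after(release: date) -> Condition:
    """Disclosable on or after the release date."""
    return lambda req: req.on >= release


def member_of(team: FrozenSet[str]) -> Condition:
    """Disclosable only to members of the team."""
    return lambda req: req.user in team


def combine(conditions: List[Condition]) -> Condition:
    """A deduced statement inherits every condition of its premises."""
    return lambda req: all(cond(req) for cond in conditions)


release = date(2024, 1, 8)          # hypothetical "next week"
team = frozenset({"alice", "bob"})  # hypothetical team members
can_disclose = combine([after(release), member_of(team)])
```

With these values, `can_disclose` accepts a request from `alice` dated on or after the release date and rejects a request from anyone outside the team or any request made before that date.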

Countermeasure for inductive reasoning

Since we can perform deduction on RDF descriptions, we can also perform inductive reasoning on them. That is to say, given several deduced results, one can guess the RDF descriptions used for the deduction by inductive reasoning.

So we have to build countermeasures against such inductive reasoning into the description of the secrecy level. For example, we can define the following secrecy levels:

The RDF description itself is secret at a certain level, but a description deduced from it can be disclosed if it was deduced over several steps.
The RDF description is secret, and a description deduced from it has the same secrecy level no matter how many steps the deduction took.
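The two secrecy levels above differ only in whether secrecy wears off with the number of deduction steps. A minimal sketch, assuming "several steps" is expressed as a numeric threshold (the `StepPolicy` type and its field are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepPolicy:
    """How secrecy propagates along a chain of deductions.

    disclose_after_steps is the number of deduction steps after which a
    derived statement may be disclosed; None means secrecy never wears off.
    """
    disclose_after_steps: Optional[int]


def derived_is_secret(policy: StepPolicy, steps: int) -> bool:
    """Is a statement derived in `steps` steps still secret?"""
    if policy.disclose_after_steps is None:
        return True
    return steps < policy.disclose_after_steps


decaying = StepPolicy(disclose_after_steps=3)     # first kind of level
permanent = StepPolicy(disclose_after_steps=None)  # second kind of level
```

Choosing the threshold is itself a policy decision: a larger value makes it harder to guess the original description from the disclosed results.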

We can also define a secrecy level such as "the result deduced in the Logic layer is disclosable, but the process of the deduction is not". In the real world, this corresponds to a case like "I can tell you the answer only. I cannot tell you the reason."

2.2. Authentication in the Semantic Web - "Trust"

2.2.1. The relationship between authentication and trust

Next, I consider authentication in the Semantic Web.

As with encryption, "to authenticate" is not itself the final goal of authentication. The aim is "keeping the trust" by authenticating the information, and the authentication function is one step in "keeping the trust". Thus, we can regard "authentication" as one of the functions that make up the larger function "trust".

For example, if the information "Nobu is studying at MIT" is to be trustworthy, "signing Nobu's student ID" may not be enough to guarantee it. What must be trustworthy is the fact that "Nobu is studying at MIT", and the rule that "we can sign Nobu's student ID" is deduced from it.

As described above, the relation between "authentication" and "trust" is very similar to the relation between "encryption" and "secrecy". In the following sections, I consider authentication and trust in the same way as encryption and secrecy.

2.2.2. The concept of Trust level

As with the secrecy level in secrecy, we can introduce the concept of a "trust level" in trust. The trust level is, as the term says,

"A measure of how trustworthy the information is"

Examples are "this information is from a source known to be trustworthy" or "this information is not fully trustworthy, but is more trustworthy than ordinary information on the Web".

As with the secrecy level, in practical systems this trust level is realized by application functions or by using different signatures for different levels of trust.

But the trust level is "metadata" that belongs to the information itself, and it should be realized in the RDF layer.

2.2.3. The tasks of XML and RDF for Trust

For the trust level, the tasks of XML and RDF are likewise divided as follows.

XML layer: performs the actual authentication, such as signing and verification.
RDF layer: describes only the policy for the authentication performed in the XML layer.

The XML layer guarantees the authentication function, using XML Signature, and the RDF layer attaches the authentication policy to the information to be authenticated by methods such as Annotation.

2.2.4. Reasons for and merits of describing the Trust level in the RDF layer

The reason why we describe the trust level in the RDF layer

I think the reasons for describing the trust level in the RDF layer are basically the same as those for describing the secrecy level there. They are as follows:

Fundamentally, the trust level should be attached to the information itself, not to the serialized character data.
Even if we attach a signature to the serialized character data, the fact that the information should be authenticated is lost in the RDF layer, where the semantics of the information are expressed. Thus, we have to describe the fact that it should be authenticated at the RDF level, that is, at the level of the information itself.

The merit to describe the trust level in the RDF layer

As with the secrecy level, describing the trust level in the RDF layer has the following merits.

By describing the authentication policy, that is, "the semantics of the authentication", we give the authentication function not only the ability to guarantee that the information has not been altered but also "the meaning of authenticating".
We can separate the actual authentication method, such as the authentication algorithm, from the metadata stating that "the information should be authenticated". Therefore, the trust function becomes independent of the authentication method.

2.2.5. The inheritance of the Trust level

Information deduced from trustworthy information will also be trustworthy. Thus, in the RDF deduction process, the trust level should also be derived, by "inheritance".

I think this inheritance of the trust level will basically be an "OR" operation: the deduced RDF is uncertain if any RDF description used for the deduction is uncertain. Thus, the trust level of the deduced RDF will be the same as the lowest trust level among the RDF used for the deduction.

For example, the trust level of an RDF description deduced from "certain information" and "uncertain information" should basically be "uncertain information", since "uncertain information" was used to deduce it.
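The basic rule for trust mirrors the one for secrecy, with the minimum taking the place of the maximum. A minimal sketch, assuming trust levels form a simple total order (the level names are hypothetical):

```python
from enum import IntEnum


class Trust(IntEnum):
    """Hypothetical ordered trust levels; a larger value is more trustworthy."""
    UNTRUSTED = 0
    UNCERTAIN = 1
    CERTAIN = 2


def inherited_trust(premises):
    """A deduced statement is only as trustworthy as its weakest premise.

    premises must be a non-empty iterable of Trust values.
    """
    return min(premises)


trust = inherited_trust([Trust.CERTAIN, Trust.UNCERTAIN])
```

Here `trust` is `Trust.UNCERTAIN`, matching the "certain information" plus "uncertain information" example.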

However, as the word "basically" suggests, the deduction process can be more complex than a simple OR operation. I describe this in the following subsection.

2.2.6. Examples of the Trust level

Conditional trust level

(Not written yet.)

3. The second definition of security in the Semantic Web

Next, let us consider "the second definition of security" described in section 1.1 for the Semantic Web.

3.1. A function to cope with contradiction - "Rationality"

3.1.1. RDF as a distributed database

One of the great merits of RDF is that it lets us use Web content spread all over the world as a distributed database.

The Web is already in the "position" of a kind of huge distributed database. Nevertheless, its "records" are basically character data in HTML or XML, and they have no "functions" suitable for a "database". There are some excellent search engines such as Google, but their technologies are basically based on keyword search, like "grep". This is unavoidable as long as the target records are simple character data, and it is also obvious that such search has its limits.

Thus, to improve search drastically, we have to give the targets further "information". RDF is suited to giving such information, "semantics", to the content, and RDF was born from this concept.

3.1.2. A function to detect contradiction

In the "RDF database" world described above, various people will write RDF content as the records of the database, so we cannot avoid contradictions among them. In some cases, someone might

(a) Write false, contradictory RDF content out of malice.

And in other cases, someone might

(b) Write contradictory RDF content as a result of differences of opinion among authors.

In any case, just as there are conflicting ideas and opinions in the real world, contradictory RDF content will exist in the Semantic Web world, whether or not malice is involved.

The concept of "contradiction" exists only when a language can describe "meaning". Thus, I think this kind of security, coping with contradiction, is particular to the Semantic Web, which describes semantics.

This becomes a large problem when we perform "deduction" or "reasoning", the applied functions of RDF. Deduction assumes information free of contradiction, and it does not work well if even a few contradictory pieces of information are present. Thus, coping with contradictory RDF content is a large problem for RDF security. And as the first step, we need a function "to detect contradiction".
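What counts as a contradiction depends on the semantics in use, so any concrete detector has to fix a definition first. The sketch below assumes one narrow definition, which is mine and not the only possible one: some properties are declared single-valued, and two different values for the same subject and property contradict each other. The property names and data are hypothetical.

```python
from collections import defaultdict


def find_contradictions(triples, single_valued):
    """Return {(subject, predicate): values} wherever a single-valued
    property has been given more than one distinct value."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in single_valued:
            values[(subject, predicate)].add(obj)
    return {key: objs for key, objs in values.items() if len(objs) > 1}


# Hypothetical data gathered from different authors.
triples = [
    ("ex:Alice", "ex:birthYear", "1970"),
    ("ex:Alice", "ex:birthYear", "1975"),  # conflicts with the line above
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:knows", "ex:Carol"),  # multi-valued, so no conflict
]
conflicts = find_contradictions(triples, single_valued={"ex:birthYear"})
```

The detector only reports that a conflict exists; it says nothing about which value is right or whether malice was involved.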

3.1.3. A function to make decisions about contradictory RDF

Once we have the "contradiction detecting function" described above, the next problem is "how to treat the contradictory information".

As I wrote in the previous section, contradictory RDF descriptions fall into two patterns: "(a) contradiction by malice" and "(b) contradiction by conflict of opinion". In case (a), one of them is obviously wrong, and we can proceed with the deduction by ignoring it. But in case (b), neither of them is wrong. The criterion for judgement is not "correct or wrong" but "agree or disagree", and the result will differ for each person who wrote the RDF content and each person who performs deduction with it.

Thus, for the trust of contradictory RDF content, there are the following two types:

(A) One of them is absolutely correct; the trust of one piece of information is absolutely higher.

(B) Their trust depends on each person's values and differs among users.

In case (A), the trust level information can be given by some reliable third party, as with ratings. But in case (B), there is no absolute judgement. So the trust level information will differ for each user, and it should be stored and provided per user.
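The difference between the two cases can be sketched as a lookup order for trust ratings: a shared third-party table for case (A), and a per-user table for case (B). This is only an illustration. The numeric ratings, the source names, and the rule that a user's own rating overrides the third party's are all my assumptions.

```python
def resolve(candidates, third_party_rating, user_rating=None):
    """Pick one value among contradictory (source, value) pairs.

    third_party_rating: shared ratings, as in case (A).
    user_rating: optional per-user ratings, as in case (B); in this
    sketch they override the shared ones.
    Sources with no rating are treated as having trust 0.
    """
    ratings = dict(third_party_rating)
    if user_rating:
        ratings.update(user_rating)
    best = max(candidates, key=lambda pair: ratings.get(pair[0], 0))
    return best[1]


# Hypothetical contradictory claims and ratings.
candidates = [("siteA", "1970"), ("siteB", "1975")]
shared = {"siteA": 0.9, "siteB": 0.2}

answer_for_everyone = resolve(candidates, shared)
answer_for_one_user = resolve(candidates, shared, user_rating={"siteB": 1.0})
```

The same contradictory data thus yields "1970" under the shared ratings but "1975" for a user who trusts `siteB` more, which is why the per-user ratings have to be stored separately.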

In any case, we cannot avoid the existence of contradictory RDF, and we need a function that copes with those contradictions. I call this function, which keeps RDF and its deduction rational, "rationality".