The Semantic Web encompasses a series of concepts and standards whose goal is to give a common framework for the publication of data and knowledge accessible via the web or an intranet. The objective is to provide data that are easily reusable, connected, and on which logical reasoning can be based.
The Semantic Web is founded on two principal rules:
- each resource described is identified by a distinct URI. From this URI, the resource is accessible to machines or humans over HTTP,
- the resource’s description is expressed as a series of triples according to the RDF standard.
A resource can be anything you want to say something about: a person, a place, an organization, an event …
The properties that connect two resources are always named, making it possible to understand their relations: “is the creator of,” “is a part of,” “participates in,” “is located in.” The word “semantic” in “Semantic Web” expresses the fact that all such relations are named, so the graph is immediately explicit for those who explore it, using the terminology of the relevant domain.
In addition to URI and HTTP, the standards of the Semantic Web are:
- RDF, which standardizes how resources are described in the form of triples,
- SPARQL, which lets you query a graph and update it,
- OWL, which lets you model a knowledge graph,
- SHACL, to describe the rules that apply to the graphs.
RDF - Resource Description Framework
Below are some short statements, each of which provides information about Beyoncé or one of her songs:
- Beyoncé was born on September 4, 1981.
- Beyoncé was born in Houston.
- Beyoncé is an R&B singer.
- “Dangerously in Love” was sung by Beyoncé.
- “Dangerously in Love” came out on June 17, 2003.
RDF lets you formalize this type of statement as a set of triples.
An RDF triple is an association (subject, predicate, object):
- the “subject” represents the resource described;
- the “predicate” represents a type of property applicable to this resource;
- the “object” represents a datum or another resource: the property’s value.
The accumulation of these statements in the form of triples creates a knowledge graph.
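As a rough illustration, the five statements about Beyoncé can be written as (subject, predicate, object) tuples in plain Python. This is only a sketch of the data model, not an RDF serialization: the `http://example.org/` namespace, the predicate names (`birthDate`, `birthPlace`, `genre`, `performer`, `releaseDate`), and the `describe` helper are all assumptions made for this example.

```python
from datetime import date

# Hypothetical namespace: example.org URIs are placeholders, not real identifiers.
EX = "http://example.org/"

# Each triple is (subject, predicate, object).
# The object is either another resource (a URI string) or a literal value.
triples = [
    (EX + "Beyonce", EX + "birthDate", date(1981, 9, 4)),
    (EX + "Beyonce", EX + "birthPlace", EX + "Houston"),
    (EX + "Beyonce", EX + "genre", EX + "RnB"),
    (EX + "DangerouslyInLove", EX + "performer", EX + "Beyonce"),
    (EX + "DangerouslyInLove", EX + "releaseDate", date(2003, 6, 17)),
]


def describe(resource, graph):
    """Return every (predicate, object) pair whose subject is `resource`."""
    return [(p, o) for s, p, o in graph if s == resource]


for predicate, value in describe(EX + "Beyonce", triples):
    print(predicate, "->", value)
```

Because the object of one triple (here, Beyoncé in the `performer` statement) can be the subject of others, the list of triples already forms a small graph.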
SPARQL - SPARQL Protocol and RDF Query Language
The goal of SPARQL is to be able to explore a graph by indicating the conditions that the resources searched for must fulfil. To find Belgian R&B singers between the ages of 30 and 40, you’ll include the following constraints in the query:
- the person composes works of music.
- the person is classified as an R&B musician.
- the person was born after 1980.
- the person was born before 1990.
- the person was born in a city.
- the city is located in Belgium.
You’ll find, among other results: Selah Sue.
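The idea behind such a query can be sketched in plain Python as a naive matcher over triple patterns, where terms starting with `?` are variables. This is not SPARQL syntax and not how a real query engine works internally; the predicate names, the placeholder works `work1` and `work2`, and the tiny data set are assumptions chosen only to mirror the constraints listed above.

```python
# Illustrative data only; predicate names and work identifiers are hypothetical.
triples = {
    ("SelahSue", "composed", "work1"),
    ("work1", "type", "MusicalWork"),
    ("SelahSue", "genre", "RnB"),
    ("SelahSue", "birthYear", 1989),
    ("SelahSue", "birthPlace", "Leuven"),
    ("Leuven", "type", "City"),
    ("Leuven", "locatedIn", "Belgium"),
    ("Beyonce", "composed", "work2"),
    ("work2", "type", "MusicalWork"),
    ("Beyonce", "genre", "RnB"),
    ("Beyonce", "birthYear", 1981),
    ("Beyonce", "birthPlace", "Houston"),
    ("Houston", "type", "City"),
    ("Houston", "locatedIn", "UnitedStates"),
}


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, graph):
    """Return all variable bindings that satisfy every pattern (naive nested join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


patterns = [
    ("?person", "composed", "?work"),
    ("?work", "type", "MusicalWork"),
    ("?person", "genre", "RnB"),
    ("?person", "birthYear", "?year"),
    ("?person", "birthPlace", "?city"),
    ("?city", "type", "City"),
    ("?city", "locatedIn", "Belgium"),
]

# The date constraints play the role of a filter on the bound values.
results = sorted({b["?person"] for b in query(patterns, triples)
                  if 1980 < b["?year"] < 1990})
print(results)  # ['SelahSue']
```

Each pattern narrows the set of candidate bindings, which is why adding a constraint to a query can only reduce the number of results.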
OWL - Web Ontology Language
The OWL language lets you describe data models for the publication of knowledge in a graph, meaning:
- the categories of objects used in the graph: person, work of music, recording, performance …
- the properties used to describe the resources: “was born on,” “was born in,” “sings such and such a genre of music,” “plays such and such an instrument,” …
- the fact that one category is a sub-category of another: the category “work of music” is a sub-category of “work,”
- the fact that if one resource has a property of a given type, you can deduce that it belongs to a certain category. For example, if people play a musical instrument, you deduce that they belong to the “musician” category,
- the fact that the value of a property must belong to a category: for example, the value of the property “plays an instrument” must belong to the category “musical instrument.”
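The deductions described above can be sketched with three small lookup tables in plain Python. This is a simplified illustration, not an OWL reasoner: the category and property names are placeholders, and the sample facts are invented. Note that a reasoner typically treats the declared category of a property's value as something to deduce rather than something to check, which is what the sketch does.

```python
# Hypothetical mini-ontology; every name here is a placeholder for illustration.
SUBCLASS_OF = {"MusicalWork": "Work"}              # category -> parent category
DOMAIN = {"playsInstrument": "Musician"}           # property -> category of the subject
RANGE = {"playsInstrument": "MusicalInstrument"}   # property -> category of the value

# Invented sample facts as (subject, predicate, object) triples.
facts = [
    ("person1", "playsInstrument", "guitar"),
    ("song1", "type", "MusicalWork"),
]


def infer_types(facts):
    """Return {resource: set of categories} deduced from the facts and the ontology."""
    types = {}
    for s, p, o in facts:
        if p == "type":
            types.setdefault(s, set()).add(o)
        if p in DOMAIN:
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:
            types.setdefault(o, set()).add(RANGE[p])
    # Follow sub-category links upward until no parent remains.
    for categories in types.values():
        for category in list(categories):
            while category in SUBCLASS_OF:
                category = SUBCLASS_OF[category]
                categories.add(category)
    return types


types = infer_types(facts)
for resource in sorted(types):
    print(resource, sorted(types[resource]))
```

Here `person1` is never declared a musician and `song1` is never declared a work; both memberships follow from the model.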
SHACL - Shapes Constraint Language
The SHACL language lets you validate RDF graphs against a set of conditions. For example, you can express the following business rules:
- a work must be played by at least one musician,
- a concert must take place on a single date,
- the performance date of a work must be later than the date of its composition,
- the composition date of the work must be later than the composer’s date of birth.
When a graph has been checked against the SHACL rules, you receive a report listing the rules that the graph violates. This is particularly useful for graphs supplied by multiple and heterogeneous sources: databases, users, knowledge extraction from texts or images …
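The validate-then-report cycle can be sketched in plain Python for two of the rules above. This is not SHACL syntax and does not use a SHACL engine: the records stand in for graph nodes, and the field names, identifiers, and dates are all invented for the example.

```python
from datetime import date

# Hypothetical records standing in for graph nodes; all values are invented.
works = [
    {"id": "work1", "composed": date(2001, 5, 1), "performed": date(2003, 6, 17),
     "musicians": ["musician1"]},
    {"id": "work2", "composed": date(2010, 1, 1), "performed": date(2009, 1, 1),
     "musicians": []},
]


def validate(work):
    """Return the list of rules that `work` violates (empty if it conforms)."""
    violations = []
    if len(work["musicians"]) < 1:
        violations.append("must be played by at least one musician")
    if work["performed"] <= work["composed"]:
        violations.append("performance date must be later than composition date")
    return violations


# The report lists only the nodes that break at least one rule.
report = {w["id"]: v for w in works if (v := validate(w))}

for node, violations in report.items():
    for violation in violations:
        print(node, ":", violation)
```

Keeping the rules separate from the data, as here, is what lets the same checks be applied to graphs coming from different sources.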