FAIR Data

FAIR Data is a set of principles applied to data to make it:

Findable: based on common, human-readable language that is independent of standards, silos and constraints (semantics)
Accessible: both within and without company boundaries, as an overlay on top of existing infrastructure without re-architecting existing systems
Interoperable: a safe space for secure and trusted data sharing, creating a true data ecosystem where you control what can and cannot be shared
Reusable: integrate data only once then easily share all or part of your data with trusted parties, removing the need for traditional point-to-point integrations and centralised data storage

You can read more about the origins of FAIR data on the GO FAIR website.

IOTICS provides the infrastructure and the tooling that support the “FAIRification” process. In other words, users can use IOTICS to make their data FAIR.

Here we'll analyse how IOTICS relates to the FAIR data principles and how the FAIRification process can be implemented.

Being FAIR in IOTICS and why it matters

The core concepts in IOTICS are “data interactions” and “digital twins”.

A digital twin is a virtual representation of a “real” asset and it provides a single access point to both metadata of the asset and its data.

“Data interactions” occur when two twins exchange or share data with each other.

Because digital twins make the underlying data asset FAIR, IOTICS enables interactions to be dynamic and autonomous.

In other words, digital twins in IOTICS form a network of FAIR data points that interact dynamically and autonomously with each other.

Autonomous data interactions happen because IOTICS supports a “find and bind” pattern whereby a twin wanting some data can search the network, find relevant twins matching the search criteria, describe them to determine whether they’re useful or not, and then bind to them to receive the data.

The find and bind pattern can be programmed into each twin, making the twin an autonomous agent in the network.
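
As a rough illustration of the find and bind pattern, the sketch below models it in plain Python with an in-memory network. The Twin and Network classes and the search, describe, bind and publish method names are hypothetical stand-ins chosen for this example; they are not the IOTICS API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Twin:
    """Hypothetical in-memory twin: an identifier, metadata and named feeds."""
    twin_id: str
    metadata: Dict[str, str]
    # feed name -> callbacks of the consumers bound to that feed
    subscribers: Dict[str, List[Callable[[dict], None]]] = field(default_factory=dict)

    def publish(self, feed: str, value: dict) -> None:
        """Push a value to every consumer bound to the feed."""
        for callback in self.subscribers.get(feed, []):
            callback(value)


class Network:
    """Hypothetical registry standing in for the network of twins."""

    def __init__(self) -> None:
        self._twins: Dict[str, Twin] = {}

    def register(self, twin: Twin) -> None:
        self._twins[twin.twin_id] = twin

    def search(self, **criteria: str) -> List[str]:
        """Find: return the ids of twins whose metadata matches all criteria."""
        return [
            t.twin_id for t in self._twins.values()
            if all(t.metadata.get(k) == v for k, v in criteria.items())
        ]

    def describe(self, twin_id: str) -> Dict[str, str]:
        """Describe: return the full metadata of one twin."""
        return dict(self._twins[twin_id].metadata)

    def bind(self, twin_id: str, feed: str, callback: Callable[[dict], None]) -> None:
        """Bind: subscribe a callback to a feed of the chosen twin."""
        self._twins[twin_id].subscribers.setdefault(feed, []).append(callback)


network = Network()
sensor = Twin("twin-1", {"type": "thermometer", "unit": "celsius"})
network.register(sensor)
network.register(Twin("twin-2", {"type": "barometer", "unit": "hPa"}))

received: List[dict] = []
for found_id in network.search(type="thermometer"):          # find
    if network.describe(found_id)["unit"] == "celsius":      # describe, then decide
        network.bind(found_id, "readings", received.append)  # bind

sensor.publish("readings", {"temperature": 21.5})
```

After the last line, received holds the single published reading, because only the thermometer twin matched the search and passed the describe check.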

Although the FAIR data principles are generic and apply to any kind of (meta)data and to any data consumer (both humans and machines), the emphasis in IOTICS is on streaming data that represents the current ("now") view of the asset.

FAIR deep dive

Let’s deep dive into how IOTICS allows users to make their data FAIR by analysing one by one the FAIR data principles in detail.

Findable

Findability refers to being able to find metadata and data unambiguously by humans and machines.

F1. (Meta)data are assigned a globally unique and persistent identifier

✅ IOTICS uses the W3C Decentralized Identifiers (DID) specification. Each twin has a globally unique and persistent identifier.

A DID is globally resolvable to a DID document providing the cryptographic identity of the twin, its keys and permissions. A globally resolvable identifier provides a simple mechanism to disambiguate one twin from another.

Persistence refers to the fact that the same DID refers to the same data point once assigned. This is important in that it allows disambiguation and reliability. In IOTICS, once a DID is assigned to a twin, it can’t be changed.
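
To make the shape of such an identifier concrete, here is a minimal sketch that checks a string against the generic DID syntax from the W3C DID specification (did:, a lowercase method name, then a method-specific id). The parse_did helper and the Did type are names invented for this example, and did:example:... is the placeholder method used in the W3C specification, not an IOTICS identifier.

```python
import re
from typing import NamedTuple

# One "idchar" from the DID syntax: letter, digit, ".", "-", "_" or a
# percent-encoded byte.
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"

# did:<method-name>:<method-specific-id>, where the id may contain
# colon-separated segments but must end with at least one idchar.
DID_PATTERN = re.compile(
    rf"did:(?P<method>[a-z0-9]+):(?P<id>(?:{_IDCHAR}*:)*{_IDCHAR}+)"
)


class Did(NamedTuple):
    method: str
    method_specific_id: str


def parse_did(value: str) -> Did:
    """Split a DID into its method and method-specific id, or raise ValueError."""
    match = DID_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid DID: {value!r}")
    return Did(match["method"], match["id"])


twin_did = parse_did("did:example:123456789abcdefghi")
```

This only validates the syntax. Resolving the DID to its DID document is a separate step that depends on the DID method and is not shown here.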

F2. Data are described with rich metadata (defined by R1 below)

✅ In IOTICS, “data” is generated by underlying “real” assets. It is available in feeds (topics) that another twin can subscribe to. A twin may have as many feeds as needed. Each feed can have as many values as desired.

The twin, the feeds and the values in a feed are all enriched with metadata. The twin decides how rich that metadata is.

Note that the metadata plane and the data plane are completely separate. A digital twin can therefore exist and express its metadata even if the underlying real asset is unavailable or not “attached” to the twin.

Metadata can therefore be searched in IOTICS whether or not data is currently flowing.
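
The separation of the two planes can be sketched as follows. This is an illustrative model only: the DigitalTwin class and its attach, detach, matches and latest methods are hypothetical names, and the data source is simulated by a plain function.

```python
from typing import Callable, Dict, Optional


class DigitalTwin:
    """Hypothetical twin keeping metadata and data access separate."""

    def __init__(self, twin_id: str, metadata: Dict[str, str]) -> None:
        self.twin_id = twin_id
        self.metadata = metadata                            # metadata plane
        self._source: Optional[Callable[[], dict]] = None   # data plane

    def attach(self, source: Callable[[], dict]) -> None:
        """Connect the real asset (here, any function returning a reading)."""
        self._source = source

    def detach(self) -> None:
        """Disconnect the real asset; the metadata is left untouched."""
        self._source = None

    def matches(self, **criteria: str) -> bool:
        """Metadata search never touches the data source."""
        return all(self.metadata.get(k) == v for k, v in criteria.items())

    def latest(self) -> Optional[dict]:
        """Return the current reading, or None when no asset is attached."""
        return self._source() if self._source is not None else None


pump = DigitalTwin("twin-42", {"type": "pump", "site": "north"})
found_before = pump.matches(type="pump")    # searchable with no asset attached
reading_before = pump.latest()              # no data yet

pump.attach(lambda: {"flow_l_per_s": 3.2})
reading_attached = pump.latest()

pump.detach()
found_after = pump.matches(site="north")    # still searchable after detaching
```

The twin answers metadata queries before the asset is attached and after it is detached; only latest depends on the data plane.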

F3. Metadata clearly and explicitly includes the identifier of the data they describe

✅ IOTICS adopts a “find and bind” pattern. Search requests are sent to IOTICS to match metadata, and responses are sent back to the requestor as an array of “twin descriptions”, each including a subset of the available metadata. Consumer twins can then choose to “describe” one or more twins for a full metadata report, or bind directly to one or more feeds to get data.

Successful execution of a bind operation completes the interaction.

F4. (Meta)data are registered or indexed in a searchable resource

✅ IOTICS exposes a web API to the application. Each digital twin description is reachable via a URL on this web API. Publicly available twin metadata can therefore be listed and made available to internet search engines.

Accessible

Once users find the data they need, they need to know how it can be accessed, possibly including authentication and authorisation.

A1. (Meta)data are retrievable by their identifier using a standardised communications protocol

✅ IOTICS uses HTTP/WebSocket/STOMP or gRPC.

A1.1 The protocol is open, free, and universally implementable

✅ IOTICS uses HTTP/WebSocket/STOMP or gRPC, all of which are openly specified and free to implement.

A1.2 The protocol allows for an authentication and authorisation procedure, where necessary

The IOTICS delegation model, DIDs and brokered interactions provide the necessary means for this.

A2. Metadata is accessible, even when the data are no longer available

As discussed in F2, data and metadata sit on separate planes, and metadata is available at the application's discretion, regardless of whether the underlying real asset still exists.

Interoperable

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

IOTICS uses semantic web technology, specifically RDF.

A twin is mapped to a web resource identified by its DID, which is a URI.

Every piece of metadata is expressed as a “fact” and implemented as an RDF triple.
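
As an illustration of what a "fact" looks like as an RDF triple, the sketch below builds two triples about a twin and serialises them in the W3C N-Triples format, using only the Python standard library. The Iri, Literal and Triple types and the to_ntriples helper are defined here for the example, the twin DID is a made-up placeholder, and the choice of the SOSA Sensor class and rdfs:label is an assumption about what a producer might pick, not a statement about how IOTICS stores metadata.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Iri:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Iri
    predicate: Iri
    obj: Union[Iri, Literal]


def _format(term: Union[Iri, Literal]) -> str:
    """Render one term: IRIs in angle brackets, literals quoted and escaped."""
    if isinstance(term, Iri):
        return f"<{term.value}>"
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triple: Triple) -> str:
    """Serialise a triple as a single N-Triples line."""
    return f"{_format(triple.subject)} {_format(triple.predicate)} {_format(triple.obj)} ."


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

twin = Iri("did:example:twin123")  # placeholder DID, used as the subject
facts = [
    Triple(twin, Iri(RDF_TYPE), Iri("http://www.w3.org/ns/sosa/Sensor")),
    Triple(twin, Iri(RDFS_LABEL), Literal("Kitchen thermometer")),
]
lines = [to_ntriples(fact) for fact in facts]
```

Each entry in lines is one fact: the twin's DID as subject, a predicate taken from a shared vocabulary, and either another resource or a literal value as object.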

I2. (Meta)data use vocabularies that follow FAIR principles

IOTICS uses semantic web technology. As mentioned in I1, each piece of metadata is expressed as a subject/predicate/object triple; the predicate and the object can each be a link to a term that is part of an ontology.

IOTICS applications can refer to custom ontologies, publicly available ontologies or ontologies hosted by IOTICS.

I3. (Meta)data include qualified references to other (meta)data

✅ IOTICS allows the use of linked data, so metadata can reference other (meta)data by URI.
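
A qualified reference is a link whose predicate states how the two resources are related. The sketch below shows one way this could look, with triples held as plain Python tuples. The DIDs and labels are invented for the example, and using the SOSA isHostedBy property for the relationship is an assumption made for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

SOSA_HOSTED_BY = "http://www.w3.org/ns/sosa/isHostedBy"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Metadata of two hypothetical twins; the first triple links one to the other.
graph: List[Triple] = [
    ("did:example:sensor1", SOSA_HOSTED_BY, "did:example:building7"),
    ("did:example:sensor1", RDFS_LABEL, "Boiler temperature sensor"),
    ("did:example:building7", RDFS_LABEL, "Pump house"),
]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the qualified link from the sensor to its host, then read the host's label.
hosts = objects(graph, "did:example:sensor1", SOSA_HOSTED_BY)
host_labels = [label for host in hosts for label in objects(graph, host, RDFS_LABEL)]
```

Because the link carries its own meaning in the predicate, a consumer can navigate from one twin to a related one without prior knowledge of either.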

Reusable

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes

✅ IOTICS encourages the producer to use “product thinking” when creating twins, making use of RDF and semantic web technology. The usefulness of data is subjective, but describing data in context is achievable using linked data, which IOTICS supports directly by means of triple stores.

R1.1. (Meta)data are released with a clear and accessible data usage license

✅ A license can be linked to the digital twin using RDF (for example this).
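
A minimal sketch of such a link, assuming the producer uses the Dublin Core license term as the predicate and a Creative Commons URL as the object. Both choices, the placeholder DID and the license_of and may_reuse helpers are assumptions made for this example.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

DCTERMS_LICENSE = "http://purl.org/dc/terms/license"
CC_BY_4 = "https://creativecommons.org/licenses/by/4.0/"

# One fact in the twin's metadata: "this twin is released under CC BY 4.0".
metadata = [
    ("did:example:twin123", DCTERMS_LICENSE, CC_BY_4),
]


def license_of(triples: Iterable[Triple], twin_did: str) -> Optional[str]:
    """Return the license URL stated for the twin, or None if none is stated."""
    for subject, predicate, obj in triples:
        if subject == twin_did and predicate == DCTERMS_LICENSE:
            return obj
    return None


def may_reuse(triples: Iterable[Triple], twin_did: str, accepted: Set[str]) -> bool:
    """A consumer-side check: reuse only data under a license it accepts."""
    return license_of(triples, twin_did) in accepted


allowed = may_reuse(metadata, "did:example:twin123", {CC_BY_4})
```

Expressing the license as a triple means a consuming twin can check it programmatically before binding to a feed.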

R1.2. (Meta)data are associated with detailed provenance

✅ Data provenance can be encapsulated using RDF and cryptographically bound to the twin using the IOTICS DID specification (example here).

There is an emerging standard for signing RDF metadata, and IOTICS is working to implement it.
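
The idea can be sketched with standard-library tools only. The example below records provenance as N-Triples lines using PROV-O terms and computes a SHA-256 digest over a deterministic ordering of those lines. This is a deliberately simplified stand-in: it is not the emerging RDF signing standard, it is not how IOTICS binds provenance to a DID, and the DIDs and timestamp are made-up values. A real scheme would sign the digest with a key listed in the twin's DID document.

```python
import hashlib
import hmac
from typing import Iterable

TWIN = "<did:example:twin123>"  # placeholder DID

# Provenance facts as N-Triples lines, using terms from the W3C PROV ontology.
provenance = [
    f"{TWIN} <http://www.w3.org/ns/prov#wasAttributedTo> <did:example:agent9> .",
    f"{TWIN} <http://www.w3.org/ns/prov#generatedAtTime> "
    '"2024-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .',
]


def digest(lines: Iterable[str]) -> str:
    """Hash the statements in sorted order so the result ignores input order.

    Sorting is sufficient here only because the example has no blank nodes.
    """
    canonical = "\n".join(sorted(set(lines))) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(lines: Iterable[str], expected: str) -> bool:
    """Check that the statements still match a previously recorded digest."""
    return hmac.compare_digest(digest(lines), expected)


recorded = digest(provenance)

# Changing any statement changes the digest, so tampering is detectable.
tampered = [provenance[0].replace("agent9", "agent13"), provenance[1]]
still_valid = verify(provenance, recorded)
tamper_detected = not verify(tampered, recorded)
```

The digest alone only detects changes; it is the signature over that digest, made with the twin's key, that would tie the provenance to the twin's identity.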

R1.3. (Meta)data meet domain-relevant community standards

✅ IOTICS adheres to semantic web technology, and domain standards can be encoded in RDF (we have done this conceptually by implementing WaterML as RDF). The resulting ontologies and formats can then be shared as appropriate.

The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component).