FAIR Data
FAIR Data is a set of principles applied to data to make it:
- Findable: based on common, human-readable language that is independent of standards, silos and constraints (semantics)
- Accessible: both within and without company boundaries, as an overlay on top of existing infrastructure without re-architecting existing systems
- Interoperable: a safe space for secure and trusted data sharing, creating a true data ecosystem where you control what can and cannot be shared
- Reusable: integrate data only once then easily share all or part of your data with trusted parties, removing the need for traditional point-to-point integrations and centralised data storage
You can read more about the origins of FAIR data on the GO FAIR website.
IOTICS provides the infrastructure and the tooling that support the “FAIRification” process. In other words, users can use IOTICS to make their data FAIR.
Here we'll analyse how IOTICS relates to the FAIR data principles and how the FAIRification process can be implemented.
Being FAIR in IOTICS and why it matters
The core concepts in IOTICS are “data interactions” and “digital twins”.
A digital twin is a virtual representation of a “real” asset and it provides a single access point to both metadata of the asset and its data.
“Data interactions” occur when two twins exchange or share data with each other.
By having digital twins make the underlying data assets FAIR, IOTICS enables interactions to be dynamic and autonomous.
In other words, digital twins in IOTICS form a network of FAIR data points that interact dynamically and autonomously with each other.
Autonomous data interactions happen because IOTICS supports a “find and bind” pattern whereby a twin wanting some data can search the network, find relevant twins matching the search criteria, describe them to determine whether they’re useful or not, and then bind to them to receive the data.
The find and bind pattern can be programmed into each twin, making the twin an autonomous agent in the network.
Although the FAIR data principles are generic and apply to any kind of (meta)data and to any data consumer (human or machine), the emphasis in IOTICS is on streaming data representing the current (“now”) view of the asset.
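The find and bind pattern can be sketched in a few lines of Python. This is an illustrative, in-memory model only: `Network`, `Twin`, `search`, `describe`, `bind` and `publish` are hypothetical names chosen for this sketch and are not the IOTICS API, and the `did:example:` identifiers are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Twin:
    """Hypothetical in-memory twin: an identifier, metadata and named feeds."""
    did: str
    metadata: Dict[str, str]
    # feed name -> callbacks of the consumers bound to that feed
    feeds: Dict[str, List[Callable[[dict], None]]] = field(default_factory=dict)


class Network:
    """Toy stand-in for a network of twins (an assumption, not the IOTICS API)."""

    def __init__(self) -> None:
        self._twins: Dict[str, Twin] = {}

    def register(self, twin: Twin) -> None:
        self._twins[twin.did] = twin

    def search(self, **criteria: str) -> List[str]:
        """Find: return the DIDs of twins whose metadata matches every criterion."""
        return [
            twin.did
            for twin in self._twins.values()
            if all(twin.metadata.get(k) == v for k, v in criteria.items())
        ]

    def describe(self, did: str) -> dict:
        """Describe: return the full metadata and feed names of one twin."""
        twin = self._twins[did]
        return {"did": did, "metadata": dict(twin.metadata), "feeds": sorted(twin.feeds)}

    def bind(self, did: str, feed: str, callback: Callable[[dict], None]) -> None:
        """Bind: subscribe a consumer callback to a feed of the given twin."""
        self._twins[did].feeds[feed].append(callback)

    def publish(self, did: str, feed: str, sample: dict) -> None:
        """Deliver a sample to every consumer bound to the feed."""
        for callback in self._twins[did].feeds[feed]:
            callback(sample)


network = Network()
network.register(Twin("did:example:sensor-1", {"type": "thermometer"}, {"temperature": []}))
network.register(Twin("did:example:pump-7", {"type": "pump"}, {"status": []}))

received: List[dict] = []
for did in network.search(type="thermometer"):            # find
    description = network.describe(did)                    # describe
    if "temperature" in description["feeds"]:
        network.bind(did, "temperature", received.append)  # bind

network.publish("did:example:sensor-1", "temperature", {"value": 21.5})
```

After the final call, `received` holds the single published sample. The consumer never needed to know the producer's identifier in advance; it discovered it through the search step.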
FAIR deep dive
Let’s deep dive into how IOTICS allows users to make their data FAIR by analysing one by one the FAIR data principles in detail.
Findable
Findability refers to being able to find metadata and data unambiguously by humans and machines.
F1. (Meta)data are assigned a globally unique and persistent identifier
✅ IOTICS uses the W3C DID (Decentralized Identifiers) specification. Each twin has a globally unique and persistent identifier.
A DID is globally resolvable to a DID document providing the cryptographic identity of the twin, its keys and permissions. A globally resolvable identifier provides a simple mechanism for disambiguation.
Persistence refers to the fact that the same DID refers to the same data point once assigned. This is important in that it allows disambiguation and reliability. In IOTICS, once a DID is assigned to a twin, it can’t be changed.
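As a rough illustration of what such an identifier looks like, the sketch below checks a string against the generic DID syntax from the W3C DID Core specification (`did:<method-name>:<method-specific-id>`). The `parse_did` helper and the `Did` type are hypothetical names for this example, and the identifiers use the reserved `example` method rather than real IOTICS DIDs.

```python
import re
from typing import NamedTuple, Optional

# idchar = ALPHA / DIGIT / "." / "-" / "_" / pct-encoded (W3C DID Core ABNF)
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"
# did = "did:" method-name ":" method-specific-id
_DID_RE = re.compile(rf"did:([a-z0-9]+):((?:{_IDCHAR}*:)*{_IDCHAR}+)")


class Did(NamedTuple):
    method: str
    method_specific_id: str


def parse_did(value: str) -> Optional[Did]:
    """Return the parts of a syntactically valid DID, or None if it is malformed."""
    match = _DID_RE.fullmatch(value)
    return Did(match.group(1), match.group(2)) if match else None


valid = parse_did("did:example:123456789abcdefghi")
bad_method = parse_did("did:Example:abc")  # method names are lowercase letters and digits
empty_id = parse_did("did:example:")       # the method-specific id must not be empty
```

This only validates the syntax. Resolving a DID to its DID document is a separate step that depends on the DID method and is not shown here.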
F2. Data are described with rich metadata (defined by R1 below)
✅ In IOTICS, “data” is generated by underlying “real” assets. It is available in feeds (topics) that another twin can subscribe to. A twin may have as many feeds as needed. Each feed can have as many values as desired.
The twin, the feeds and the values in a feed are all enriched with metadata. The twin decides how rich that metadata is.
It is very important to know that the metadata plane and the data plane are totally separated. This implies that a digital twin can exist and express its metadata even if the underlying real asset is not available or “attached” to the twin itself.
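The separation of the two planes can be modelled with a small sketch. The `Twin` and `Feed` classes, their field names and the metadata keys below are assumptions made for illustration, not IOTICS types. The point is only that describing and matching a twin reads metadata and never requires a data sample to exist.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Feed:
    """Hypothetical feed: metadata for the feed and its values, plus the latest sample."""
    metadata: Dict[str, str]
    value_metadata: Dict[str, Dict[str, str]]   # value label -> metadata
    latest: Optional[Dict[str, float]] = None   # data plane: None until the asset publishes


@dataclass
class Twin:
    did: str
    metadata: Dict[str, str]
    feeds: Dict[str, Feed] = field(default_factory=dict)

    def describe(self) -> dict:
        """Metadata plane only: never reads `latest`."""
        return {
            "did": self.did,
            "metadata": self.metadata,
            "feeds": {
                name: {"metadata": feed.metadata, "values": feed.value_metadata}
                for name, feed in self.feeds.items()
            },
        }

    def matches(self, **criteria: str) -> bool:
        """Search is evaluated against twin metadata, not against data."""
        return all(self.metadata.get(k) == v for k, v in criteria.items())


# A twin whose real asset has not published anything yet.
twin = Twin(
    did="did:example:sensor-1",
    metadata={"type": "thermometer", "location": "boiler room"},
    feeds={
        "temperature": Feed(
            metadata={"comment": "Ambient temperature"},
            value_metadata={"reading": {"unit": "celsius", "datatype": "decimal"}},
        )
    },
)

description = twin.describe()
found = twin.matches(type="thermometer")
has_data = twin.feeds["temperature"].latest is not None
```

Here the twin is found and fully described even though `has_data` is false, which mirrors a twin whose asset is offline or not yet attached.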
Metadata, therefore, can be searched in IOTICS.
F3. Metadata clearly and explicitly includes the identifier of the data they describe
✅ IOTICS adopts a “find and bind” pattern. Search requests are sent to IOTICS to match metadata, and responses are sent back to the requestor as an array of “twin descriptions”, each including a subset of the available metadata. Consumer twins can then choose to “describe” one or more twins for a full metadata report, or bind directly to one or more feeds to get data.
Successful execution of a bind operation completes the interaction.
F4. (Meta)data are registered or indexed in a searchable resource
✅ IOTICS exposes a web API to the application. Each digital twin description is reachable via a URL on this web API. Publicly available twin metadata can therefore be listed and made available to internet search engines.
Accessible
Once users find the required data, they need to know how it can be accessed, possibly including authentication and authorisation.
A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
✅ IOTICS uses HTTP/WebSocket/STOMP or gRPC.
A1.1 The protocol is open, free, and universally implementable
✅ IOTICS uses HTTP/WebSocket/STOMP or gRPC.
A1.2 The protocol allows for an authentication and authorisation procedure, where necessary
The IOTICS delegation model, DIDs and brokered interactions provide the necessary means for this.
A2. Metadata is accessible, even when the data are no longer available
As discussed in F2, data and metadata sit on separate planes, and metadata is available at the application's discretion, independently of the existence of the underlying real asset.
Interoperable
The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
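I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
IOTICS uses semantic web technology, specifically RDF.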
A twin is mapped to a web resource identified by its DID which is a URI.
Every piece of metadata is expressed as a “fact” and implemented as an RDF triple.
I2. (Meta)data use vocabularies that follow FAIR principles
IOTICS uses semantic web technology. As mentioned in I1, each piece of metadata is expressed as a subject/predicate/object triple; the subject can be a link to a term that is part of an ontology.
IOTICS applications can refer to custom ontologies, publicly available ontologies or ontologies hosted by IOTICS.
I3. (Meta)data include qualified references to other (meta)data
IOTICS allows the use of linked data ✅
Reusable
The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
IOTICS encourages the producer to apply “product thinking” when creating twins, using RDF and semantic web technology to describe them. The usefulness of data is subjective, but describing data in context is achievable with linked data, which IOTICS supports directly by means of triple stores. ✅
R1.1. (Meta)data are released with a clear and accessible data usage license
License can be linked to the digital twin using RDF (for example this) ✅
R1.2. (Meta)data are associated with detailed provenance
Data provenance can be encapsulated using RDF and cryptographically bound to the twin by using the IOTICS DID spec (example here) ✅
There is an emerging standard for signing RDF metadata, and IOTICS is working to implement it.
R1.3. (Meta)data meet domain-relevant community standards
IOTICS abides by semantic web technology, and standards can be encoded in RDF (we have done this conceptually by implementing WaterML as RDF). It is then possible to share the ontologies and formats as appropriate. ✅
The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component).