generated from netwerk-digitaal-erfgoed/requirements-template
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathindex.bs
459 lines (390 loc) · 25.2 KB
/
index.bs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
<pre class='metadata'>
Title: Overview of the ResearcherPod specifications
Shortname: overview
Level: 1
Status: LS
URL: https://mellonscholarlycommunication.github.io/spec-overview
Editor: Miel Vander Sande, [meemoo - Flemish Institute for Archives](https://meemoo.be), miel.vandersande@meemoo.be
Editor: Patrick Hochstenbach, [IDLab - Ghent University](https://knows.idlab.ugent.be), patrick.hochstenbach@ugent.be
Editor: Ruben Dedecker, [IDLab - Ghent University](https://knows.idlab.ugent.be), ruben.dedecker@ugent.be
Editor: Jeroen Werbrouck, [IDLab - Ghent University](https://knows.idlab.ugent.be), Jeroen.Werbrouck@ugent.be
Abstract: This document introduces a set of technical reports that facilitate the implementation of a decentralized data exchange ecosystem using Solid.
Markup Shorthands: markdown yes
</pre>
<style>
table {
margin: 25px auto;
border-collapse: collapse;
border: 1px solid #eee;
border-bottom: 2px solid #005A9C;
}
table tr:hover {
background: #f4f4f4;
}
table tr:hover td {
color: #555;
}
table th, table td {
color: #999;
border: 1px solid #eee;
padding: 12px 12px;
border-collapse: collapse;
}
table th {
background: #005A9C;
color: #fff;
}
table th.last {
border-right: none;
}
</style>
Set of documents {#set}
=======================
This document is one of the specifications produced by the **ResearcherPod** and **ErfgoedPod** projects:
1. [Overview](/spec-overview/) (this document)
2. [Orchestrator](/spec-orchestrator/)
3. [Data Pod](/spec-datapod/)
4. [Rule language](/spec-rulelanguage/)
5. [Artifact Lifecycle Event Log](/spec-eventlog/)
6. [Notifications](/spec-notifications/)
7. [Collector](/spec-collector/)
Introduction {#intro}
=====================
This document introduces a set of technical reports that facilitate the implementation of a decentralized data exchange ecosystem using [[solid-protocol]].
The intent is not to establish a network design for shipping data.
Rather, it should enable (human or machine) agents to make artifacts available in the network,
add value to the available artifacts, and exchange messages about these activities.
Essentially,
these reports focus on two generic problems common to decentralized Web networks:
- executing and automating business processes involving a distributed set of actors.
- discovering and collecting information that is distributed over the network.
This discovery of information that is stored on a decentralized Web can be quite generic. In this project
there is a particular focus on the discovery of lifecycle events to which the artifacts are subject.
Examples of such a lifecycle events in the realm of scholarly publication is information about when and where
articles have been cited, or peer-reviewed, or published in a journal, archived in a web archive.
This work aims to complement the [[solid-tr]] with a concrete framework for building a semi-automated decentralized network for a specific use case or domain.
It is joint output from two aligned projects:
- ResearcherPod: a [Ghent University - IDLab](http://idlab.ugent.be) project funded by the [Andrew W. Mellon foundation](https://mellon.org/) that explores an alternative scholarly communication system that is researcher-centric, institution-enabled, and aligned with Decentralized Web concepts and technologies.
- ErfgoedPod: a collaboration between [Netwerk Digitaal Erfgoed](https://netwerkdigitaalerfgoed.nl), [meemoo - Flemish Institute for Archiving](https://meemoo.be) and [Ghent University - IDLab](http://idlab.ugent.be) to investigate the feasability of applying [Solid](https://solidproject.org/) and Decentralized Web concepts to establish a sustainable exchange network for digital heritage data.
Work items {#work-items}
=====================
The following table provides an overview of all technical reports and subject matter that is being worked on.
The reports incorporated have been discussed among the project members and are constructed as project deliverables.
During the course of these projects, the information in these documents may be subject to change, therefore please see each document’s publication status and versions for further details.
Of course, you are invited to [contribute](https://github.com/MellonScholarlyCommunication/spec-overview/issues) any feedback, comments, or questions you might have.
<table>
<caption>Technical Reports</caption>
<thead>
<tr>
<th>Work Item</th>
<th>Repository</th>
<th>Current Stage</th>
<th>Next Stage</th>
<th>Target</th>
<th>Expected Completion</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="/spec-overview/" rel="cito:citesForInformation">This document.</a></td>
<td><a href="https://github.com/mellonscholarlycommunication/spec-overview">spec-overview</a></td>
<td>Technical Report</td>
<td>Technical Report</td>
<td>Technical Report</td>
<td>Q1 2022</td>
</tr>
<tr>
<td><a href="/spec-orchestrator/" rel="cito:citesForInformation">Specification of the Orchestrator component.</a></td>
<td><a href="https://github.com/mellonscholarlycommunication/spec-orchestrator">spec-orchestrator</a></td>
<td>Working Draft</td>
<td>Draft Technical Report</td>
<td>Technical Report</td>
<td>Q4 2022</td>
</tr>
<tr>
<td><a href="/spec-datapod/" rel="cito:citesForInformation">Implementation guidelines and additional requirements for Solid data pods.</a></td>
<td><a href="https://github.com/mellonscholarlycommunication/spec-datapod">spec-datapod</a></td>
<td>Working Draft</td>
<td>Draft Technical Report</td>
<td>Technical Report</td>
<td>Q4 2022</td>
</tr>
<tr>
<td><a href="/spec-rulelanguage/" rel="cito:citesForInformation">Specification of the rule language to create executable business processes.</a></td>
<td><a href="https://github.com/mellonscholarlycommunication/spec-rulelanguage">spec-rulelanguage</a></td>
<td>Working Draft</td>
<td>Draft Technical Report</td>
<td>Technical Report</td>
<td>Q4 2022</td>
</tr>
<tr>
<td><a href="/spec-eventlog/" rel="cito:citesForInformation">Implementation requirements for the Artifact Lifecycle Event Log.</a></td>
<td><a href="https://github.com/mellonscholarlycommunication/spec-datapod">spec-datapod</a></td>
<td>Working Draft</td>
<td>Draft Technical Report</td>
<td>Technical Report</td>
<td>Q2 2022</td>
</tr>
<tr>
<td><a href="/spec-notifications/" rel="cito:citesForInformation">Specification of the possible notifications that can be used in the network.</a></td>
<td><a href="https://github.com/mellonscholarlycommunication/spec-notifications">spec-notifications</a></td>
<td>Draft Technical Report</td>
<td>Technical Report</td>
<td>Technical Report</td>
<td>Q1 2022</td>
</tr>
<tr>
<td><a href="/spec-collector/" rel="cito:citesForInformation">Specification of the Collector component.</a></td>
<td><a href="https://github.com/mellonscholarlycommunication/spec-collector">spec-collector</a></td>
<td>Working Draft</td>
<td>Draft Technical Report</td>
<td>Technical Report</td>
<td>Q4 2022</td>
</tr>
</tbody>
<tfoot>
<tr>
<td colspan="6"></td>
</tr>
</tfoot>
</table>
Terminology {#definitions}
===========
<figure>
<img src="./images/terminology.svg" />
<figcaption>
Terminology used in the specifications.
</figcaption>
</figure>
The following terms are used across all specifications:
: <dfn export>Agent</dfn>
:: An agent is a person, an organization, or software identified by a [[solid-protocol#uniform-resource-identifier|URI]]; e.g., a WebID denotes an agent [[webid]] (cited from [[solid-protocol]]).
:: Agents actively participate in the network: they perform actions, are identified by a [[webid]], and addressed through [=notification=].
: <dfn export>Artifact</dfn>
:: An artifact is a [[solid-protocol#resource|Web resource]] identified by an [[solid-protocol#uniform-resource-identifier|URI]] that serves as the main focus of interaction between [=agents=]. Examples are a digitized image in an archive, the webpage of a scientific publication, a digital representation of a cultural heritage object, or a data set in a repository.
:: Artifacts can be atomic or arbitrarily complex. How they are organized is outside the scope of this document and is dependent on the implementing community.
: <dfn export>Notification</dfn>
:: A notification is a message delivered from one [=agent=] to another using the [[ldn|Linked Data Notifications (LDN)]] protocol.
:: Notifications are expressed in [[activitystreams-vocabulary|Activity Streams 2.0 (AS2.0)]] and their payload describes an [=activity=].
: <dfn export>Activity</dfn>
:: An activity describes some form of past or present action that is directly or indirectly related to an [=artifact=].
:: Activities are conform with the [[activitystreams-vocabulary#activities|AS2.0 Activities]].
: <dfn export>Value-added service</dfn>
:: A service that somehow increases the social valuation of an [=artifact=] without modifiying it.
:: Examples are getting promoted, getting certification, gettin a good review ...
: <dfn export>Service Result</dfn>
:: A service result is any outcome of providing a value-added service for an [=artifact=].
:: Service results can be [[solid-protocol#resource|Web resources]] that are made available at a network address or mentioned inline in an [=activity=] addressed to the [=agent=] owning the [=data node=] where the [=artifact=] is hosted.
: <dfn export>Node</dfn>
:: A node is the main component in the network. It provides artifacts or artifact-related services to the network, and is able to communicate with other nodes.
:: A node is a [[solid-protocol#http-server|HTTP Server]] that hosts two types of resources: an [=inbox=] and a [=Lifecycle Event Log=]. It is also an Linked Data Notifications [[ldn#receiver|Sender]], [[ldn#receiver|Receiver]] and [[ldn#receiver|Consumer]].
: <dfn export>Data Node</dfn>
:: A data node is a [=node=] that hosts [=artifacts=].
: <dfn export>Service Node</dfn>
:: A service node is a [=node=] that provides value-added services for [=artifacts=] hosted by [=data nodes=], but does not host [=artifacts=] itself. It can host [=service results=].
:: A service node minimally meets the [=node=] requirements, but can also be a (partial) implementation of the [[solid-protocol]].
:: Note that when a service node creates a new [=artifact=] as a result of providing a value-added service for an artifact, then this new [=artifact=] can be made available in the network in its own right and can itself become the subject of [=activities=]. At that point, however, the service node must have an associated [=data node=] via which the new [=artifact=] is made available.
: <dfn export>Inbox</dfn>
:: An inbox is a [[solid-protocol#resource|web resource]] conform with the [[!ldn|Linked Data Notifications specification]]. In case the [=node=] that hosts the inbox is a [=data pod=], this is usually implemented as an [[ldp#ldpc-container|LDP Container]].
:: [=Agents=] can POST an [=activity=] in order to send information pertaining to an [=artifact=] to another [=agent=], in order to inform the other party or request a service.
: <dfn export>Lifecycle Event</dfn>
:: A life cycle event is an [=activity=] pertaining to an [=artifact=] that impacts the artifact's lifecycle and is made public. It is communicated via a [=notification=] and that the [=node=] sending or receiving the [=notification=] deemed relevant to disclose.
: <dfn export>Lifecycle Event Log</dfn>
:: A lifecycle event log is an append-only public [[solid-protocol#resource|web resource]] that exposes a series of [=lifecycle events=] related to [=artifacts=] stored by [=data nodes=] and services provided by [=service nodes=].
:: It constructs a view over an [=inbox=]'s contents that is determined by a [=selector=], allowing a selected subset of [=activities=] (eg. grouped per artifact) to be consumed. A [=node=] is responsible for making the lifecycle event log publicly discoverable.
: <dfn export>Selector</dfn>
:: A selector is a boolean function that decides whether or not a [=activity=] belongs to the [=lifecycle event log=] or not.
: <dfn export>Collector</dfn>
:: An automated Web application that collects information from [=lifecycle event logs=] hosted by any [=node=].
The following additional terms are specific to [=data nodes=] that, in addition to meeting the minimal requirements, also implement the [[solid-protocol]]:
<figure>
<img src="./images/terminology-pod.svg" />
<figcaption>
Terminology specific to [=data nodes=] implemented with [[solid-protocol|Solid]].
</figcaption>
</figure>
: <dfn export>Data Pod</dfn>
:: A data pod is a place for storing documents, with mechanisms for controlling who can access what (cited from [[solid-protocol]]).
:: A data pod can be used to construct a [=data node=] that is conform to the [[solid-protocol]]. It stores [=artifacts=] that are made available to the network and is a Linked Data Notifications [[ldn#receiver|Receiver]].
: <dfn export>Data Pod owner</dfn>
:: An owner is a person or a social entity that is considered to have the rights and responsibilities of a data storage. An owner is identified by a URI, and implicitly has control over all data in a storage. An owner is first set at storage provisioning time and can be changed (cited from [[solid-protocol]]).
:: A data pod owner is an [=agent=] that is responsible for maintaining the [=data pod=] and its [=artifacts=]. The owner can be identified by its [[webid]] and reached via an [=inbox=].
:: For example: in case the data pod stores scholarly artifacts, the owner is typically an author or contributor to these artifacts. In case the data pod stores digital heritage objects, the owner is typically the institution or the institutional employee that curates and maintains these collections.
: <dfn export>Service Node owner</dfn>
:: A service node owner is the [=agent=] responsible for maintaining the [=service node=] and the service it provides.
:: The owner can be identified by its [[webid]] and reached via an [=inbox=].
: <dfn export>Dashboard</dfn>
:: The dashboard is a user-facing Web application that enables an [=agent=] to manually interact with other [=agents=] or [=nodes=] in the network.
:: The dashboard is a Linked Data Notifications [[ldn#receiver|Sender]] and [[ldn#receiver|Consumer]]. In case it is used to manage a [=data pod=], it will also be a compliant [[solid-protocol#solid-app|Solid App]] implementation.
: <dfn export>Orchestrator</dfn>
:: The orchestrator is an autonomous Web application that operates on behalf of an agent [=agent=] and has access to a [=data pod=]. It interprets and executes business rules described in a [=rulebook=].
:: The orchestrator is also a Linked Data Notifications [[ldn#receiver|Sender]] and [[ldn#receiver|Consumer]].
: <dfn export>Rulebook</dfn>
:: A set of machine-readable business rules that instruct the [=orchestrator=] on what actions to take in response to incoming [=notifications=].
Agents, Artefacts and Service Results {#agents-arte}
=======================================================
Artifacts are the main focus of this decentralized network design.
They are pieces of data (eg. a file, a document, ...) produced by the Agents that participate in the network and hosted on [=data nodes=]
The goal is, however, not to enable the exchange the artifacts, but rather to facilitate discourse about them.
Artifacts hold potential value in the domain in which the Agents operate.
Over the network, Agents can send messages to other Agents to request value-added services.
Each executed service increases the value-chain of an Artifact, which is published in an event log.
The contents of an artifact are considered out-of-scope, and therefore, they are viewed as black-boxes to the network's design.
Services are performed by [=service nodes=] and can produce a [=Service Result=] after execution.
Service results are hosted on [=service nodes=] and provide metadata about the service run.
The difference with artifacts is that service results lack a relevant value-chain.
If interesting, a service result can be promoted to artifact,
eg. when a review of a document itself becomes subject to review.
In that case, however, the service node takes up the role of [=data node=] as well.
The following example illustrates a basic interaction for which this network design can be applied.
1. An Agent called Alice creates an artifact: a document. The document is stored in Alice's Pod.
2. Alice offers the artifact to a Reviewing Service (a Service Node and Agent) in order to review it.
3. The Reviewing Service performs the service and produces a Service Result.
4. The Reviewing Service informs Alice that the Service was performed.
5. The value-chain of the document is enhanced with a reviewed event.
<figure>
<img src="./images/flow.svg" />
<figcaption>
example workflow.
</figcaption>
</figure>
Dashboard and Orchestrator {#differences}
=======================================================
In a Solid implementation of these specifications,
two software applications interact directly with the [=Data Pod=] with different privileges: the [=Orchestrator=] and the [=Dashboard=].
While these applications could overlap or might not be required at all (in
some use cases), in our specifications they are treated as separate entities to help the discussion.
Some reasons why these applications could be treated as separate entities:
- privileges needed to access the [=data pod=]
- requirements needed to receive direct feedback from an [=agent=]
- requirements needed to permanently be **online**
- requirements needed to correctly implement [=notifications=] and [=rulebooks=]
Typically, the [=Dashboard=] runs in a browser and responds to feedback from an [=agent=].
It can be in an online or offline modus, running on the computer of the [=agent=] in one of the browser tabs.
The [=Orchestrator=], however, runs as a *always-on* background-process or remote Web service.
Both can read the [=Inbox=] of the [=Data Pod=] and append to the [=Lifecycle Event Log=]. Both communicate with the network using the [=Notifications=].
Only the [=Orchestrator=] is guided by a [=Rulebook=] and operates with little human intervention.
The network below demonstrates the CRUD privileges imagined for the different [=agents=] in this document.
The first network demonstrates a typical Solid setup where the [=dashboard=] is a single page application that has direct access
to the [=data pod=]. The [=Orchestrator=] could be a process that is part of the [=dashboard=],
but is runs here as a separate network service.
It is always online and works on behalf of the [=data pod owner=].
Finally, there can be other applications in the
network (indicated by Something) that can read data from the [=data pod=] or send [=notifications=] to
the [=data pod=], but without requiring direct privileges to change artifacts on the [=data pod=].
<figure>
<img src="images/mellon-crud-app.svg" width="80%">
<figcaption>
CRUD operations in case the [=Dashboard=] is single-page application and [=Orchestrator=] a background task
</figcaption>
</figure>
The second network demonstrates a more classic setup with a browser [=Dashboard=] controlled by a server application which uses a
[=data pod=] as backend storage.
<figure>
<img src="images/mellon-crud.svg" width="80%"></div>
<figcaption>
CRUD operations in case the [=dashboard=] and a [=service node=] (which also runs to an
[=Orchestrator=] component) is a classic client/server application
</figcaption>
</figure>
Communication between Data Pods and domain-specific networks {#domain-specific-networks}
============================================================
## Communication between Data Pod and Scholarly Community network ## {#scholarly-comm}
[=Notifications=] can be sent from the researcher environment to [=Service Node=] environments.
For instance, in case of a request to review an artifact that resides in the [=Data Pod=], an
appropriate [=notification=]] can be sent to a review [=Service Node=]. The [=Service Node=] can respond,
for example, accepting or rejecting the review request, and, in the latter case, to relay
the result of the review.
The [=Orchestrator=] sends notifications in response to triggers that result from the execution of
the rulebook is associated with the [=Data Pod=]. The [=Orchestrator=]
receives notifications in response to the ones it sent. The [=Orchestrator=] selectively records information
contained in both outgoing and incoming notifications in the [=Lifecycle Event Log=].
The [=notifications=] are regarded as a high-level approach to automatically coordinate the distributed
execution of the crucial functions of scholarly communication. The notifications merely ensure that the
respective functions are in effect executed as prescribed by the rulebook, but do not attempt to
automate the actual fulfillment of the function itself. For instance, when an `Offer` is sent to a Review Service
we don't envision that message contains all the steps to fully automate the submission process.
It could contain enough metadata to enable simple workflows. In general out of band communication
could be needed to perform all required steps.
## Communication between Data Pod and Cultural Heritage network ## {#cultural-heritage}
In the cultural heritage domain,
institutions that manage collections, such as museums or libraries, and
service providers, such as object registrars, user-facing portals or digital archives,
participate in a joint network with the goal of sharing digital metadata and objects effectively.
Institutions maintain a [=Data Pod=] as primary exchange hub for metadata about their collections.
End-user portals aims to collect thematic subsets from this selection of pods.
In between, there is a layer of facilitating services that
- make data findable for portals and other applications, such as indexing metadata for search or disseminating collection information to other network members;
- provide value-added services to institutions like enriching metadata by adding links to other collections, or doing digital preservation.
Institutions use notifications to request services from the designated [=Service Node=], eg. they can `Offer` a dataset to a enrichment service.
In addition, notifications are used by all parties to inform each other about relevant changes.
A [=Service Node=] can respond with, for example, an accept or reject of the request for digital preservation, or simply take note.
The result of a service, ie. the object is preserved digitally for the long term, is a new piece of metadata that augments the object's lifecycle.
This event can be communicated with a new notification to the institution that made the requested or other interested [=agents=] in the network.
Driven by a mix of policy derived from institutional and cooperation requirements,
institutions follow processes that dictate when to request certain services, contact other [=agents=] and in what order.
These processes can be declared in a [=rulebook=] and executed in an automated manner by an [=Orchestrator=].
Services that help the chasm between the [=Data Pod=] and data consuming portals, can maintain a [=Data Pod=] of their own,
for storing and exposing derivatives to upper layers in the network.
For instance,
a [=Service Node=] that collects datasets from institution's [=Data Pods=] to generate dataset summaries and enrichments can store these results.
In turn, an end-user portal can use the data of this [=Service Node=] to find [=Data Pods=] with relevant data.
Acknowledgement {#acknowledgement}
============================================================
We thank Herbert Van de Sompel, [DANS + Ghent University](https://dans.knaw.nl/nl/), hvdsomp@gmail.com
for the valuable input during this project.
<pre class=biblio>
{
"solid-protocol": {
"authors": [
"Sarven Capadisli",
"Tim Berners-Lee",
"Ruben Verborgh",
"Kjetil Kjernsmo"
],
"href": "https://solidproject.org/TR/protocol",
"title": "The Solid Protocol",
"status": "Published",
"publisher": "Solid project",
"deliveredBy": [
"https://www.w3.org/community/solid/"
]
},
"webid": {
"authors": [
"Andrei Sambra",
"Henry Story",
"Tim Berners-Lee"
],
"href": "https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/identity-respec.html",
"title": "WebID 1.0",
"status": "Editor’s Draft",
"publisher": "WebID Community Group",
"deliveredBy": [
"https://www.w3.org/community/webid/"
]
},
"solid-oidc": {
"authors": [
"Aaron Coburn (Inrupt)",
"elf Pavlik",
"Dmitri Zagidulin"
],
"href": "https://solid.github.io/authentication-panel/solid-oidc/",
"title": "SOLID-OIDC",
"status": "Editor’s Draft",
"publisher": "Solid project",
"deliveredBy": [
"https://www.w3.org/community/solid/"
]
},
"solid-tr": {
"href": "https://solidproject.org/TR/",
"title": "Solid Technical Reports",
"publisher": "Solid project",
"deliveredBy": [
"https://www.w3.org/community/solid/"
]
}
}
</pre>