-
Notifications
You must be signed in to change notification settings - Fork 1
/
preservation-workflow.html
590 lines (552 loc) · 23.4 KB
/
preservation-workflow.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
<title>OTM Preservation Workflow</title>
<script
src='https://www.w3.org/Tools/respec/respec-w3c'
class='remove'></script>
<script class='remove'>
var respecConfig = {
specStatus: "base",
editors: [{
name: "Mike Ritter",
company: "Chronopolis",
companyURL: "http://libraries.ucsd.edu/chronopolis/"
},{
name: "Sibyl Schaefer",
company: "University of California, San Diego",
companyURL: "https://ucsd.edu"
}],
gitHub: "ucsdlib/otm-specs",
shortName: "preservation-workflow-appendix",
wg: "One to Many Working Group",
wgURI: "https://wiki.lyrasis.org/display/OTM",
edDraftURI: "https://github.com/ucsdlib/otm-specs/blob/master/preservation-workflow.html",
maxTocLevel: 3,
localBiblio: {
"APTrust-Volume-Service": {
title: "APTrust Volume Service" ,
href: "https://github.com/APTrust/exchange/blob/master/service/volume_service.go"
},
"OCFL-Evaluation": {
title: "Chronopolis OCFL Evaluation" ,
href: "https://wiki.lyrasis.org/display/OTM/Chronopolis+OCFL+Evaluation"
}
},
otherLinks: [{
key: "OTM Specifications",
data: [{
value: "One to Many (OTM) Specifications Overview",
href: "."
},
{
value: "OTM Bridge API Specification",
href: "otm-bridge.html"
},
{
value: "OTM Gateway API Specification",
href: "otm-gateway.html"
},
{
value: "OTM Appendix - Audit Events",
href: "audit-appendix.html"
},
{
value: "OTM Appendix - Hyrax Workflow",
href: "hyrax-workflow.html"
}]
}]
};
</script>
<link rel="stylesheet" href="otm-styles.css">
</head>
<body>
<p class='copyright'>This document is licensed under a
<a class='subfoot' href='https://creativecommons.org/licenses/by/4.0/' rel='license'>
Creative Commons Attribution 4.0 License
</a>.
</p>
<section id='abstract'>
<p>
The One To Many (OTM) Specification defines two APIs to support communication between digital content repository
systems (Repository) and distributed digital preservation systems (DDP). These APIs work in tandem to allow content
captured in Repository systems to be copied to DDP systems for preservation. The APIs defined are the <a
href="otm-gateway.html">OTM Repository Gateway API (Gateway)</a> for the Repository and the <a
href="otm-bridge.html">OTM Bridge API (Bridge)</a> for the DDP. The Gateway and the Bridge APIs handle intermediary
communication between the Repository and DDP and allow each system to operate without any knowledge of the internals of
the other system. Each API is designed to facilitate deployment either as part of or extension to the Repository (in
the case of the Gateway) or the DDP (in the case of the Bridge) or as a stand-alone application. They each provide an
HTTP-based approach for authentication, communication, and data transfer.
</p>
<p>
<img src="images/system-components.png" width="100%">
</p>
<p>
The descriptions and diagrams below reference the <a href="otm-gateway.html">OTM Repository Gateway Specification</a>
and <a href="otm-bridge.html">OTM Bridge API Specification</a> and are intended to capture the context in which the API
calls are expected to be used.
</p>
<p>
The DDP workflow sections below assume that an intermediary service will be used to communicate with the OTM Bridge and
transform data provided by the OTM Bridge into a format acceptable to the DDP. The implementation of this piece will be
dependent on the architecture and capabilities of the DDP. Allowing this service to remain separate from the Bridge
ensures that the Bridge implementation is able to support a wide variety of DDPs.
</p>
<p>
The primary purpose of the systems and integrations described by the OTM Specifications is to support the deposit and
recovery of content. Content is to be considered recoverable only after it has completed a successful deposit into the
DDP. Content that has been deposited from a Repository into a DDP is intended to be recoverable even if all other OTM
system components have failed. There are no guarantees of recoverability if content has not first completed a
successful deposit.
</p>
<p>Additional notes are provided below for the minimum required steps for implementation in Chronopolis.</p>
</section>
<section>
<h2>Status of This Document</h2>
<p>This document is an overview to a specification, created as part of the One to Many grant, funded by the Andrew W.
Mellon Foundation.</p>
</section>
<section>
<h2>Initialize</h2>
<p>The initialization operation allows a DDP and Repository to connect their respective OTM Bridge and OTM Gateway
applications so that data can be transferred between the two systems.</p>
<section>
<h3>Flow</h3>
<ol>
<li>
An agreement is reached between a repository owner and DDP system that will allow repository content to be deposited
into the DDP; appropriate SLA/MOU and other legal documentation is signed and arrangements for billing/invoicing are
made
</li>
<li>
The DDP administrator calls the Bridge <a href="otm-bridge.html#add-account">Add Account</a> endpoint to add the
repository to the Bridge system and generate the credentials needed for the repository's Gateway to connect to the
Bridge
</li>
<li>The DDP administrator provides the Bridge credentials to the Gateway administrator</li>
<li>
The Gateway administrator enters the Bridge credentials into the Gateway and the Gateway calls the Bridge
<a href="otm-bridge.html#register">Register</a> endpoint to provide the Bridge with the details necessary to make
calls back to the Gateway
</li>
</ol>
</section>
</section>
<section>
<h2>Deposit</h2>
<p>The Deposit workflow describes the process when an OTM Gateway requests that a filegroup be preserved.</p>
<p>As part of this workflow, a version identifier is passed from the OTM Gateway through the system so that a deposit can
be related to a point in time. It is up to the DDP to determine how to store this information in a manner suited for long
term preservation.</p>
<section>
<h3>System to System Flow</h3>
<ol>
<li>The Repository administrator selects a set of objects to be deposited</li>
<li>
The Repository calls the Gateway <a href="otm-gateway.html#deposit-object">PUT Object</a> endpoint once for each
object to be deposited; this starts the deposit process
</li>
<li>
The Gateway resolves each object into a set of files to be deposited; each file is either copied to the Gateway
staging storage area or a link to the file is captured to allow transfer to the Bridge
</li>
<li>
The Gateway calls the Bridge <a href="otm-bridge.html#deposit-content">Deposit Content</a> endpoint using the
object ID as the filegroup identifier and providing an identifier for each file to be deposited
</li>
<li>The Bridge initiates a deposit action for each filegroup in the deposit request</li>
<li>
For each file in each filegroup the Bridge calls the Gateway <a href="otm-gateway.html#transfer-file">GET File</a>
endpoint to transfer the file to the Bridge staging storage location
</li>
<li>
As each file transfer into the Bridge staging storage completes, the Bridge compares the checksum of the
transferred file to the checksum provided in the deposit request; any mismatches trigger a re-transfer
</li>
<li>
Once all files in a filegroup are in Bridge staging storage and all checksums are validated, the status of the
deposit is updated to <a href="otm-bridge.html#deposit-status">`DEPOSIT_STAGED`</a>
</li>
<li>
The DDP calls the Bridge <a href="otm-bridge.html#list-deposits">List Deposits</a> endpoint on a regular schedule
to check for new deposits in the <a href="otm-bridge.html#deposit-status">`DEPOSIT_STAGED`</a> state
</li>
<li>
For each staged deposit in the Bridge the DDP copies the files from Bridge staging storage into the DDP ingest
pipeline and performs a deposit (and replication)
</li>
<li>
When the deposit into the DDP is finished, the DDP calls the Bridge <a href="otm-bridge.html#complete-deposit">
Complete Deposit</a> endpoint to inform the Bridge that the deposit is complete
</li>
<li>The Bridge clears the files associated with the completed deposit from Bridge staging storage and transitions the
deposit into a completed status</li>
<li>
The Gateway calls the Bridge <a href="otm-bridge.html#get-deposit-status">Get Deposit Status</a> endpoint in order
to provide the Respository administrator with deposit status information
</li>
</ol>
<figure id="system-deposit">
<figcaption>Deposit Workflow</figcaption>
<img src="images/deposit_content.png"/>
</figure>
</section>
<section>
<h3>DDP Workflow</h3>
<p>The DDP portion of the Deposit workflow assumes that the OTM Bridge has already performed initial processing in
order to prepare the filegroup for ingestion into a DDP.</p>
<h4>Chronopolis Implementation Notes</h4>
<p>New Deposit</p>
<ol>
<li>
Query the OTM Bridge for incoming deposits
<ul>
<li><a href="otm-bridge.html#list-deposits">GET /deposit</a></li>
</ul>
</li>
<li>
Extract `filegroup-id` set from deposits response
</li>
<li>
Look for the Deposit based on the `filegroup-id` in Chronopolis
<ul>
<li>Expect there to be no existing object in the DDP with an identifier of `filegroup-id`</li>
</ul>
</li>
<li>
Ensure space exists in Chronopolis for this Deposit
<ul>
<li>If reserving space, an API needs to be built similar to the [[APTrust-Volume-Service]]</li>
<li>Otherwise wait for staging filesystem to have available space</li>
</ul>
</li>
<li>
Package the Deposit for Chronopolis
<ul>
<li>Create an OCFL Object</li>
<li>Create ACE Tokens</li>
</ul>
</li>
<li>Use Chronopolis Ingest API to distribute the Deposit</li>
<li>
Query Chronopolis Ingest API for Deposit status
<ul>
<li>
Inform the OTM Bridge using <a href="otm-bridge.html#complete-deposit">POST /deposit/{filegroup-id}</a> when
the Deposit has fully distributed throughout Chronopolis
</li>
<li>Create an <a href="audit-appendix.html#ingest">Ingest</a> audit event for the deposit</li>
</ul>
</li>
</ol>
<p>Updated Deposit</p>
<ol>
<li>
Query the OTM Bridge for incoming deposits
<ul>
<li><a href="otm-bridge.html#list-deposits">GET /deposit</a></li>
</ul>
</li>
<li>
Extract `filegroup-id` set from deposits response
</li>
<li>
Look for the Deposit based on the `filegroup-id` in Chronopolis
<ul>
<li>Expected to find existing content based on this flow</li>
</ul>
</li>
<li>
Request to have the Deposit re-staged
<ul>
<li>Should only need OCFL metadata (inventory.json)</li>
<li>This could also handle storage reservation</li>
</ul>
</li>
<li>
Using the json response from the OTM Bridge:
<ul>
<li>
Mint a new version using the `version-id` from the OTM Bridge
<ul>
<li>The OCFL Object version will increment as usual</li>
<li>Store the version from the OTM Gateway in the `inventory.json`</li>
</ul>
</li>
<li>New files will be added to the OCFL inventory and distributed in full</li>
<li>Existing files referenced in the OCFL inventory that have not been modified will not be redistributed</li>
<li>Files omitted from the json response will not be included in the current version</li>
<li>Create additional ACE Tokens for OCFL payload files</li>
</ul>
</li>
<li>Use Chronopolis Ingest API to distribute the updated Deposit</li>
<li>
Query Chronopolis Ingest API for Deposit status
<ul>
<li>
Inform the OTM Bridge using <a href="otm-bridge.html#complete-deposit">POST /deposit/{filegroup-id}</a> when
the Deposit has fully distributed throughout Chronopolis
</li>
<li>Create an <a href="audit-appendix.html#ingest">Ingest</a> audit event for the deposit</li>
</ul>
</li>
</ol>
<ul class="note">
<li>An additional [[OCFL-Evaluation]] has been done with implementation notes for integration in Chronopolis</li>
</ul>
</p>
<figure id="ddp-deposit">
<figcaption>DDP Deposit Workflow</figcaption>
<img src="images/ddp_deposit.png"/>
</figure>
</section>
</section>
<section>
<h2>Audit</h2>
<p>The Audit workflow retrieves actions taken on filegroups that have been deposited. It is expected that the Audit Log
will be maintained by the OTM Bridge and will contain events provided by the DDP.</p>
<p>See also: <a href="audit-appendix.html">Audit Event Appendix</a></p>
<section>
<h3>Flow</h3>
<ol>
<li>The Repository manager selects an object and requests a preservation audit history</li>
<li>
The Repository calls the Gateway <a href="otm-gateway.html#get-object-audit">GET Object Audit</a> endpoint for the
object
</li>
<li>
The Gateway calls the Bridge <a href="otm-bridge.html#get-audit-log">Get Audit Log</a> endpoint, specifying the object
ID as the filegroup identifier
</li>
<li>
The Bridge gathers audit data for the given filegroup and associated files from its internal data store and responds to
Gateway with the requested audit history data
</li>
<li>
The Gateway translates the Bridge audit data into a format familiar to the repository and responds to the Repository
request
</li>
<li>The Repository displays the audit data to the Repository manager</li>
</ol>
<figure id="audit-diagram">
<figcaption>Get Audit Workflow</figcaption>
<img src="images/audit_content.png">
</figure>
</section>
</section>
<section>
<h2>Restore</h2>
<p>The Restore workflow handles returning data back to a Repository which had been previously deposited.</p>
<section>
<h3>System to System Flow</h3>
<ol>
<li>The Repository manager selects an object to be restored from preservation storage</li>
<li>
The Repository calls the Gateway <a href="otm-gateway.html#initiate-restore">POST Object Restore</a> endpoint for
the object to be restored
</li>
<li>The Gateway calls the Bridge <a href="otm-bridge.html#get-content-details">Get Content Details</a> in order to
resolve the set of files to be restored</li>
<li>
The Gateway calls the Bridge <a href="otm-bridge.html#restore-content">Restore Content</a> endpoint with the list of
files to be restored
</li>
<li>
The Bridge initiates a restore action for all files in the restore request and creates a directory in Bridge staging
storage for the restored files
</li>
<li>
The DDP calls the Bridge <a href="otm-bridge.html#list-restores">List Restores</a> endpoint on a regular schedule to
check for new restore requests
</li>
<li>The DDP copies each file in the restore request to the specified directory in Bridge staging storage</li>
<li>
When all files have been copied into Bridge staging storage the DDP calls the Bridge
<a href="otm-bridge.html#complete-restore">Complete Restore</a> endpoint to inform the Bridge that the restored files
are available
</li>
<li>
The Bridge validates that all file checksums match the checksums provided in the restore request (when checksums are
provided)
</li>
<li>
The Bridge updates the status of the restore action to <a href="otm-bridge.html#restore-status">`RESTORE_STAGED`</a>
</li>
<li>
The Gateway calls the Bridge <a href="otm-bridge.html#get-restore-status">Restore Status</a> endpoint on a regular
basis to determine if the status of the restore is <a href="otm-bridge.html#restore-status">`RESTORE_STAGED`</a>
</li>
<li>
The Gateway calls the Bridge <a href="otm-bridge.html#get-restored-content">Get Restored Content</a> endpoint for
each file in the restore request and stores each file in the Gateway staging storage
</li>
<li>
The Repository calls the Gateway <a href="otm-gateway.html#retrieve-object">Get Object</a> endpoint and pulls the
content into repository storage
</li>
<li>The Repository sends a notification to the Repository manager that requested the restore</li>
</ol>
<figure alt="system-restore">
<figcaption>Restore workflow</figcaption>
<img src="images/restore_content.png"/>
</figure>
</section>
<section>
<h3>DDP Workflow</h3>
<h4>Chronopolis Implementation Notes</h4>
<ol>
<li>
Query the OTM Bridge for Restores
<ul>
<li><a href="otm-bridge.html#list-restores">GET /restores</a></li>
<li>Validate data to restore exists in Chronopolis</li>
</ul>
</li>
<li>
Identify space for the restore to be staged on
<ul>
<li>Needs to be accessible by the OTM Gateway</li>
<li>On insufficient space await or reject</li>
<li>Need guarantees about available space while re-staging data</li>
</ul>
</li>
<li>
Restage data
<ul>
<li>OTM With RO Mount: Create symbolic links to pull from</li>
<li>OTM Without RO Mount: Need process for retrieving data from Chronopolis nodes</li>
</ul>
</li>
<li>
Notify the OTM Bridge that the Restore is staged and accessible with a given TTL
<ul>
<li>Call OTM Bridge indicating the restore is ready: <a href="otm-bridge.html#complete-restore">
POST /restore/{restoreId}</a></li>
<li>Send <a href="audit-appendix.html#restoration">Restoration</a> audit event to OTM Bridge</li>
</ul>
</li>
<li>
Upon expiration of the TTL
<ul>
<li>Remove staged content</li>
</ul>
</li>
</ol>
<figure id="ddp-restore">
<figcaption>DDP Restore Workflow</figcaption>
<img src="images/ddp_restore.png"/>
</figure>
</section>
</section>
<section>
<h2>Delete</h2>
<p>The Delete workflow handles removing data from a DDP. It is assumed that this will be an operation which is
non-recoverable and permanently removes data from a DDP. Deletes are expected when preserved content is discovered to be
subject to legal or administrative restrictions that require its removal. It is recommened that repositories restrict the
ability to delete content. </p>
<section>
<h3>System to System Flow</h3>
<ol>
<li>The Repository manager selects an object to be deleted from preservation storage</li>
<li>
The Repository calls the Gateway <a href="otm-gateway.html#purge-object">Purge Object</a> endpoint for the object
or version to be deleted
</li>
<li>The Gateway calls the Bridge <a href="otm-bridge.html#get-content-details">Get Content Details</a> endpoint and
resolves the object into a set of files to be deleted</li>
<li>
The Gateway calls the Bridge <a href="otm-bridge.html#delete-content">Delete Content</a> endpoint with the list of
files to be deleted
</li>
<li>The Bridge initiates a delete action for all files in the delete request</li>
<li>
The DDP calls the Bridge <a href="otm-bridge.html#list-deletes">List Deletes</a> endpoint on a regular schedule to
check for new delete requests
</li>
<li>
The DDP performs a delete on each requested file; when all deletes are completed, the DDP calls the Bridge
<a href="otm-bridge.html#complete-delete">Complete Delete</a> endpoint to inform the Bridge that the delete is
complete
</li>
<li>
The Repository administrator checks the object status in the Repository; the Repository requests information about
the object from the Gateway to provide information.
</li>
</ol>
<figure id="system-delete">
<figcaption>Delete Workflow</figcaption>
<img src="images/delete_content.png"/>
</figure>
</section>
<section>
<h3>DDP Workflow</h3>
<h4>Chronopolis Implementation Notes</h4>
<p>Delete filegroup</p>
<ol>
<li>
Query the OTM Bridge for deletions
<ul>
<li><a href="otm-bridge.html#list-deletes">GET /delete</a></li>
</ul>
</li>
<li>Identify the Chronopolis package associated with the `filegroup-id`</li>
<li>
Removing package
<ul>
<li>Start the Chronopolis deprecation workflow</li>
</ul>
</li>
<li>
When Chronopolis package is deprecated, update the status of the OTM Bridge
<ul>
<li><a href="otm-bridge.html#complete-delete">POST /delete/{deleteId}</a></li>
<li>Send <a href="audit-appendix.html#deletion">Deletion</a> audit event to OTM Bridge</li>
</ul>
</li>
</ol>
<p>Delete files</p>
<ol>
<li>
Query the OTM Bridge for deletions
<ul>
<li><a href="otm-bridge.html#list-deletes">GET /delete</a></li>
</ul>
</li>
<li>
Query Chronopolis for the `filegroup-id`
<ul>
<li>Identify the Chronopolis package associated with the `filegroup-id`</li>
<li>Identify the files within the package to remove</li>
</ul>
</li>
<li>
Removing files/version
<ul>
<li>Follow OCFL recommendations for deletion</li>
<li>Tombstoning may be necessary</li>
</ul>
</li>
<li>
Push package changes throughout Chronopolis
<ul>
<li>If this is considered a new package, treat it like a new deposit</li>
<li>Otherwise need workflow to cover overwriting files in a package</li>
</ul>
</li>
<li>
When propagation of deletes is complete, update the status of the OTM Bridge
<ul>
<li><a href="otm-bridge.html#complete-delete">POST /delete/{deleteId}</a></li>
<li>Send <a href="audit-appendix.html#deletion">Deletion</a> audit event to OTM Bridge</li>
</ul>
</li>
</ol>
<figure id="ddp-delete">
<figcaption>DDP Delete Workflow</figcaption>
<img src="images/ddp_delete.png"/>
</figure>
</section>
</section>
</body>
</html>