Decision record describing the end-to-end Azure blob storage transfer flow between two participants, where both participants use Azure blob storage:
- A Provider connector that makes an asset available, and executes data transfer through its DPF service.
- A Consumer connector that requests the asset, and provides a destination Azure storage container.
The `data-plane-azure-storage` extension can be used on DPF to support blob transfers.
A client can trigger a blob transfer on the consumer side via the Data Management API. The client needs to set the `managedResources=true` option in its API request; this option makes sure that the resources needed for the transfer (storage account) are created. The `managedResources=false` option, which would be used if the client wants to use a pre-existing container without creating a new one, is not yet supported for Azure storage transfers.
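As a non-authoritative sketch, triggering such a transfer could look like the snippet below. The endpoint path, port, and JSON field names are assumptions for illustration only; they vary across EDC versions and deployments, so the actual Data Management API documentation should be consulted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerBlobTransfer {
    public static void main(String[] args) throws Exception {
        // Hypothetical request body: the field names follow the general shape of the
        // Data Management API but are illustrative and version-dependent.
        String body = """
                {
                  "assetId": "asset-1",
                  "contractId": "contract-1",
                  "connectorAddress": "http://provider-control-plane:8282/api/v1/ids/data",
                  "managedResources": true,
                  "dataDestination": {
                    "type": "AzureStorage",
                    "account": "consumerstorageaccount"
                  }
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://consumer-control-plane:9191/api/v1/data/transferprocess")) // assumed path
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The response carries the PROCESS_ID used later to poll the transfer status.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("PROCESS_ID response: " + response.body());
    }
}
```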
If something goes wrong during the transfer, the Consumer is not made aware that an error occurred; it would simply never see the `transferProcess` reach the completed state when polling for the result. There is an EDC issue to address this problem.
Storage account access keys should be stored in the respective Vaults (e.g. Azure Key Vault) before initiating transfers. The consumer provisions a new storage container and generates a write-only SAS token to allow the provider to write data to the consumer's container.
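For illustration, a write-only container SAS could be generated with the Azure SDK for Java as sketched below. The account name, container name, and expiry are placeholders; in the actual flow the ObjectStorageProvisioner handles this step, and the account key comes from the Consumer Vault rather than an environment variable.

```java
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;
import com.azure.storage.blob.sas.BlobContainerSasPermission;
import com.azure.storage.blob.sas.BlobServiceSasSignatureValues;
import com.azure.storage.common.StorageSharedKeyCredential;

import java.time.OffsetDateTime;

public class WriteOnlySasExample {
    public static void main(String[] args) {
        // Placeholder values; the real flow reads the account key from the Consumer Vault.
        String accountName = "consumerstorageaccount";
        String accountKey = System.getenv("AZURE_STORAGE_KEY");

        BlobContainerClient container = new BlobContainerClientBuilder()
                .endpoint("https://" + accountName + ".blob.core.windows.net")
                .containerName("transfer-destination")
                .credential(new StorageSharedKeyCredential(accountName, accountKey))
                .buildClient();

        // Write-only permission: the provider may create blobs but not read or list them.
        BlobContainerSasPermission permission = new BlobContainerSasPermission()
                .setWritePermission(true);

        BlobServiceSasSignatureValues sasValues =
                new BlobServiceSasSignatureValues(OffsetDateTime.now().plusHours(1), permission);

        // SAS query string to be handed to the provider (and stored in the Vault).
        String sasToken = container.generateSas(sasValues);
        System.out.println(sasToken);
    }
}
```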
The following sequence diagram describes the flow to transfer a blob from a provider storage account to a consumer storage account. As a prerequisite, contract negotiation must have been performed.
This sequence diagram describes the flow if the two participants are using Azure blob storage. If one of the two participants uses another type of storage, only half of the sequence diagram reflects reality.
The sequence starts from the client triggering the transfer on the consumer side and finishes when the consumer deprovisions its resources.
- The client calls the Data Management API to trigger a transfer process. The requested asset is identified by the `assetId` and the `contractId` from the previous contract negotiation. The client gets the `PROCESS_ID` corresponding to the `transferProcess`; this `PROCESS_ID` will be used to get the transfer status. For now, `managedResources` needs to be set to true, to make sure that the consumer provisions the blob container. `managedResources=false` would be used if the client wants to use a pre-existing container without creating a new one, but this feature is not supported yet.
- Consumer gets the destination storage account access key from its Vault.
- Consumer creates a container where the Provider DPF may write blobs. The container is created only if the client specifies `managedResources=true`. The ObjectStorageProvisioner is responsible for provisioning the container and for generating a SAS token to access the container. To generate a SAS token, a storage account key is needed; this key is stored in and retrieved from the Consumer Vault.
- Consumer stores the SAS token in its Vault.
- Consumer sends an IDS message to the Provider, containing the information needed to transfer data to the destination container: the asset id, the destination blob account and container name, and the SAS token needed to write a blob to the container.
- Provider stores the SAS token in its Vault (rather than in the request to DPF, so that the latter may be persisted in a future implementation without containing secrets).
- Provider initiates the blob transfer on the Provider DPF. The Provider DPF can be embedded or run in a separate runtime. If it runs in a separate runtime, the Provider's control plane initiates the transfer via an HTTP request.
- The Provider DPF gets the source storage account access key from the Provider Vault.
- The Provider DPF gets the SAS token needed to write the blob to the consumer blob container.
- The Provider DPF reads the data that needs to be transferred using an AzureStorageDataSource.
- The Provider DPF writes the data to the destination blob using an AzureStorageDataSink, so that the consumer can access the data.
- When the transfer is finished, the Provider DPF writes a blob called `.complete` to signal completion (see the provider-side sketch after this list).
- In the meantime, the consumer regularly checks whether a blob named `.complete` exists in the container (see the completion-check sketch after the list). When the `.complete` blob is found, the consumer persists the new `transferProcess` state.
- The client polls the `transferProcess` state using the `/transferprocess/<PROCESS_ID>/state` endpoint (a polling sketch also follows the list).
- When the transfer is finished, the client can read the blob.
- Then, the client can call the Data Management API to deprovision the transfer process, which consists of the following two steps.
- (17) Consumer deletes the container containing the blob. The ObjectStorageProvisioner is responsible for deprovisioning the container.
- (18) Consumer deletes the SAS token in the Vault. The TransferProcessManagerImpl is responsible for deprovisioning the SAS token.
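To make the provider-side steps above concrete, the following sketch shows how data and the `.complete` marker could be written to the consumer container with the Azure SDK for Java, using the consumer-issued SAS token. Account, container, and blob names are placeholders; in the real flow this work is done by the AzureStorageDataSink inside the Provider DPF.

```java
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class ProviderSideWrite {
    public static void main(String[] args) {
        // Client authenticated with the consumer's write-only SAS token (placeholder names).
        BlobContainerClient destination = new BlobContainerClientBuilder()
                .endpoint("https://consumerstorageaccount.blob.core.windows.net")
                .containerName("transfer-destination")
                .sasToken(System.getenv("CONSUMER_SAS_TOKEN"))
                .buildClient();

        // Write the transferred data (done by AzureStorageDataSink in the real flow).
        byte[] data = "asset payload".getBytes(StandardCharsets.UTF_8);
        destination.getBlobClient("asset-1")
                .upload(new ByteArrayInputStream(data), data.length);

        // Write the `.complete` marker blob to signal that the transfer has finished.
        destination.getBlobClient(".complete")
                .upload(new ByteArrayInputStream(new byte[0]), 0);
    }
}
```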
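The consumer-side completion check could, under the same assumptions, be as simple as testing for the marker blob's existence:

```java
import com.azure.storage.blob.BlobContainerClient;

public final class CompletionCheck {
    private CompletionCheck() { }

    // Returns true once the provider has written the `.complete` marker blob.
    // The container client would be built the same way as in the SAS sketch above.
    static boolean transferComplete(BlobContainerClient container) {
        return container.getBlobClient(".complete").exists();
    }
}
```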
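And a client polling the transfer state might look like the sketch below; again, the base path and port of the Data Management API are assumptions, as is the shape of the state response.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PollTransferState {
    public static void main(String[] args) throws Exception {
        String processId = args[0]; // PROCESS_ID returned when the transfer was triggered
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://consumer-control-plane:9191/api/v1/data/transferprocess/"
                        + processId + "/state")) // assumed base path
                .GET()
                .build();

        // Poll until the transfer process reports the completed state.
        while (true) {
            String state = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
            System.out.println("state: " + state);
            if (state.contains("COMPLETED")) {
                break;
            }
            Thread.sleep(5_000);
        }
    }
}
```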