[PART 1] Document Encrypted Storage System #33

Merged
merged 16 commits into from
Nov 20, 2024
5 changes: 0 additions & 5 deletions compute/marketplace/encrypted-data-vault.mdx

This file was deleted.

21 changes: 14 additions & 7 deletions mint.json
@@ -141,17 +141,24 @@
}
]
},
{
"group": "Data Storage",
"pages": ["storage/quickstart"]
},
{
"group": "Overview",
"pages": [
"storage/data-layer",
"storage/data-models",
"storage/data-assets",
"storage/privacy-and-security-standards"
"storage/coordination",
"storage/identity"
]
},
{
"group": "Implementation",
"pages": ["storage/implementation/quickstart"]
},
{
"group": "Node Operators",
"pages": [
"storage/operators/requirements",
"storage/operators/running-a-node",
"storage/operators/maintenance"
]
},
{
99 changes: 99 additions & 0 deletions storage/coordination.mdx
@@ -0,0 +1,99 @@
---
title: Coordination Layer
---

The coordination layer is the nervous system of Gateway's network: it uses a blockchain to maintain network state, manage access control, and store transaction proofs. It is currently implemented on Solana for cost and performance reasons, but Gateway's protocol remains blockchain-agnostic by design.

## Contract Architecture

Gateway builds on existing blockchain infrastructure to power the Encrypted Data Vaults network. Solana currently serves as the anchor chain because of its high throughput and cost efficiency. The coordination layer anchors all critical network operations while leaving room for future blockchain integrations.

```mermaid
classDiagram
class File {
+Pubkey id
+Pubkey authority
+Optional~Pubkey~ recovery
+String fid
+u64 size
+String checksum
+i64 expires_at
+i64 roles_updated_at
+i64 rules_updated_at
+Optional~u64~ fee
+emit FileChanged()
}

class Role {
+Pubkey file_id
+Pubkey address
+u8 permission_level
+bool can_share
+AddressType address_type
+Optional~i64~ expires_at
+emit RolesChanged()
}

class FileMetadata {
+Pubkey file_id
+Vec~Metadata~ metadata
+emit MetadataUpdated()
}

class AddressType {
<<enumeration>>
Wallet
Collection
DID
}

class ActionType {
<<enumeration>>
View
Update
Delete
}

File "1" *-- "many" Role : has
File "1" *-- "1" FileMetadata : has
Role *-- AddressType : uses
Role *-- ActionType : uses permissions
```

## Smart Contract Architecture

The coordination layer consists of three primary smart contracts that work together to maintain system state:

### File Registry Contract
The core contract maintains essential file information including identifiers, ownership, and integrity data. Each file record contains a unique identifier, authority address, size, and checksum. The system supports optional recovery addresses and expiration timestamps, enabling comprehensive lifecycle management.
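
As a rough illustration of how a client might use a file record for integrity checks, the Python sketch below mirrors a few fields from the class diagram. It is not the on-chain program. `FileRecord` and `verify_payload` are hypothetical names, and two details are assumptions the source does not specify: that `checksum` is a hex SHA-256 digest, and that an `expires_at` of 0 means the file never expires.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """Off-chain mirror of some File fields from the class diagram (hypothetical)."""
    id: str                         # Pubkey in the diagram; a plain string here
    authority: str                  # owner address
    fid: str
    size: int                       # u64 in the diagram
    checksum: str                   # ASSUMPTION: hex SHA-256 of the stored bytes
    expires_at: int                 # Unix seconds; ASSUMPTION: 0 means "never"
    recovery: Optional[str] = None  # optional recovery address


def verify_payload(record: FileRecord, payload: bytes, now: int) -> bool:
    """Return True if the payload matches the record and the record is not expired."""
    if record.expires_at and now >= record.expires_at:
        return False
    if len(payload) != record.size:
        return False
    return hashlib.sha256(payload).hexdigest() == record.checksum
```

Checking the size before hashing is a cheap early exit; the checksum comparison is what detects tampering.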

### Role Management Contract
This contract implements Gateway's access control system through granular permission assignments. The permission structure supports three levels (View, Update, Delete) with additional sharing capabilities. Each role assignment can be time-limited through optional expiration timestamps and supports multiple address types including wallets, collections, and DIDs.
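
To illustrate how the role fields could combine into an access decision, here is a minimal Python sketch. It is not the contract's actual logic: `Role`, `Action`, and `is_allowed` are hypothetical names, and treating the three levels as ordered (Delete implies Update, which implies View) is an assumption the source does not state.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Iterable, Optional


class AddressType(Enum):
    WALLET = "wallet"
    COLLECTION = "collection"
    DID = "did"


class Action(IntEnum):
    # ASSUMPTION: levels are ordered, so a higher level covers the lower ones.
    VIEW = 1
    UPDATE = 2
    DELETE = 3


@dataclass(frozen=True)
class Role:
    file_id: str
    address: str
    permission_level: Action
    can_share: bool
    address_type: AddressType
    expires_at: Optional[int] = None  # Unix seconds; None means no expiry


def is_allowed(roles: Iterable[Role], file_id: str, address: str,
               action: Action, now: int) -> bool:
    """True if any unexpired role for this address on this file covers the action."""
    for role in roles:
        if role.file_id != file_id or role.address != address:
            continue
        if role.expires_at is not None and now >= role.expires_at:
            continue
        if role.permission_level >= action:
            return True
    return False
```

For example, a wallet holding an `UPDATE` role that expires at time 1000 would pass a `VIEW` check at time 999 and fail it at time 1000.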

### Metadata Contract
Supporting arbitrary metadata storage through key-value pairs, this contract enables flexible data attribution while maintaining efficient querying capabilities. The system tracks all metadata changes through timestamped events, providing a complete audit trail of modifications.

## Event System

The coordination layer implements a comprehensive event system for state change notification:

1. FileChanged: Tracks modifications to core file attributes
2. RolesChanged: Records updates to access control assignments
3. MetadataUpdated: Captures changes to file metadata

These events enable efficient system monitoring and synchronization across the network.
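
A consumer of these events might route them to handlers by name. The Python sketch below shows one way to do that in memory. `Event`, `EventBus`, and the `slot` ordering field are hypothetical; how events are actually fetched from the chain is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    name: str     # "FileChanged", "RolesChanged", or "MetadataUpdated"
    file_id: str
    slot: int     # ASSUMPTION: a monotonically increasing ordering key


Handler = Callable[[Event], None]


class EventBus:
    """Dispatches events to subscribers by event name, in slot order."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def subscribe(self, name: str, handler: Handler) -> None:
        self._handlers.setdefault(name, []).append(handler)

    def publish(self, events: Iterable[Event]) -> None:
        # Sort first so handlers see state changes in the order they occurred.
        for event in sorted(events, key=lambda e: e.slot):
            for handler in self._handlers.get(event.name, []):
                handler(event)
```

Sorting by an ordering key before dispatch matters because a role revocation applied before the grant it revokes would leave a local cache in the wrong state.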

## Network State Management

All network coordination data persists on-chain through smart contracts, providing a transparent and verifiable record of system state. This approach ensures:

1. Immutable record of all state changes
2. Verifiable access control decisions
3. Transparent file management operations
4. Auditable metadata modifications

## Integration Considerations

When interacting with the coordination layer, applications should verify that each transaction is confirmed and handle errors explicitly. They also need to budget for transaction fees (gas on EVM chains), stay within transaction limits, and submit state changes in the correct order.

<Info>If you're still exploring Encrypted Data Vaults to understand if they are right for you, please use the API experience.</Info>
102 changes: 0 additions & 102 deletions storage/data-assets.mdx

This file was deleted.

132 changes: 127 additions & 5 deletions storage/data-layer.mdx
@@ -1,12 +1,134 @@
---
title: Data Layer
description: Gateway Protocol Data Layer
---

![Gateway Protocol Data Layer](/assets/protocol-architecture/data-layer.png)
Gateway Protocol's storage layer implements a distributed, privacy-preserving infrastructure that ensures data security and availability while minimizing storage overhead. The system combines erasure coding, zero-knowledge proofs, and proxy re-encryption to create a secure and efficient storage network.

## Encrypted Data Vaults
## System Architecture

Encrypted Data Vaults, or EDVs, are the primary entities responsible for storing and maintaining encrypted data, allowing for efficient data management without overburdening the main network. They interact closely with the Challenge Protocol to ensure compliance with the network's requirements, and their stakes can be slashed if they fail to meet these requirements.
The Gateway storage architecture operates across three distinct layers that work in concert to provide secure and efficient data storage services.

EDVs have various storage metrics, including capacity, total staked capacity, free capacity, purchased capacity, and used storage. Upon registration, an EDV broadcasts its total storage capacity to the validators and stakes a sufficient quantity of tokens to commence network services. EDVs pledge available storage at the beginning of each epoch and are bound by the protocol to fulfill their storage agreements.
```mermaid
graph TD
subgraph Client Layer
A[Client] -->|Encrypted Data| B[Gateway API]
end

subgraph Coordination Layer
B --> C[Metadata Nodes]
B --> D[Blockchain Anchor]
C -->|Storage Orchestration| E[EDV Controller]
end

subgraph Storage Layer
E -->|Distribute Shards| F[EDV Node 1]
E -->|Distribute Shards| G[EDV Node 2]
E -->|Distribute Shards| H[EDV Node N]
end
```

The Client Layer manages all user-facing operations. It handles initial data encryption, user authentication, and provides the primary interface through the Gateway API. All cryptographic operations necessary for ensuring end-to-end security occur at this layer before data transmission.

The Coordination Layer serves as the network's central nervous system. Through metadata management and blockchain anchoring, it orchestrates storage operations and maintains the network's state. This layer validates storage proofs and coordinates data sharing operations while ensuring all access controls are properly enforced.

The Storage Layer consists of distributed Encrypted Data Vaults (EDVs) that form the backbone of the storage infrastructure. These nodes work together to store encrypted data shards, execute proxy re-encryption operations, and maintain proofs of storage.

## Core Components

### Encrypted Data Vaults (EDVs)

EDVs serve as the fundamental storage units within Gateway's infrastructure. Each vault operates as an isolated environment for encrypted data shards, capable of executing proxy re-encryption operations for secure data sharing. Through continuous proof generation and maintenance, EDVs ensure data integrity while participating in recovery operations when needed.

### Erasure Coding System

Gateway uses Reed-Solomon erasure coding with 10 data shards and 4 parity shards, for 14 shards per stored file. This adds 40% storage overhead (14/10 = 1.4x) and tolerates the loss of any 4 shards. For comparison, three-way replication adds 200% overhead and tolerates the loss of 2 copies.
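
The trade-off behind the 10 data + 4 parity layout can be checked with a little arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the availability estimate assumes shards fail independently with the same probability, which real networks only approximate.

```python
from math import comb


def storage_overhead(data_shards: int, parity_shards: int) -> float:
    """Bytes stored per byte of original data."""
    return (data_shards + parity_shards) / data_shards


def survival_probability(data_shards: int, parity_shards: int, p_fail: float) -> float:
    """Probability that enough shards survive to rebuild the file.

    ASSUMPTION: each shard is lost independently with probability p_fail.
    The file survives if at most `parity_shards` shards are lost.
    """
    n = data_shards + parity_shards
    return sum(
        comb(n, lost) * p_fail ** lost * (1 - p_fail) ** (n - lost)
        for lost in range(parity_shards + 1)
    )


print(storage_overhead(10, 4))  # 1.4
print(storage_overhead(1, 2))   # 3.0 (three full copies)
```

Three-way replication can be modelled in the same functions as 1 data shard plus 2 "parity" copies, which makes the two schemes directly comparable for a given failure probability.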

#### Distribution Strategy

The erasure coding process first divides incoming data into 10 equal chunks. The system then generates 4 parity shards using Reed-Solomon algorithms. These 14 resultant shards are distributed across geographically diverse EDV nodes based on factors including:

- Network latency and bandwidth capacity
- Current storage utilization
- Geographic distribution for regulatory compliance
- Historical node reliability metrics
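
One way to combine factors like these is a weighted score plus a per-region cap. The Python sketch below is a hypothetical heuristic, not Gateway's placement algorithm: the `Node` fields, the weights, and the cap are all assumptions. The cap of 4 shards per region is chosen so that losing an entire region never removes more shards than the 4 parity shards can cover.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    node_id: str
    region: str
    latency_ms: float
    utilization: float   # 0.0 (empty) to 1.0 (full)
    reliability: float   # 0.0 to 1.0, from historical metrics


def score(node: Node) -> float:
    """Higher is better. The weights are illustrative assumptions."""
    latency_score = 1.0 / (1.0 + node.latency_ms / 100.0)
    return 0.3 * latency_score + 0.3 * (1.0 - node.utilization) + 0.4 * node.reliability


def place_shards(nodes: List[Node], shard_count: int = 14,
                 max_per_region: int = 4) -> List[Node]:
    """Pick one node per shard, best score first, capping shards per region."""
    chosen: List[Node] = []
    per_region: Dict[str, int] = {}
    for node in sorted(nodes, key=score, reverse=True):
        if per_region.get(node.region, 0) >= max_per_region:
            continue
        chosen.append(node)
        per_region[node.region] = per_region.get(node.region, 0) + 1
        if len(chosen) == shard_count:
            return chosen
    raise ValueError("not enough eligible nodes to place every shard")
```

A greedy pass like this is simple to reason about; a production scheduler would likely also account for bandwidth and regulatory constraints on where specific data may live.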

#### Recovery Process

During file recovery, the system only needs any 10 shards from the total 14 to reconstruct the original file. This approach provides significant advantages over traditional replication:

- Faster recovery times through parallel shard retrieval
- Lower network bandwidth requirements
- Improved resistance to geographic network outages
- Reduced storage costs while maintaining reliability
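
The "any 10 of 14" property can be demonstrated with a toy Reed-Solomon-style code. The Python sketch below treats 10 data symbols as points on a polynomial and derives 4 parity symbols by evaluating that polynomial at extra points, so any 10 surviving points recover the rest by Lagrange interpolation. It is a teaching sketch under stated assumptions: it works over the prime field GF(257) for readability, whereas production codecs typically use GF(2^8), and all function names are hypothetical. It requires Python 3.8+ for `pow(x, -1, P)`.

```python
from typing import Dict, List

P = 257  # prime modulus; every byte value 0..255 is a valid field element


def _interpolate(points: Dict[int, int], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through `points`."""
    total = 0
    for xi, yi in points.items():
        num, den = 1, 1
        for xj in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode_stripe(data: List[int], n: int) -> List[int]:
    """Systematic encoding: the data symbols followed by n - len(data) parity symbols.

    Parity symbols can take the value 256, so they do not always fit in one byte.
    """
    points = dict(enumerate(data))
    return list(data) + [_interpolate(points, x) for x in range(len(data), n)]


def decode_stripe(shards: Dict[int, int], k: int) -> List[int]:
    """Rebuild the k data symbols from any k surviving shards, keyed by shard index."""
    if len(shards) < k:
        raise ValueError("too few shards to reconstruct the data")
    points = dict(list(shards.items())[:k])
    return [_interpolate(points, x) for x in range(k)]
```

For example, encoding 10 symbols with `encode_stripe(data, 14)`, discarding any 4 of the resulting shards, and passing the remaining 10 to `decode_stripe(..., 10)` returns the original symbols. With only 9 shards the decoder raises, because 9 points do not determine a degree-9 polynomial.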

### Proxy Re-encryption (PRE)

<Note>You can learn more about our implementation of PRE here: [Overview](/compute/marketplace/pre/overview).</Note>

The PRE system employs a non-interactive key generation protocol. When a data owner initiates sharing, the system generates re-encryption keys without requiring direct communication between parties. This process preserves the confidentiality of both the owner's and recipient's private keys.

#### Transform Key Security

Re-encryption keys are generated with carefully controlled scope. Each key:

- Can only re-encrypt specific data shards
- Has configurable time-based validity
- Cannot be used to decrypt the original data
- Cannot be combined with other keys to escalate privileges
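
The first two properties are policy checks that a node could apply before using a key. The Python sketch below models only that bookkeeping, not the cryptography: the last two properties come from the PRE scheme itself and cannot be shown this way. `TransformKeyGrant` and `may_reencrypt` are hypothetical names, and the half-open validity window is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TransformKeyGrant:
    """Scope metadata attached to a re-encryption key (hypothetical structure)."""
    recipient: str
    shard_ids: FrozenSet[str]   # the only shards this key may transform
    not_before: int             # Unix seconds, inclusive
    not_after: int              # Unix seconds, exclusive


def may_reencrypt(grant: TransformKeyGrant, shard_id: str,
                  recipient: str, now: int) -> bool:
    """True if the grant covers this shard, this recipient, and the current time."""
    return (
        grant.recipient == recipient
        and shard_id in grant.shard_ids
        and grant.not_before <= now < grant.not_after
    )
```

Keeping the grant immutable (`frozen=True` and a `frozenset`) means a node cannot widen the scope of a key after it has been issued.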

#### Shard-Level Operations

Re-encryption occurs independently at each EDV storing relevant shards. This distributed approach provides several advantages:

- No single point of system compromise
- Parallel processing reduces latency
- Network bandwidth optimization

The combination of erasure coding and PRE creates unique security properties. Even if an attacker compromises multiple EDV nodes, they cannot:

- Reconstruct the original file without sufficient shards
- Access plaintext data without appropriate decryption keys
- Generate valid re-encryption keys without proper authorization
- Bypass the access control system through direct shard access

### Proof of Storage

The proof of storage system uses zk-SNARKs to verify data integrity without exposing the underlying information. Each EDV maintains chunk hashes and builds Merkle trees over them so that tampering can be detected. The system batches proofs for efficient verification and anchors the resulting commitments to the blockchain as an immutable record.
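
The Merkle tree part of this design can be sketched independently of the zero-knowledge layer. The Python example below builds a tree over chunk hashes, produces an inclusion proof for one chunk, and verifies it against the root. It does not implement zk-SNARKs. The hash function (SHA-256), the leaf and node prefix bytes, and the rule of duplicating the last node on odd-sized levels are all assumptions, and the function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, leaves first; the last level holds only the root."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [_h(b"\x00" + c) for c in chunks]  # prefix separates leaves from nodes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]        # duplicate the last node on odd levels
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Inclusion proof: (sibling hash, sibling_is_left) for each level below the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash, index % 2 == 1))
        index //= 2
    return proof


def verify(chunk: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from the chunk to the root and compare."""
    node = _h(b"\x00" + chunk)
    for sibling_hash, sibling_is_left in proof:
        if sibling_is_left:
            node = _h(b"\x01" + sibling_hash + node)
        else:
            node = _h(b"\x01" + node + sibling_hash)
    return node == root
```

Only the root needs to be anchored on-chain: a verifier holding the root can check any single chunk with a proof whose length grows logarithmically with the number of chunks.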

## Data Flows

Data flows through the system follow two primary patterns: storage and retrieval.

During storage, clients first encrypt their data locally. The system then applies erasure coding to generate shards, which EDVs store securely. Nodes generate storage proofs, while the coordination layer anchors commitments and metadata nodes record shard locations.

For retrieval, the process begins with client authentication and access verification. The coordination layer locates the necessary shards, which EDVs then provide for reconstruction. Finally, the client performs decryption to access the original data.

## Security Architecture

Gateway's security architecture rests on three pillars: data privacy, high availability, and access control.

Data privacy is ensured through end-to-end encryption, zero-knowledge storage proofs, and secure proxy re-encryption. The system maintains confidentiality at every step, from initial storage through sharing and retrieval.

High availability comes from the distributed nature of shard storage combined with erasure coding redundancy. The system can automatically recover from node failures and maintains geographic distribution to ensure resilience.

Access control is enforced cryptographically, with granular permissions managed through secure sharing via PRE. All access patterns are auditable, and the system supports time-based access policies.

## Network Coordination

The coordination layer leverages blockchain technology to store proof commitments and record access control conditions. By anchoring these records on Solana, Gateway ensures transparent verification while maintaining cross-chain compatibility.

Metadata management occurs through a dedicated subsystem that tracks shard locations, manages access policies, and coordinates recovery operations. This system orchestrates data sharing and maintains comprehensive proof records.

## Best Practices

For optimal system operation, operators should focus on regular proof validation and monitoring of shard distribution. Geographic redundancy and performance optimization should guide resource allocation decisions.

Security considerations must include robust key management procedures and carefully designed access policies. Regular audit logging and compliance checks ensure the system maintains its security guarantees.

Performance optimization relies on smart shard distribution strategies and efficient proof batching. The network topology should be designed to minimize latency while maintaining proper load balancing across nodes.

## Integration Guidelines

Gateway's storage system integrates seamlessly with computation features, enabling secure data access for processing within our PET Marketplace. The system supports privacy-preserving processing while maintaining efficient data handling and secure result storage.

The platform offers broad cross-platform support through multi-chain compatibility and comprehensive API integration. Standard compliance and protocol interoperability ensure smooth integration with existing systems.