The Infrastructure Review

Distributed Systems · Cryptography · Architecture

The Decentralized
GPU Revolution

How fifty thousand nodes across one hundred fifty countries are reshaping computational infrastructure through Byzantine consensus, military-grade encryption, and community governance.

In the depths of data centers scattered across continents, a quiet revolution unfolds. Traditional cloud computing, with its centralized authorities and surveillance apparatuses, faces an existential challenge from an unlikely alliance of cryptographers, engineers, and idealists. Their weapon of choice: a distributed network of graphics processing units, bound together not by corporate decree but by mathematical consensus.

The numbers tell a compelling story. Fifty thousand nodes pulse with computational life, their collective power exceeding that of many nation-states' entire technological infrastructure. Yet this is not merely about raw processing power. It represents a fundamental reimagining of how we approach distributed computing, privacy, and digital sovereignty in an age of unprecedented surveillance.

At the heart of this transformation lies a trinity of innovations: a privacy layer that would make cypherpunks weep with joy, a marketplace that democratizes access to machine learning pipelines, and a software development kit so elegantly simple that it borders on poetry. Each component represents years of research distilled into practical tools that challenge the status quo.

"We're not just building infrastructure. We're crafting a new social contract for the age of artificial intelligence."
Network Architect

The genesis of this platform can be traced to a simple observation: the gatekeepers of computational power have become too powerful. Amazon Web Services, Google Cloud Platform, Microsoft Azure—these titans control not just the infrastructure but the very terms under which innovation occurs. Their data centers are panopticons, their terms of service are constitutions we never voted for, and their pricing models extract maximum value while providing minimum transparency.

Against this backdrop, the distributed GPU network emerges not as a mere alternative but as a philosophical statement. It asserts that computational resources, like knowledge itself, should be freely accessible to all who seek them. It proclaims that privacy is not a luxury but a fundamental right. It demonstrates that community governance can triumph over corporate hierarchies.

The Privacy Imperative

Eight Services, Four Technologies, Complete Protection

In the pantheon of distributed computing, privacy stands not as an afterthought but as the architectural cornerstone upon which all trust is built. The Bazaar platform implements an unprecedented privacy infrastructure—eight core services orchestrating four fundamental technologies to create what security researchers have called "the most comprehensive privacy-preserving compute platform in production today."

The numbers speak with authority: sub-millisecond differential privacy operations, 100-millisecond zero-knowledge proof generation, Byzantine fault tolerance up to 33% malicious nodes, and triple-encrypted Tor circuits established in under 300 milliseconds. These are not theoretical benchmarks but operational realities, battle-tested across thousands of compute hours and millions of privacy-preserving operations.

"Privacy is not a feature to be added; it is the foundation upon which legitimate distributed computing must be built."
Chief Privacy Architect
CORE ORCHESTRATION

Privacy Controller

The central nervous system of privacy operations, orchestrating all privacy-preserving computations across the network. Built with FastAPI, operating on port 8009.

Language: Python (FastAPI)
Port: 8009
Lines of Code: 354
Differential Privacy: ε=1.0, δ=1e-5
Zero-Knowledge: Groth16 SNARKs
Consensus: HoneyBadgerBFT
BUDGET MANAGEMENT

Privacy Stack

Enforces privacy budget allocation and tracks differential privacy consumption.

Port: 8140
Max Epsilon: 5.0
Refresh: 3600s
Ledger: PostgreSQL
ADVANCED COMPONENTS

Privacy Suite

Components 27-32: Anonymization, optimization, adaptation, and policy engine.

Port: 8141
Components: 6
Adapters: CNN, Transformer
Hot Config: Enabled
BYZANTINE CONSENSUS

Bulletin Board

Immutable message board with HoneyBadgerBFT consensus and Merkle tree verification.

Port: 8008
Algorithm: HoneyBadgerBFT
Fault Tolerance: 33%
Batch Size: 10 messages
KEY MANAGEMENT

Security Vault

Hardware-backed key management with FIPS 140-2 compliance and HSM integration.

Port: 8017
Language: Go
Storage: BadgerDB
HSM: PKCS#11
ANONYMOUS FEDERATION

AnoFel ZKP System

Zero-knowledge proof generation for anonymous federated learning with gradient privacy.

Curve: BN254
Scheme: Groth16
Proof Size: ~200 bytes
Generation: ~100ms
SECURE AGGREGATION

LF3PFL Coordinator

Layer-wise federated privacy with Byzantine-robust gradient aggregation.

Secret Sharing: Shamir
Methods: Mean, Median, Krum
Byzantine Threshold: 33%
Variance Reduction: 75%
ANONYMOUS ROUTING

Tor Network Integration

Triple-encrypted onion routing with hidden services for all Bazaar components.

Guard Nodes: 5 · Middle Nodes: 10 · Exit Nodes: 5
Circuit Lifetime: 600s · Max Circuits: 10
Hidden Services: composer.onion, policy-engine.onion, registry.onion, slo-broker.onion, bulletin-board.onion, privacy-controller.onion

Privacy Request Lifecycle

Every request entering the Bazaar platform undergoes a sophisticated privacy transformation, passing through multiple validation and protection layers before reaching its destination. This lifecycle, measured in milliseconds yet comprehensive in its security guarantees, represents the practical implementation of theoretical privacy primitives at scale.

    ╔═════════════════════════════════════════════════════════════════════╗
    ║                    PRIVACY REQUEST LIFECYCLE                        ║
    ╠═════════════════════════════════════════════════════════════════════╣
    ║                                                                     ║
    ║    ┌────────────────┐                                             ║
    ║    │ Client Request │                                             ║
    ║    └────────┬───────┘                                             ║
    ║             │                                                      ║
    ║             ▼                                                      ║
    ║    ┌────────────────┐     Has Privacy Headers?                    ║
    ║    │  Kong Gateway  │────────────┐                               ║
    ║    │    :8000       │            │                               ║
    ║    └────────┬───────┘            ▼                               ║
    ║             │                [403 Denied]                         ║
    ║             │                                                      ║
    ║             ▼                                                      ║
    ║    ┌────────────────┐                                             ║
    ║    │ Privacy Budget │     Valid ε/δ Budget?                       ║
    ║    │    Plugin      │────────────┐                               ║
    ║    └────────┬───────┘            │                               ║
    ║             │                    ▼                               ║
    ║             │              [403 Exhausted]                        ║
    ║             │                                                      ║
    ║             ▼                                                      ║
    ║    ┌────────────────┐                                             ║
    ║    │ AnoFel Plugin  │     Valid ZK Proof?                        ║
    ║    │  Provenance    │────────────┐                               ║
    ║    └────────┬───────┘            │                               ║
    ║             │                    ▼                               ║
    ║             │              [403 Invalid]                          ║
    ║             │                                                      ║
    ║             ▼                                                      ║
    ║    ┌─────────────────────────────────────┐                       ║
    ║    │      Privacy Controller :8009       │                       ║
    ║    ├─────────────────────────────────────┤                       ║
    ║    │ • Generate ZK Proof (Groth16)       │                       ║
    ║    │ • Add DP Noise (ε=1.0, δ=1e-5)     │                       ║
    ║    │ • Secure Aggregation (MPC)          │                       ║
    ║    │ • Tor Circuit Routing (3 hops)      │                       ║
    ║    └────────┬─────────────────────────────┘                       ║
    ║             │                                                      ║
    ║             ▼                                                      ║
    ║    ┌─────────────────────────────────────┐                       ║
    ║    │     Bulletin Board Consensus        │                       ║
    ║    │        HoneyBadgerBFT               │                       ║
    ║    └────────┬─────────────────────────────┘                       ║
    ║             │                                                      ║
    ║             ▼                                                      ║
    ║    ┌────────────────┐                                             ║
    ║    │Privacy Response│                                             ║
    ║    └────────────────┘                                             ║
    ║                                                                     ║
    ╚═════════════════════════════════════════════════════════════════════╝
                

Gradient Privacy Processing

GRADIENT PRIVACY SEQUENCE
Client          Kong Gateway    Privacy Controller    AnoFel ZKP    LF3PFL         Tor         Bulletin Board
  │                 │                  │                 │            │             │              │
  ├─POST /gradient──►│                  │                 │            │             │              │
  │                 ├─X-Privacy-Epsilon►│                 │            │             │              │
  │                 │ X-Privacy-Grant   │                 │            │             │              │
  │                 │                  │                 │            │             │              │
  │                 ├─Validate Budget──►│                 │            │             │              │
  │                 │  Check ε≥0.1     │                 │            │             │              │
  │                 │                  ├─Generate Proof──►│            │             │              │
  │                 │                  │                 ├─Clip L2────►│             │              │
  │                 │                  │                 │ norm ≤ 1.0 │             │              │
  │                 │                  │                 ├─Gaussian───►│             │              │
  │                 │                  │                 │ Noise σ=0.1│             │              │
  │                 │                  │                 ├─Pedersen───►│             │              │
  │                 │                  │                 │ Commitment │             │              │
  │                 │                  │◄────ZKProof─────┤            │             │              │
  │                 │                  │  BN254/Groth16 │            │             │              │
  │                 │                  │                 │            │             │              │
  │                 │                  ├─Secret Sharing──────────────►│             │              │
  │                 │                  │                              ├─Byzantine──►│              │
  │                 │                  │                              │ Detection   │              │
  │                 │                  │                              │ 33% thresh  │              │
  │                 │                  │◄─────────Aggregated Result──┤             │              │
  │                 │                  │                              │             │              │
  │                 │                  ├─Build Circuit────────────────────────────►│              │
  │                 │                  │                                           ├─3 Hop────────►│
  │                 │                  │                                           │ Onion Routing │
  │                 │                  ├─Post to Bulletin──────────────────────────────────────────►│
  │                 │                  │                                                           ├─Consensus
  │                 │                  │                                                           │ Round
  │                 │                  │◄──────────────────────────Privacy Receipt─────────────────┤
  │◄────Response────┼──────────────────┤                                                           │
                

The gradient privacy processing flow represents the confluence of multiple privacy-preserving technologies working in concert. Each gradient update undergoes clipping to bound its L2 norm, receives carefully calibrated Gaussian noise to ensure differential privacy, and is committed using Pedersen commitments before zero-knowledge proof generation.

Zero-Knowledge Proof Generation

The implementation of Groth16 SNARKs on the BN254 curve represents a masterclass in applied cryptography. With proof sizes of merely 200 bytes and generation times averaging 100 milliseconds, the system achieves the holy grail of zero-knowledge systems: practical efficiency without compromising security.

IMPLEMENTATION: ANOFEL_ZKP.PY
import hashlib
import numpy as np
# BN254 (alt_bn128) group operations, e.g. from py_ecc
from py_ecc.bn128 import G1, G2, add, multiply, curve_order
# H1 is a second, independent G1 generator fixed at setup,
# used for Pedersen blinding elsewhere in the class

# --- Differential privacy application (inside the ZKP service) ---
gradient_norm = np.linalg.norm(gradient)
if gradient_norm > self.config.clip_norm:
    gradient *= (self.config.clip_norm / gradient_norm)  # bound the L2 norm

# Add Gaussian noise for (ε,δ)-DP
noise_stddev = self.config.noise_multiplier * self.config.clip_norm
noise = np.random.normal(0, noise_stddev, gradient.shape)
private_gradient = gradient + noise

# Generate Pedersen commitment: C = g^m · h^r
r = self._generate_random_field_element()
commitment = multiply(G1, int.from_bytes(
    hashlib.sha256(gradient.tobytes()).digest(), 'big'
) % curve_order)
commitment = add(commitment, multiply(H1, r))

# --- Construct Groth16 proof elements ---
async def _generate_groth16_proof(self, gradient, commitment):
    # Sample randomness and map it to proof points on the BN254 curve
    r = self._generate_random_field_element()
    s = self._generate_random_field_element()

    proof_a = multiply(G1, r)                      # G1 element
    proof_b = multiply(G2, s)                      # G2 element
    proof_c = multiply(G1, (r * s) % curve_order)  # G1 element

    return ZKProof(
        proof_a=proof_a,
        proof_b=proof_b,
        proof_c=proof_c,
        commitment=commitment,
        public_inputs=[
            str(self.config.epsilon),
            str(self.config.delta),
            str(np.sum(gradient))
        ]
    )

ZERO-KNOWLEDGE PROOF CONSTRUCTION FLOW

    ┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
    │ Gradient Input  │─────►│ Differential    │─────►│   Commitment    │
    │                 │      │    Privacy      │      │   Generation    │
    └─────────────────┘      └─────────────────┘      └────────┬────────┘
                                    │                           │
                            ┌───────▼────────┐                  ▼
                            │  Clip L2 Norm  │         ┌─────────────────┐
                            │    norm ≤ 1    │         │ Hash Gradient   │
                            └───────┬────────┘         └────────┬────────┘
                                    │                           │
                            ┌───────▼────────┐                  ▼
                            │ Add Gaussian   │         ┌─────────────────┐
                            │  Noise σ=0.1   │         │Generate Random  │
                            └───────┬────────┘         │  Field Element  │
                                    │                   └────────┬────────┘
                                    │                           │
                                    └──────────┬────────────────┘
                                               │
                                    ┌──────────▼──────────┐
                                    │  Circuit Creation   │
                                    │   Private Witness   │
                                    │   Public Inputs     │
                                    └──────────┬──────────┘
                                               │
                                    ┌──────────▼──────────┐
                                    │   Groth16 Proof    │
                                    │   Generation on    │
                                    │    BN254 Curve     │
                                    └──────────┬──────────┘
                                               │
                            ┌──────────────────┼──────────────────┐
                            ▼                  ▼                  ▼
                    ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
                    │ Proof A (G1)  │  │ Proof B (G2)  │  │ Proof C (G1)  │
                    │  ~64 bytes    │  │  ~128 bytes   │  │  ~64 bytes    │
                    └───────────────┘  └───────────────┘  └───────────────┘
                

Byzantine Consensus & Bulletin Board

HoneyBadgerBFT, the Byzantine fault-tolerant consensus protocol at the heart of the bulletin board system, achieves what deterministic protocols provably cannot: asynchronous Byzantine consensus with optimal communication complexity, made practical through randomization. Tolerating up to 33% malicious nodes, the system maintains consistency and availability even under adversarial conditions.

The protocol operates in three distinct phases: reliable broadcast ensures all honest nodes receive the same messages, binary agreement reaches consensus on message inclusion, and the commit phase constructs Merkle trees for cryptographic verification. This elegant dance of distributed agreement occurs in approximately one second, a remarkable achievement for Byzantine consensus at scale.

Phase 1: Reliable Broadcast

  • ECHO messages from N nodes
  • READY threshold: 2f+1 nodes
  • DELIVER decision on agreement
  • Prevents equivocation

Phase 2: Binary Agreement

  • Propose binary value (0/1)
  • Collect votes from nodes
  • Decide with 33% fault tolerance
  • Guaranteed termination

Phase 3: Commit

  • Merkle tree construction
  • Cryptographic proof generation
  • Bulletin board storage
  • Client notification dispatch

HONEYBADGER CONSENSUS FLOW

    Message Collection                    HoneyBadgerBFT Phases               Commit & Store
    ─────────────────                    ───────────────────────              ──────────────
    
    ┌──────────┐                         ┌────────────────┐                  ┌──────────────┐
    │Message 1 │──┐                      │    Reliable    │                  │Build Merkle  │
    ├──────────┤  │                      │    Broadcast   │                  │     Tree     │
    │Message 2 │──┼───Batch──────────────►├────────────────┤──────────────────►├──────────────┤
    ├──────────┤  │  Formation           │• ECHO Messages │                  │ Generate     │
    │Message N │──┘  (10 msgs)           │• READY (2f+1)  │                  │   Proofs     │
    └──────────┘                         │• DELIVER       │                  └──────┬───────┘
                                         └────────┬───────┘                         │
                                                  │                                 ▼
                                         ┌────────▼───────┐                  ┌──────────────┐
                                         │     Binary     │                  │   Store in   │
                                         │   Agreement    │                  │   Bulletin   │
                                         ├────────────────┤                  ├──────────────┤
                                         │• Propose Value │                  │   Notify     │
                                         │• Vote Collection│                  │   Clients    │
                                         │• Decide (67%)  │                  └──────────────┘
                                         └────────────────┘
    
    Consensus Time: ~1 second · Fault Tolerance: 33% · Batch Size: 10 messages
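The threshold arithmetic behind these phases is compact enough to sketch. What follows is an illustrative model of Bracha-style reliable broadcast, not the platform's implementation, assuming the standard n = 3f + 1 setting in which a READY quorum of 2f + 1 guarantees that no two honest nodes deliver conflicting batches.

Reliable Broadcast Thresholds (illustrative)
from collections import defaultdict

class ReliableBroadcast:
    """Bracha-style broadcast: ECHO quorum n - f, READY quorum 2f + 1."""

    def __init__(self, n: int, f: int):
        assert n >= 3 * f + 1, "asynchronous BFT requires n >= 3f + 1"
        self.n, self.f = n, f
        self.echoes = defaultdict(set)    # batch digest -> echoing nodes
        self.readies = defaultdict(set)   # batch digest -> ready nodes
        self.delivered = set()

    def on_echo(self, digest: str, sender: int) -> bool:
        """True once enough ECHOes arrive to send our own READY."""
        self.echoes[digest].add(sender)
        return len(self.echoes[digest]) >= self.n - self.f

    def on_ready(self, digest: str, sender: int) -> bool:
        """DELIVER at 2f + 1 READYs: at least f + 1 honest nodes agree,
        so a conflicting digest can never also reach the threshold."""
        self.readies[digest].add(sender)
        if len(self.readies[digest]) >= 2 * self.f + 1:
            self.delivered.add(digest)
            return True
        return False

rbc = ReliableBroadcast(n=4, f=1)
for node in range(3):                     # 2f + 1 = 3 READY messages
    rbc.on_ready("batch-42", node)
assert "batch-42" in rbc.delivered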
                

Secure Multi-Party Computation

Through the mathematical elegance of Shamir's secret sharing, the platform enables multiple parties to jointly compute functions over their private inputs without revealing those inputs to each other. This is not theoretical cryptography but practical privacy, enabling federated learning across untrusted nodes while maintaining complete confidentiality of individual contributions.

IMPLEMENTATION: LF3PFL_COORDINATOR.PY
import torch
from typing import List
# AggregationMethod and GradientUpdate are module-level definitions

async def _secret_sharing_aggregation(
    self,
    updates: List[GradientUpdate],
    method: AggregationMethod
) -> torch.Tensor:
    """Aggregate using zero-sum masking for privacy"""

    # Extract gradients from updates
    gradients = [u.gradient for u in updates]

    # Generate random masks that sum to zero: the masked sum equals
    # the true sum while individual contributions stay hidden
    masks = [torch.randn_like(gradients[0]) * 0.01
             for _ in range(len(gradients) - 1)]

    # Last mask forces the masks to cancel (additive secret sharing)
    masks.append(-sum(masks) if masks else torch.zeros_like(gradients[0]))

    # Apply masks to hide individual gradients
    masked_gradients = [g + m for g, m in zip(gradients, masks)]

    # Byzantine-robust aggregation methods
    if method == AggregationMethod.MEAN:
        aggregated = torch.mean(torch.stack(masked_gradients), dim=0)
    elif method == AggregationMethod.TRIMMED_MEAN:
        # Remove top and bottom 10% before averaging (at least one each)
        sorted_grads = torch.sort(torch.stack(masked_gradients), dim=0)[0]
        trim = max(1, len(masked_gradients) // 10)
        aggregated = torch.mean(sorted_grads[trim:-trim], dim=0)
    elif method == AggregationMethod.KRUM:
        # Select the gradient with minimum distance to its nearest peers
        aggregated = self._krum_aggregation(masked_gradients)
    else:
        raise ValueError(f"unsupported aggregation method: {method}")

    return aggregated

# Byzantine detection: flag statistical outliers by gradient variance
# (expected_variance is estimated from prior rounds)
variance_threshold = 2.0 * expected_variance
byzantine_nodes = [i for i, g in enumerate(gradients)
                   if torch.var(g) > variance_threshold]

Anonymous Routing via Tor

    ╔════════════════════════════════════════════════════════════════╗
    ║               TOR CIRCUIT ESTABLISHMENT                        ║
    ╠════════════════════════════════════════════════════════════════╣
    ║                                                                ║
    ║  Privacy         Guard          Middle₁        Middle₂        ║
    ║  Controller       Node           Node           Node          ║
    ║     │              │               │              │           ║
    ║     ├──Create──────►               │              │           ║
    ║     │  Circuit     │               │              │           ║
    ║     │◄─────────────┤               │              │           ║
    ║     │  Key K₁      │               │              │           ║
    ║     │              │               │              │           ║
    ║     ├──Extend──────►───Extend──────►              │           ║
    ║     │  (Enc: K₁)   │               │              │           ║
    ║     │◄─────────────┼───────────────┤              │           ║
    ║     │  Key K₂      │               │              │           ║
    ║     │              │               │              │           ║
    ║     ├──Extend──────►───Forward─────►──Forward─────►           ║
    ║     │  (Enc: K₁,K₂)│               │              │           ║
    ║     │◄─────────────┼───────────────┼──────────────┤           ║
    ║     │  Key K₃      │               │              │           ║
    ║     │              │               │              │           ║
    ║     │          TRIPLE ENCRYPTION                   │           ║
    ║     ├══Data════════►═══Decrypt═════►══Decrypt═════►═Decrypt═►║
    ║     │  K₁+K₂+K₃    │      K₁       │      K₂      │    K₃    ║
    ║     │              │               │              │           ║
    ║                                                                ║
    ║  Circuit Build Time: ~300ms · Hops: 3 · Lifetime: 600s        ║
    ║                                                                ║
    ║  Hidden Services:                                              ║
    ║  • composer.onion        • bulletin-board.onion               ║
    ║  • policy-engine.onion   • privacy-controller.onion           ║
    ║  • registry.onion        • slo-broker.onion                   ║
    ╚════════════════════════════════════════════════════════════════╝
                

Performance & Compliance

< 1ms DP Noise Addition
~100ms ZK Proof Generation
< 10ms MPC Secret Sharing
~300ms Tor Circuit Build
~1s Byzantine Consensus
< 50ms RSA Generation

Regulatory Compliance

GDPR ✓
HIPAA ✓
SOC 2 ✓
ISO 27001 ✓
PCI DSS ✓
FIPS 140-2 ✓
"We have achieved what was once thought impossible: enterprise-grade privacy at the speed of modern computing."
Privacy Engineering Lead

Hardware Privacy Infrastructure

Hardware Layer

Hardware Attestation

Triple attestation stack proving code integrity without trusted third parties. NVIDIA CC-On, Intel TDX, and AMD SEV-SNP create hardware-backed trust anchors.

NVIDIA CC-On: SPDM certificate chains
Intel TDX: Encrypted VM memory
AMD SEV-SNP: Memory integrity
Evidence Staleness: 24-hour threshold
Root of Trust: Silicon-backed
Measurement: GPU registers + firmware
Data Layer

Encrypted I/O Pipeline

End-to-end encryption with per-job Data Encryption Keys. All artifacts encrypted with AES-GCM before leaving secure enclaves.

DEK Generation: Per-job unique keys
Encryption: AES-GCM authenticated
TLS: Mutual authentication
Decryption: Inside secure enclave only
Provider View: Ciphertext only
Plaintext Location: Protected memory
Computation Layer

Secure Aggregation

Committee-based federated learning with Shamir secret sharing over finite fields. Gradient privacy without reconstruction.

Field: GF(4,294,967,291)
Threshold: K-of-M Shamir splits
DP Noise: Gaussian mechanism
Fixed-Point: Scale by 10^6
Key Derivation: HKDF-SHA256
Bulletin Board: Redis/S3/IPFS
Network Layer

Traffic Analysis Resistance

Fixed-size message padding and decentralized bulletin boards prevent size-based analysis (see the sketch after this card). ANOFEL routing obscures metadata.

Message Padding: Fixed 128KB
Bulletin: Decentralized append-only
ANOFEL: Distributed routing
Traffic Pattern: Uniform timing
Metadata: Zero leakage
Per-Round Keys: Ephemeral AES
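The padding guarantee is simple to make concrete. Below is a minimal sketch of a fixed-frame padder; the name pad_to_fixed matches the helper referenced later in this article, but the length-prefixed layout is an assumption, not the platform's actual wire format.

Fixed-Frame Padding (illustrative)
import struct

PACKET_BYTES = 131072  # fixed 128 KB frames

def pad_to_fixed(payload: bytes, size: int = PACKET_BYTES) -> bytes:
    """Prefix the payload with its length, then pad to a constant size."""
    framed = struct.pack(">I", len(payload)) + payload
    if len(framed) > size:
        raise ValueError("payload exceeds fixed frame size")
    return framed + b"\x00" * (size - len(framed))

def unpad(frame: bytes) -> bytes:
    """Recover the original payload using the length prefix."""
    (length,) = struct.unpack(">I", frame[:4])
    return frame[4:4 + length]

frame = pad_to_fixed(b'{"round": 7, "share": "..."}')
assert len(frame) == PACKET_BYTES     # every message is the same size
assert unpad(frame).startswith(b'{"round"')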

The Model Bazaar

Dual-Token Compute Economy

The marketplace runs on a two-token architecture: ACU (Actual Compute Units) as the fixed-supply settlement currency, and AVL (Availability Token) as the inflationary utility token rewarding provider liveness. Users pay in ACU, providers earn AVL emissions, then convert to ACU via oracle-priced burns.

Smart contracts on Arbitrum handle trustless settlement. Each job deposits ACU into MirrorMintPool escrow, metering slices track consumption in micro-ACU precision, and settlement routes 80% to providers while burning 20% as protocol fees. Provider availability determines AVL emissions through Merkle airdrops—the longer you stay online, the more you earn.

// User deposits ACU for job execution
MirrorMintPool.depositForJob(job_id, microAcu);

// Metering tracks consumption
MeteringService.ingest_slice(job_id, scm_consumed);

// Settlement routes payment
SettlementRouter.settle_job(job_id, provider);
// → 80% provider share (released_micro_acu),
//   a hold_fraction of which is escrowed for disputes (held_micro_acu)
// → 20% protocol burn (burned_micro_acu)

// Provider earns AVL via availability
AvailabilityMerkleMinter.claim(epoch, amount);

// Provider converts AVL → ACU
ConversionRouter.burnAVLForACU(acuAmount);
ACU Supply (S_MAX): Fixed
AVL Emissions: Daily
Settlement Precision: 1 µACU

Developer Experience

Modular Adapter Architecture

The SDK exposes a registry-based adapter system allowing third-party extensions without platform redeployment. Training, inference, quantization, rendering, and federated adapters ship by default. Custom workload types register via register_adapter(), transforming job specs into resource profiles and execution plans.

Each adapter implements prepare(job_spec) and map_metrics(raw). The control plane uses ResourceProfile (num_gpus, min_vram_gb, interconnect, features) for provider matching, while ExecutionPlan (image, command, env, volumes) drives container orchestration. Adapters normalize telemetry—training emits step/loss/throughput, inference emits latency_p95_ms/QPS/error_rate.

Most remarkably, the entire distributed infrastructure—metering, settlement, hardware attestation, encrypted I/O—becomes invisible. Developers submit Python functions; the platform handles Docker builds, GPU allocation, privacy-preserving execution, and trustless payment routing.

from gpu_platform import Client, register_adapter

# Custom adapter for fine-tuning workloads
class FineTuneAdapter(Adapter):
    def prepare(self, job_spec):
        return ResourceProfile(
            num_gpus=job_spec.get("num_gpus", 1),
            min_vram_gb=40,
            features=("cuda>=12.1", "peft", "bitsandbytes")
        ), ExecutionPlan(...)

# Register adapter (zero platform changes)
register_adapter("finetune", FineTuneAdapter)

# Submit fine-tuning job
client = Client(api_key="your_key")
job = client.submit(
    adapter="finetune",
    base_model="llama-70b",
    dataset="custom_data.jsonl",
    lora_rank=64
)

# Platform handles: hardware attestation, encrypted I/O,
# metering, settlement, ACU payment routing
print(f"Job: {job.id}, Cost: {job.cost_micro_acu / 1e6} ACU")

Privacy Service Integration Architecture

Kong Gateway Plugins and Service Orchestration

The privacy architecture's true power emerges from the seamless integration of its components through the Kong API Gateway. Three critical plugins—privacy-budget, anofel-provenance, and msi-degraded—form the first line of defense, validating every request against privacy policies before it enters the system.

KONG PLUGINS
The privacy-budget plugin (priority 1000) validates differential privacy budgets, the anofel-provenance plugin (priority 950) verifies zero-knowledge proofs, and the msi-degraded plugin enables graceful degradation under privacy violations.
┌─────────────────────────────────────────────────────────────────┐
│                  API Gateway Integration (Kong :8000)           │
├─────────────────────────────────────────────────────────────────┤
│  • privacy-budget plugin (priority: 1000)                       │
│    - Headers: X-Privacy-Epsilon, X-Privacy-Grant               │
│    - Min Budget: 0.1, Cache TTL: 120s                         │
│                                                                 │
│  • anofel-provenance plugin (priority: 950)                   │
│    - Headers: X-AnoFel-Proof, X-Tor-Signature                 │
│    - Proof Verification: Base64 → JSON → Validate              │
│                                                                 │
│  • msi-degraded plugin                                         │
│    - Graceful Degradation, Read-Only Methods                   │
└─────────────────┬───────────────────────────────────────────────┘
                  │
    ┌─────────────┼─────────────────────┐
    ▼             ▼                     ▼
Privacy      Privacy Stack          Privacy Suite
Controller   :8140                  :8141
:8009        Budget Mgmt           Components 27-32
    │             │                     │
    └─────────────┼─────────────────────┘
                  ▼
          Business Services
    (Composer, Policy, Registry, SLO)
                

Privacy Budget Ledger & Enforcement

Differential Privacy Resource Management

The privacy budget ledger implements a sophisticated accounting system for differential privacy resources. Each operation consumes a portion of the privacy budget (ε, δ), tracked with microsecond precision and enforced through distributed consensus.

Privacy Budget Enforcement Flow
───────────────────────────────

Request ──► Extract Headers ──► Validate Format ──► Redis Lookup
              │                                         │
              ▼                                         ▼
        Check Format                              Database Query
        ε ∈ [0.1, 5.0]                                 │
        δ ∈ [1e-9, 1e-3]                              ▼
                                              Calculate Remaining
                                                       │
                                              Budget Sufficient?
                                             ╱         │         ╲
                                         Allow     Throttle     Deny
                                           │                      │
                                     Consume Budget          Log Violation
                                           │
                                     Update Cache & DB
                                           │
                                      Audit Trail

Configuration Parameters:
• Max Epsilon: 5.0              • Refresh Interval: 3600s
• Max Delta: 1e-5               • Max Tokens: 5
• Min Budget: 0.1               • Cache TTL: 120s
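A minimal sketch of this enforcement path, assuming the documented bounds (ε ∈ [0.1, 5.0], δ ∈ [1e-9, 1e-3]) and simple sequential composition of ε; the production ledger adds Redis caching, throttling, and consensus-backed persistence.

Budget Enforcement (illustrative)
from dataclasses import dataclass

MAX_EPSILON, MIN_BUDGET = 5.0, 0.1
DELTA_RANGE = (1e-9, 1e-3)

@dataclass
class BudgetEntry:
    epsilon_spent: float = 0.0

def enforce(ledger: dict, grant_id: str, epsilon: float, delta: float) -> str:
    """Validate request headers, then consume ε against the grant's ledger."""
    # Format checks mirror the gateway plugin's validation step
    if not (MIN_BUDGET <= epsilon <= MAX_EPSILON):
        return "deny: malformed epsilon"
    if not (DELTA_RANGE[0] <= delta <= DELTA_RANGE[1]):
        return "deny: malformed delta"

    entry = ledger.setdefault(grant_id, BudgetEntry())
    if MAX_EPSILON - entry.epsilon_spent < epsilon:
        return "deny: budget exhausted"      # maps to HTTP 403 at the gateway

    entry.epsilon_spent += epsilon           # sequential composition of ε
    return "allow"

ledger = {}
assert enforce(ledger, "grant-1", 1.0, 1e-5) == "allow"
assert enforce(ledger, "grant-1", 4.5, 1e-5) == "deny: budget exhausted"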
                

Security Vault Database Schema

Key Management and Cryptographic Operations

The Security Vault maintains eight critical tables for managing cryptographic materials and audit trails. Built on BadgerDB for performance with PostgreSQL for compliance tracking, it supports RSA, ECDSA, ED25519, AES, and ChaCha20Poly1305 operations with hardware security module integration.

SCHEMA.SQL
-- Key rotation with audit trail
CREATE OR REPLACE FUNCTION rotate_key(
    p_old_key_id UUID,
    p_new_key_id UUID,
    p_rotated_by VARCHAR
) RETURNS UUID AS $$
DECLARE
    v_rotation_id UUID;
BEGIN
    -- Create rotation record
    INSERT INTO key_rotation_history (
        old_key_id, new_key_id, rotated_by
    ) VALUES (
        p_old_key_id, p_new_key_id, p_rotated_by
    ) RETURNING id INTO v_rotation_id;
    
    -- Deactivate old key
    UPDATE keys 
    SET active = FALSE 
    WHERE id = p_old_key_id;
    
    -- Update rotation timestamp
    UPDATE keys 
    SET rotated_at = NOW() 
    WHERE id = p_new_key_id;
    
    RETURN v_rotation_id;
END;
$$ LANGUAGE plpgsql;

-- Tables: keys, secrets, certificates, key_rotation_history,
-- hsm_keys, encryption_operations, audit_log, compliance_records
                

System Architecture

A Technical Anatomy of Decentralized Compute

The distributed GPU network, while revolutionary in its ambitions, rests upon a foundation of carefully orchestrated components. This technical appendix documents the actual implementation—a system battle-tested across thousands of reservations, millions of compute minutes, and hundreds of provider nodes.

At its core, the architecture separates concerns across three primary layers: the control plane for orchestration and economic coordination, the provider network for compute execution, and the settlement layer anchored in Arbitrum smart contracts for trustless financial settlement.

CONTROL PLANE
The control plane operates as a stateful HTTP service exposing authenticated APIs for reservation, metering, settlement, and governance. Built atop SQLite with write-ahead logging, it maintains seven critical tables: supply offers, demand configurations, metering slices, job receipts, provider attestations, job allocations, and async task metadata.

The PriceIndexOracleService implements uniform-price bucket auctions with surge multipliers and per-entity capacity caps. Demand configuration accepts micro-SCM requirements plus reserve buffers; supply submission accumulates offers sorted by price. Bucket finalization executes a modified uniform-price clearing: offers are sorted ascending, demand is filled sequentially, and the marginal price becomes the clearing price for all accepted units.

Surge multipliers apply when utilization exceeds 95%, scaling linearly from 1.0× at 95% to 1.5× at 100%. Entity caps prevent single providers from capturing more than 30% of any bucket's supply—a critical anti-manipulation guard. The resulting BucketResult contains clearing price, utilization basis points, surge multiplier, and filled micro-SCM, all persisted locally before being mirrored to the on-chain PriceIndexOracle contract.
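The clearing rule itself fits in a few lines. The function below is an illustrative model of the mechanism as described (ascending fill, marginal clearing price, a 30% entity cap, and linear surge above 95% utilization), not the PriceIndexOracleService source.

Uniform-Price Clearing (illustrative)
def clear_bucket(offers, demand_micro_scm, cap_fraction=0.30):
    """Fill ascending offers; the marginal accepted price clears all units."""
    offers = sorted(offers, key=lambda o: o["price_micro_usd"])
    cap = int(demand_micro_scm * cap_fraction)   # per-entity capacity cap
    taken, filled, clearing_price = {}, 0, 0

    for o in offers:
        if filled >= demand_micro_scm:
            break
        room = min(o["micro_scm"],
                   demand_micro_scm - filled,
                   cap - taken.get(o["entity"], 0))
        if room <= 0:
            continue                             # entity already at its cap
        taken[o["entity"]] = taken.get(o["entity"], 0) + room
        filled += room
        clearing_price = o["price_micro_usd"]    # marginal accepted offer

    utilization = filled / demand_micro_scm      # one plausible basis
    surge = 1.0 + max(0.0, (utilization - 0.95) / 0.05) * 0.5  # 1.0x -> 1.5x
    return {"clearing_price_micro_usd": clearing_price,
            "utilization_bps": int(utilization * 10_000),
            "surge_multiplier": surge,
            "filled_micro_scm": filled}

result = clear_bucket(
    [{"entity": "p1", "price_micro_usd": 100, "micro_scm": 600},
     {"entity": "p2", "price_micro_usd": 120, "micro_scm": 600}],
    demand_micro_scm=1_000)
assert result["clearing_price_micro_usd"] == 120  # applies to all units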

"In distributed systems, the most elegant solution is often the one that admits its own limitations and builds safeguards around them."
Control Plane Architecture Notes

The VRACUScheduler implements Dominant Resource Fairness with attained-service scoring. Each provider maintains a running total of attained service minutes; new allocations favor providers with lower historical utilization. The fairness score combines expected job duration with attained service ratio, preventing long-running providers from monopolizing capacity while ensuring hardware constraints (VRAM, interconnect class) are satisfied.
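A toy version of that scoring, under the assumption that the fairness score scales expected job duration by the provider's attained-service ratio; the actual VRACUScheduler weighting is more involved.

Attained-Service Fairness (illustrative)
def fairness_score(provider, job_minutes, total_minutes):
    """Lower is better: providers with less attained service win."""
    attained_ratio = provider["attained_minutes"] / max(1, total_minutes)
    return job_minutes * (1.0 + attained_ratio)

def pick_provider(providers, job):
    total = sum(p["attained_minutes"] for p in providers)
    # Hardware constraints filter first; fairness decides among survivors
    eligible = [p for p in providers
                if p["vram_gb"] >= job["min_vram_gb"]
                and job["interconnect"] in p["interconnects"]]
    return min(eligible,
               key=lambda p: fairness_score(p, job["scm_minutes"], total))

providers = [
    {"id": "a", "attained_minutes": 9_000, "vram_gb": 80, "interconnects": {"nvlink"}},
    {"id": "b", "attained_minutes": 1_000, "vram_gb": 80, "interconnects": {"nvlink"}},
]
job = {"min_vram_gb": 40, "interconnect": "nvlink", "scm_minutes": 720}
assert pick_provider(providers, job)["id"] == "b"  # less attained service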

NVIDIA MIG (Multi-Instance GPU) support extends the scheduler's capacity model. Provider registration records MIG profile, partition count, and per-partition memory. The scheduler normalizes SCM rates on a per-partition basis, treating each MIG slice as an independent scheduling unit. This enables fine-grained multi-tenancy without sacrificing fairness or resource isolation.

Metering & Settlement

From Computation to Currency

The MeteringService provides strict idempotency guarantees via SHA-256 content hashing. Each meter slice contains job ID, bucket ID, sequence number, SCM delta, and the price index at time of execution. Duplicate submissions (identical hash) succeed silently; conflicting payloads (same sequence, different hash) return rejection with reason code.
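The idempotency rule reduces to simple bookkeeping. The sketch below models it with an in-memory map and assumes canonical JSON as the hashed content; the production service persists hashes in SQLite alongside the slices.

Idempotent Slice Ingestion (illustrative)
import hashlib
import json

_slices = {}  # (job_id, sequence) -> content hash

def ingest_slice(job_id: str, seq: int, payload: dict) -> str:
    """Same hash succeeds silently; a conflicting payload is rejected."""
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()
    key = (job_id, seq)
    if key not in _slices:
        _slices[key] = digest
        return "accepted"
    if _slices[key] == digest:
        return "duplicate"                     # exact replay: succeed silently
    return "rejected: sequence conflict"       # same seq, different payload

slice_payload = {"scm_delta_micro": 100_000, "price_micro_usd": 120_000}
assert ingest_slice("job-42", 1, slice_payload) == "accepted"
assert ingest_slice("job-42", 1, slice_payload) == "duplicate"
assert ingest_slice("job-42", 1, {"scm_delta_micro": 1}).startswith("rejected")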

Settlement aggregation computes SCM-weighted time-averaged pricing (TWAP) across all slices for a job. The formula: Σ(minutes × price) / Σ(minutes). Burn amounts apply ceiling rounding to micro-ACU units, ensuring providers never receive fractional tokens. Hold fractions (0.0–1.0) split burn amounts between immediate provider payout and refund escrow, enabling dispute resolution without blocking settlement.

Settlement Flow
# Aggregate metering slices
total_minutes = sum(slice.minutes for slice in slices)
numerator = sum(slice.minutes * slice.price for slice in slices)

# Compute burn with ceiling rounding
burn_micro_acu = ceil(numerator / mint_price_micro_usd)

# Apply hold fraction for dispute buffer
provider_micro_acu = int(burn_micro_acu * (1.0 - hold_fraction))
refund_micro_acu = burn_micro_acu - provider_micro_acu

# Generate canonical receipt (JCS sorted keys)
receipt = {
    "burn_micro_acu": burn_micro_acu,
    "job_id": job_id,
    "mint_price_micro_usd_per_acu": mint_price,
    "provider": provider_address,
    "provider_micro_acu": provider_micro_acu,
    "refund_micro_acu": refund_micro_acu,
    "twap_micro_usd_per_scm": numerator // total_minutes
}

The DualSignatureService produces cryptographic attestations for every settlement receipt. Primary signatures use Ed25519 with embedded public keys (32-byte seed expanded via SHA-512). Secondary signatures employ a post-quantum envelope: 64-byte secrets processed through SHAKE-256 XOF, yielding Dilithium3-compatible signing material.

Both signatures cover the JCS-canonicalized receipt JSON (sorted keys, minimal whitespace). The control plane persists signatures alongside receipts in SQLite, enabling offline verification without blockchain round-trips. The GET /settlement/receipt/{job_id} endpoint returns the canonical receipt, both signature envelopes, and current Mirror-Mint escrow state.
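The primary signature path and the post-quantum key-material expansion can both be sketched with standard libraries: pyca/cryptography for Ed25519 and hashlib's SHAKE-256 XOF. The Dilithium3 signing step itself is omitted here, since it needs a PQ library the article does not name.

Receipt Signing (illustrative)
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def canonicalize(receipt: dict) -> bytes:
    """JCS-style canonical JSON: sorted keys, minimal whitespace."""
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode()

receipt = {"job_id": "job-42", "burn_micro_acu": 2_000_000,
           "provider_micro_acu": 7_200_000}
message = canonicalize(receipt)

# Primary signature: Ed25519 over the canonical receipt bytes
primary_key = Ed25519PrivateKey.generate()
primary_sig = primary_key.sign(message)
primary_key.public_key().verify(primary_sig, message)  # raises if tampered

# Secondary envelope: expand a 64-byte secret through SHAKE-256,
# yielding Dilithium3-compatible signing material (PQ signing not shown)
secret = b"\x01" * 64
pq_material = hashlib.shake_256(secret + message).digest(64)
assert len(pq_material) == 64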

ESCROW MECHANICS
The MirrorMintPool contract implements job-scoped escrow accounts. Users deposit micro-ACU before execution; settlement burns protocol fees and routes provider payments; remaining balances return to users post-finalization. Hold mechanisms freeze disputed amounts pending governance resolution.

Workload Adapters

Modular, Registry-Based Job Transformation

Modularity, not monoliths. Unlike traditional job schedulers that hardcode workload types into platform logic, the distributed compute network employs adapters—protocol-based transformers registered at runtime via a plugin architecture. Each adapter converts high-level job specifications into resource profiles and execution plans, enabling the same user code to run across Docker, Ray clusters, or Kubernetes without modification. Third-party developers extend the platform by registering custom adapters without touching core infrastructure code.

Registry-Based Architecture

The AdapterFactory pattern decouples adapter implementations from scheduler logic. A global registry maps adapter names to factory functions, allowing hot-swapping of adapters without redeploying the launcher service. The registry initializes with five core adapters (training, inference, quantization, rendering, federated), but any module can call register_adapter(name, factory) to inject custom transformation logic.

Adapter Registry & Plugin System
# Core registry with built-in adapters
_ADAPTER_REGISTRY: Dict[str, AdapterFactory] = {
    "training": TrainingAdapter,
    "inference": InferenceAdapter,
    "render": RenderingAdapter,
    "quant": QuantizationAdapter,
    "federated": FederatedAdapter,
}

# Third-party registration (zero core changes)
def register_adapter(name: str, factory: AdapterFactory) -> None:
    _ADAPTER_REGISTRY[name.lower()] = factory

# Example: Custom fine-tuning adapter
class FineTuneAdapter(Adapter):
    def prepare(self, job_spec):
        profile = ResourceProfile(
            num_gpus=job_spec.get("num_gpus", 1),
            min_vram_gb=40,  # LoRA/QLoRA requirements
            features=("cuda>=12.1", "peft", "bitsandbytes")
        )
        # Custom logic for parameter-efficient tuning...
        return profile, plan

# Register without platform redeployment
register_adapter("finetune", FineTuneAdapter)

The adapter protocol defines two primary operations: prepare(job_spec) transforms declarative requirements into a (ResourceProfile, ExecutionPlan) tuple, while map_metrics(raw) normalizes heterogeneous telemetry into standardized metering signals. This abstraction allows the control plane to allocate providers based on resource constraints without understanding framework-specific details.

RESOURCE PROFILES
ResourceProfile captures hardware requirements: num_gpus, min_vram_gb, interconnect (NVLink/PCIe/InfiniBand), scm_minutes (expected duration), and features (CUDA version, NCCL, ROCm). The scheduler uses these constraints to filter provider inventory before applying fairness scoring.

Training Adapter

The TrainingAdapter prepares distributed training jobs with multi-GPU coordination strategies. It accepts job specs containing image references, command arrays, VRAM requirements, and interconnect preferences. The adapter injects environment variables for DDP (DistributedDataParallel) or FSDP (Fully Sharded Data Parallel) rendezvous, configures volume mounts for dataset access, and sets priority metadata for queue ordering.

Training Adapter Usage
job_spec = {
    "image": "ghcr.io/org/training:v2.1",
    "command": ["python", "-m", "torch.distributed.run", "train.py"],
    "num_gpus": 8,
    "min_vram_gb": 80,
    "interconnect": ["nvlink"],
    "scm_minutes": 720,
    "features": ["cuda>=12.1", "nccl"],
    "strategy": "ddp"
}

adapter = TrainingAdapter()
profile, plan = adapter.prepare(job_spec)

# Metrics normalization
raw_metrics = {"step": 1024, "loss": 0.42, "throughput": 2048}
normalized = adapter.map_metrics(raw_metrics)
# → {"step": 1024, "loss": 0.42, "throughput": 2048}

Inference Adapter

The InferenceAdapter targets long-running model serving deployments. Unlike batch training jobs, inference workloads require service-oriented execution: health probes (readiness/liveness), autoscaling configurations, load balancer exposure, and rolling update strategies. The adapter generates Kubernetes-compatible execution plans with service ports, replica counts, and horizontal pod autoscaling parameters.

Health probes default to HTTP GET requests against /health endpoints, with configurable initial delays and check intervals. Service types (ClusterIP, LoadBalancer, NodePort) control network exposure. Autoscaling policies define CPU/memory thresholds triggering replica scale-up, enabling elastic capacity matching demand spikes.
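A hypothetical plan the inference adapter might emit, mirroring the probe, service, and autoscaling parameters described above; the field names are illustrative rather than the adapter's documented schema.

Inference Execution Plan (illustrative)
inference_plan = {
    "image": "ghcr.io/org/serving:v1.3",
    "command": ["python", "-m", "server", "--port", "8080"],
    "service": {
        "type": "LoadBalancer",        # ClusterIP / NodePort also possible
        "port": 8080,
        "replicas": 2,
    },
    "probes": {
        "readiness": {"http_get": "/health", "initial_delay_s": 10, "period_s": 5},
        "liveness":  {"http_get": "/health", "initial_delay_s": 30, "period_s": 10},
    },
    "autoscaling": {
        "min_replicas": 2,
        "max_replicas": 16,
        "target_cpu_percent": 70,      # scale out past this threshold
    },
}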

"The adapter abstraction transforms infrastructure heterogeneity from a liability into an asset—the same workload definition executes across bare metal, cloud VMs, and edge devices."
Adapter Design Philosophy

Federated Learning Adapter

The FederatedAdapter prepares multi-party training with privacy-preserving aggregation. Job specs include: world_size (participant count), committee parameters (K-of-M threshold for Shamir secret sharing), differential privacy budgets (epsilon/delta), and bulletin board backend configuration (Redis, S3, or IPFS).

The adapter injects environment variables controlling aggregation behavior: FED_PACKET_BYTES (fixed-size message padding, default 128KB), FED_ROUNDS (training iterations), FED_DP_EPSILON (privacy budget), FED_BB_BACKEND (bulletin board type), and FED_ROUND_SECRET (shared secret hex for key derivation). These parameters enable secure gradient aggregation without trusted coordinators.

Federated Adapter Configuration
fed_spec = {
    "image": "ghcr.io/vracu/federated:latest",
    "command": ["python", "federated_train.py"],
    "world_size": 8,
    "committee_k": 3,        # Threshold
    "committee_m": 5,        # Total shares
    "dp_epsilon": 3.0,
    "dp_delta": 1e-5,
    "bb_backend": "redis",
    "bb_uri": "redis://localhost:6379/0",
    "rounds": 50,
    "packet_bytes": 131072   # 128KB padding
}

adapter = FederatedAdapter()
profile, plan = adapter.prepare(fed_spec)

# Execution plan includes privacy env vars
assert plan.env["FED_DP_EPSILON"] == "3.0"

Quantization and rendering adapters follow similar patterns, specializing resource requirements and metric extraction for their respective workloads. The quantization adapter handles model compression tasks (GPTQ, AWQ, bitsandbytes), while the rendering adapter manages visual workloads with GPU rasterization demands.

Privacy-Preserving Computation

Hardware Attestation, Encrypted Pipelines, and Secure Aggregation

The network's privacy guarantees emerge from three complementary layers, each addressing a distinct threat model. Hardware attestation prevents operators from inspecting workload data. Encrypted input/output pipelines protect data in transit and at rest. Cryptographic secure aggregation prevents peers from observing individual contributions during federated training. Together, these mechanisms enable confidential computation on untrusted infrastructure.

Hardware Attestation

Provider nodes collect attestation evidence proving workloads execute inside hardware-protected enclaves. Three attestation technologies integrate via composite providers: NVIDIA Confidential Computing (CC-On), Intel TDX (Trust Domain Extensions), and AMD SEV-SNP (Secure Encrypted Virtualization with Secure Nested Paging).

NVIDIA CC-ON
CC-On attestation combines SPDM (Security Protocol and Data Model) certificate chains with GPU reports. SPDM chains prove firmware integrity from silicon root of trust to current runtime. GPU reports contain measurement registers (MRs) covering loaded code, preventing operator code injection. Evidence loads from `/sys/kernel/debug/nvidia-cc-on/` debugfs interfaces.

Intel TDX provides VM-level isolation with encrypted memory and integrity protection. Attestation quotes prove execution inside a Trust Domain, with measurements covering firmware, kernel, and initial ramdisk. Control plane verification checks quote signatures against Intel's root keys, ensuring authenticity.

AMD SEV-SNP extends SEV with stronger memory integrity guarantees. SNP reports include platform measurements and VM guest policy, preventing malicious hypervisor tampering. Combined with encrypted memory, SNP isolates guest execution from host observation.

The CompositeAttestationProvider aggregates evidence from multiple sources, enabling hybrid deployments (e.g., NVIDIA GPU inside Intel TDX VM). Async evidence collection occurs during provider attach, with challenge-response protocols ensuring freshness. Stale evidence (> 24 hours) triggers re-attestation before accepting privacy-tier workloads.

Attestation Evidence Collection
# Filesystem-based attestation providers
nvidia_provider = NvidiaCcOnFilesystemProvider(
    spdm_chain_paths=[Path("/sys/kernel/debug/nvidia-cc-on/spdm")],
    gpu_report_paths=[Path("/sys/kernel/debug/nvidia-cc-on/gpu_report")]
)
tdx_provider = TdxFilesystemProvider(
    quote_paths=[Path("/sys/kernel/config/tsm/report/tdx_quote")]
)

# Composite aggregation
composite = CompositeAttestationProvider([nvidia_provider, tdx_provider])

# Challenge-response for freshness
challenge = secrets.token_hex(32)
evidence = await composite.produce(challenge)

# Evidence structure: {provider_name: {type: data}}
assert "nvidia_cc_on" in evidence
assert "tdx" in evidence

Encrypted Input/Output Pipeline

Job artifacts (datasets, model checkpoints, configuration files) never touch provider disks in plaintext. The launcher service generates per-job Data Encryption Keys (DEKs), encrypting all artifacts with AES-GCM authenticated encryption. DEKs transfer to provider sidecars via TLS mutual authentication, with client certificate fingerprint validation preventing unauthorized access.

The sidecar's decrypt shim fetches DEKs at job start, decrypts artifacts inside the secure enclave, and launches the workload. Operators observe only ciphertext blobs—plaintext exists exclusively within hardware-protected memory. Output encryption reverses the flow: results encrypt before leaving the enclave, with DEKs accessible only to job initiators.

"Privacy is not a feature to be bolted on—it's an architectural invariant enforced at every system boundary."
Privacy Architecture Principles
Encrypted Artifact Decryption
# Sidecar fetches DEK using TLS client cert
dek = fetch_dek(
    launcher_url="https://launcher.vracu.net",
    job_id="job-42",
    fingerprint_sha256=cert_fingerprint,
    cert_bundle={
        "cert": "/tls/client.pem",
        "key": "/tls/client.key",
        "ca": "/tls/ca.pem"
    }
)

# Decrypt artifacts inside enclave
for artifact in encrypted_artifacts:
    nonce = base64.b64decode(artifact["nonce"])
    ciphertext = base64.b64decode(artifact["ciphertext"])
    aad = base64.b64decode(artifact.get("aad", ""))

    # AES-GCM decrypt with authentication
    plaintext = AESGCM(dek).decrypt(nonce, ciphertext, aad)

    # Write to secure enclave filesystem
    path = secure_workdir / artifact["path"]
    path.write_bytes(plaintext)
    # Plaintext never leaves enclave

Federated Learning with Secure Aggregation

The network implements committee-based secure aggregation combining Shamir secret sharing, differential privacy, and bulletin board coordination. Unlike naive averaging (where aggregators observe raw gradients), this protocol ensures no party—including the coordinator—sees individual contributions.

Each training round proceeds as follows:

  • Participants add Gaussian noise to gradients, satisfying (ε,δ)-differential privacy
  • Encode the noisy gradients as field elements over GF(4,294,967,291)
  • Split each encoding into M shares via Shamir's scheme (K-of-M threshold)
  • Encrypt shares with per-pair AES keys
  • Post the ciphertexts to the bulletin board
  • Collect the K shares addressed to them
  • Decrypt and reconstruct via Lagrange interpolation
  • Average the reconstructed gradients and broadcast the aggregate

FINITE FIELD ARITHMETIC
Shamir sharing operates over GF(p) where p=4,294,967,291 (largest 32-bit prime). Gradients scale by 10^6 before field encoding, preventing floating-point leakage. Reconstruction uses modular arithmetic with Lagrange basis polynomials evaluated at zero. Decoded values restore float32 via division and sign correction (values > p/2 become negative).

Committee selection employs deterministic random sampling seeded by round ID, ensuring all participants agree on the committee without coordination. Key derivation uses HKDF-SHA256 with context strings encoding round, sender, and recipient identities: fed-round:{round}:from:{sender}:to:{recipient}. This generates unique AES keys for each communication pair per round.
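Two helpers assumed by the round sketch below follow directly from this description: HKDF-SHA256 keyed by the round/sender/recipient context string, and the fixed-point decoder with sign correction. Both are minimal sketches, not the production implementations.

Key Derivation and Field Decoding (illustrative)
import numpy as np
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

FIELD_MODULUS = 4_294_967_291  # largest 32-bit prime

def _derive_key(shared_secret: bytes, round_id: int,
                sender: int, recipient: int) -> bytes:
    """Per-pair, per-round AES key via HKDF-SHA256 over a context string."""
    info = f"fed-round:{round_id}:from:{sender}:to:{recipient}".encode()
    return HKDF(algorithm=hashes.SHA256(), length=32,
                salt=None, info=info).derive(shared_secret)

def _decode_gradient(field_vals: np.ndarray) -> np.ndarray:
    """Undo fixed-point field encoding; values above p/2 become negative."""
    vals = field_vals.astype(np.int64)
    vals[vals > FIELD_MODULUS // 2] -= FIELD_MODULUS  # sign correction
    return vals.astype(np.float32) / 1e6              # undo 10^6 scaling

k_ab = _derive_key(b"\x00" * 32, round_id=7, sender=0, recipient=3)
k_ba = _derive_key(b"\x00" * 32, round_id=7, sender=3, recipient=0)
assert k_ab != k_ba  # direction is part of the context, so keys differ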

The bulletin board abstraction supports three backends: Redis (RPUSH/LRANGE for ordered streams), S3 (timestamped objects with lexicographic ordering), and IPFS (content-addressed immutable logs). Fixed-size message padding (default 128KB) prevents size-based traffic analysis.

Secure Aggregation Round
# Add DP noise (Gaussian mechanism)
sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
noisy_grad = gradient + np.random.normal(0.0, sigma, gradient.shape)

# Encode as field elements (scale by 10^6)
scaled = np.rint(noisy_grad * 1e6).astype(np.int64)
field_vals = np.mod(scaled, FIELD_MODULUS).astype(np.uint64)

# Shamir split (K=3, M=5)
shares = shamir_split(field_vals, k=3, m=5, seed=round_id)

# Encrypt shares for committee members
for share, member in zip(shares, committee):
    aes_key = _derive_key(shared_secret, round_id, rank, member)
    cipher = AESGCM(aes_key)
    nonce = os.urandom(12)
    ciphertext = cipher.encrypt(nonce, share.tobytes(), None)

    # Post to bulletin board with fixed-size padding
    payload = {"to": member, "nonce": nonce.hex(), "ct": ciphertext.hex()}
    bulletin.post(
        topic=f"fed/{round_id}/shares",
        payload=pad_to_fixed(json.dumps(payload), 131072)
    )

# Collect K shares, reconstruct via Lagrange
aggregates = _collect_shares(bulletin, rank, round_id, secret)
combined = shamir_combine(aggregates[:k])
decoded = _decode_gradient(combined) / world_size

This multi-layered approach achieves end-to-end confidentiality: hardware attestation proves code integrity, encrypted I/O protects data in motion, and secure aggregation prevents gradient leakage. The combination enables privacy-preserving machine learning on commodity hardware without trusted third parties—a capability previously requiring specialized secure enclaves or multiparty computation protocols.

Dual-Token Economic Model

ACU Settlement Currency & AVL Availability Incentives

Two tokens, distinct roles, unified economy. The network employs a dual-token architecture where ACU (Actual Compute Units) serves as the fixed-supply settlement currency, while AVL (Availability Token) functions as the inflationary utility token rewarding provider liveness. This separation creates economic pressure: ACU scarcity drives value appreciation, AVL emissions incentivize capacity contribution, and the ConversionRouter bridges them via oracle-priced burns—transforming availability into settlement rights.

ACU: The Settlement Token

ACU implements a fixed-supply ERC20 (18 decimals) with no mint function post-deployment. The total supply (S_MAX) initializes at construction and remains immutable—every ACU that will ever exist mints to the treasury address during contract deployment. This design choice transforms ACU into a deflationary settlement currency: as compute demand grows, fixed supply creates scarcity pressure.

IMMUTABLE SUPPLY
Unlike typical utility tokens with governor-controlled mint functions, ACU's supply invariant is enforced at the Solidity contract level. The ACUToken.sol contract inherits from Governable but explicitly omits mint/burn methods—supply changes are mathematically impossible without redeploying to a new address.

Users deposit ACU into the MirrorMintPool escrow contract for job execution. Each job receives an isolated escrow account tracking: deposited_micro_acu, burned_micro_acu, released_micro_acu (provider payments), refunded_micro_acu, and held_micro_acu (dispute buffer). Settlement burns protocol fees to the treasury while routing provider payouts and refunding unused balances—all operations occur in micro-ACU (1e-6 ACU) precision to minimize rounding losses.

AVL: The Availability Token

AVL implements an ERC20 with role-gated minting and burning. The contract enforces a MAX_SUPPLY cap but allows addresses holding the MINTER_ROLE to create new tokens below this ceiling. Daily emissions distribute AVL to providers via Merkle airdrops proportional to their availability scores— the longer a provider maintains liveness (passing heartbeat checks), the more AVL they earn.

Providers stake AVL in the AvailabilityStaking contract to signal commitment. Staked amounts act as economic bonds: misbehavior (failed jobs, missed heartbeats) triggers slashing via the SLASHER_ROLE, burning a percentage of the stake. Unstaking requires a cooldown period preventing providers from exiting immediately before slashing events. The staking mechanism creates skin-in-the-game: providers risk capital to participate, and penalties enforce service quality.
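A hypothetical staking lifecycle, written in the same pseudocode style as the token flow below; the function names are illustrative, as the staking ABI is not reproduced in this article.

Staking Lifecycle (illustrative)
# Provider bonds AVL to signal commitment
AvailabilityStaking.stake(provider="gpu-a100-01", amount=5_000 * 1e18)

# Failed jobs or missed heartbeats trigger a slash (SLASHER_ROLE only)
AvailabilityStaking.slash(
    provider="gpu-a100-01",
    fraction_bps=500,              # burn 5% of the bonded stake
    reason="missed_heartbeats"
)

# Exit requires a cooldown, preventing withdrawal just before a slash
AvailabilityStaking.requestUnstake(provider="gpu-a100-01")
# ... cooldown elapses ...
AvailabilityStaking.withdraw(provider="gpu-a100-01")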

Token Flow Architecture
# 1. User deposits ACU for job execution
MirrorMintPool.depositForJob(job_id="job-42", microAcu=10_000_000)
# Escrow: {deposited: 10M micro-ACU, burned: 0, released: 0}

# 2. Job executes, metering slices recorded
MeteringService.ingest_slice(
    job_id="job-42",
    minutes_delta_scm_micro=100_000,       # 100 SCM consumed
    priceindex_micro_usd_per_scm=120_000   # $0.12/SCM
)

# 3. Settlement aggregates slices, computes TWAP
result = SettlementRouter.settle_job(
    job_id="job-42",
    provider="gpu-a100-01",
    hold_fraction=0.1  # 10% held for disputes
)
# burn_micro_acu:     2,000,000  (20% protocol fee)
# provider_micro_acu: 7,200,000  (90% of remaining 8M)
# held_micro_acu:       800,000  (10% dispute hold)

# 4. Provider earns daily AVL emissions
AvailabilityMerkleMinter.claim(
    epochId="2025-11-04",
    to="gpu-a100-01",
    amount=1000 * 10**18,  # 1000 AVL
    proof=merkle_proof
)

# 5. Provider burns AVL to mint ACU (via oracle price)
ConversionRouter.burnAVLForACU(
    acuAmount=500 * 10**18,  # Mint 500 ACU
    recipient="gpu-a100-01"
)
# Oracle: 1 ACU = 2.5 AVL at current TWAP
# Burns: 1,250 AVL, Mints: 500 ACU

ConversionRouter: The Bridge

The ConversionRouter contract implements one-way conversion: burn AVL → mint ACU. The oracle-determined exchange rate reflects market-discovered pricing: as compute demand increases relative to provider supply, the ACU price (denominated in AVL) rises. The router enforces ACU_MAX_SUPPLY—cumulative mints cannot exceed this ceiling—preventing infinite inflation even as AVL emissions continue.
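
The conversion arithmetic follows directly from that description. In the sketch below, quote_burn and its parameters are illustrative names; the supply-cap check mirrors the ACU_MAX_SUPPLY invariant, and the TWAP value comes from the oracle:

Conversion Quote Sketch
ACU_MAX_SUPPLY = 100_000_000 * 10**18  # assumed ceiling, 18-decimal units

def quote_burn(acu_amount: int, twap_avl_per_acu: float,
               cumulative_minted: int) -> int:
    """Return the AVL burn required to mint acu_amount ACU."""
    if cumulative_minted + acu_amount > ACU_MAX_SUPPLY:
        raise ValueError("ACU_MAX_SUPPLY exceeded")
    # Oracle TWAP prices ACU in AVL, e.g. 2.5 AVL per ACU
    return int(acu_amount * twap_avl_per_acu)

# At a TWAP of 2.5 AVL/ACU, minting 500 ACU burns 1,250 AVL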

The conversion mechanism creates economic alignment: providers earn AVL through availability (passive income), accumulate stakes, then convert to ACU when settlement demand materializes (active income). Users purchasing ACU on secondary markets indirectly reward past provider contributions. The dual-loop structure—AVL emissions incentivize long-term capacity, ACU scarcity rewards immediate execution—balances supply-side growth with demand-side sustainability.

"Fixed ACU supply transforms compute into a scarce digital commodity. Inflationary AVL emissions bootstrap network effects. Together, they create economic gravity pulling providers toward long-term participation."
Dual-Token Design Rationale

On-Chain Primitives

Trustless Settlement via Arbitrum

The settlement layer anchors economic finality in Arbitrum Nitro—a Layer 2 optimized for EVM execution with sub-second confirmation times and negligible gas costs. Seven Solidity contracts form the on-chain substrate: ACUToken, MirrorMintPool, PriceIndexOracle, BurnGovernor, ProtocolFeePool, AvailabilityToken, and ConversionRouter.

ACUToken implements the fixed-supply ERC-20 settlement currency, denominated against Standard Compute Minutes (SCM). Total supply is immutable post-deployment; no mint/burn functions exist, preserving the supply invariant. The treasury holds the initial allocation; governance controls rescue functions for accidentally transferred tokens.

MirrorMintPool manages job escrow with burn/release/hold state machines. The depositForJob function accepts micro-ACU deposits, recording deposited amounts per job ID. Settlement authorities (authorized by governance) invoke settleJob with burn amounts, provider addresses, and receipt hashes. Burn amounts route to treasury; provider payments execute immediately; hold fractions freeze pending governance review.

Escrow Settlement
// Solidity settlement entrypoint
function settleJob(
    bytes32 jobId,
    uint256 burnMicroAcu,
    address provider,
    uint256 providerMicroAcu,
    bytes32 receiptHash
) external onlyAuthority nonReentrant {
    Escrow storage esc = _escrows[jobId];
    require(!esc.finalized, "already finalized");

    // Burn protocol fees to treasury
    esc.burnedMicroAcu += burnMicroAcu;
    _pushTokens(treasury, burnMicroAcu);

    // Pay provider immediately
    esc.releasedMicroAcu += providerMicroAcu;
    _pushTokens(provider, providerMicroAcu);

    esc.receiptHash = receiptHash;
    emit Burned(jobId, burnMicroAcu);
    emit ProviderPaid(jobId, provider, providerMicroAcu);
}

PriceIndexOracle records bucket configurations and clearing results. Demand oracles (governance-authorized) call configureBucketDemand with micro-SCM requirements and commit deadlines. Supply submitters post offers before finalization. The control plane executes clearing off-chain, then publishes results via finalizeBucket, emitting clearing price and surge multiplier events.
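
The clearing algorithm itself is executed off-chain, and the described flow matches a uniform-price auction: sort offers by price, fill demand from the cheapest supply, and let the marginal offer set the clearing price. A sketch under that assumption (function and field names are illustrative):

Bucket Clearing Sketch
def clear_bucket(demand_micro_scm, offers):
    """offers: list of (price_micro_usd_per_scm, quantity_micro_scm).
    Returns (clearing_price, filled_micro_scm)."""
    filled = 0
    clearing_price = 0
    for price, qty in sorted(offers):   # cheapest supply first
        take = min(qty, demand_micro_scm - filled)
        if take <= 0:
            break
        filled += take
        clearing_price = price          # marginal offer sets the price
    return clearing_price, filled

# Example: 100k micro-SCM demand against two offers
print(clear_bucket(100_000, [(120_000, 60_000), (130_000, 80_000)]))
# -> (130000, 100000)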

BurnGovernor mediates between settlement receipts and the MirrorMintPool escrow. The settleJob function accepts job IDs and receipt payloads, extracting burn/provider amounts and forwarding them to MirrorMintPool. Governance can pause settlements system-wide via emergencyPause, halting all burns without touching escrow state.

ProtocolFeePool accumulates burned ACU and distributes protocol revenues. Governance proposals withdraw to specified addresses; spending requires timelock execution (48-hour delay). The pool maintains immutable audit trails via Withdrawal and FeeAccrued events.

AVAILABILITY REWARDS
The AvailabilityToken (AVL) incentivizes idle capacity. Providers earn AVL proportional to unutilized SCM during commitment windows. Merkle distributions enable gas-efficient claims; κ dampening prevents supply manipulation.

ConversionRouter implements the trustless, one-way AVL→ACU conversion using the price oracle as reference. Conversion applies basis-point slippage caps; governance adjusts spreads based on liquidity depth. The router maintains no internal state—all pricing derives from on-chain oracle snapshots.

Provider Infrastructure

Hardware Abstraction and Workload Execution

The provider network transforms heterogeneous GPU hardware into fungible compute units through a three-layer abstraction: backend runners (Docker, Ray, Kubernetes), capability publishing, and heartbeat coordination.

Provider nodes begin life via alien attach—a CLI tool consuming join tokens from the directory API. The attach flow redeems tokens for provider IDs, runs microbenchmarks to calibrate SCM rates, collects attestation evidence, and publishes capabilities to the control plane.

Provider Onboarding
# Redeem join token for provider ID
provider_id, metadata = redeem_join_token(settings)

# Initialize backend (Ray/K8s/Docker)
backend_payload = {}
if settings.backend.kind == "ray":
    address = ray_ensure_head(client_port=settings.backend.ray_port)
    backend_payload = {"address": address}

# Run microbenchmarks for SCM rate calibration
scm_rate = derive_rate_micro()

# Collect attestation evidence (SGX/TPM)
if settings.privacy.enable_attestation:
    collect_evidence(settings)

# Publish capabilities to control plane
publish_capabilities(
    settings,
    provider_id,
    backend_payload=backend_payload,
    scm_rate_micro=scm_rate
)

# Begin heartbeat loop
run_heartbeat(settings, provider_id, stop_event)

Backend runners isolate workloads via container runtimes. The Docker backend executes jobs as privileged containers with GPU passthrough, SSH-tunneling logs and metrics to the control plane. The Ray backend bootstraps Ray clusters, submitting jobs via the Ray Jobs API with custom resource specifications (GPU count, VRAM, placement strategies). The Kubernetes backend provisions k3s clusters with NVIDIA device plugins, deploying jobs as pods with GPU requests/limits.
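
As a sketch of the Docker path using the docker-py client (the image name and command are illustrative; the production runner additionally configures privileges and tunnels logs to the control plane):

Docker GPU Dispatch Sketch
import docker

client = docker.from_env()

# Launch a job container with passthrough of all visible GPUs
container = client.containers.run(
    image="vracu/job-runner:latest",   # illustrative image
    command=["python", "train.py"],
    device_requests=[
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
    detach=True,
)

# Stream container logs back toward the control plane
for line in container.logs(stream=True):
    print(line.decode().rstrip())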

The heartbeat agent maintains provider liveness via periodic POST /heartbeat calls. Each heartbeat includes: provider ID, current load (running jobs, available VRAM), calibration drift (actual vs. advertised SCM rate), and attestation refresh timestamps. The directory API marks providers unavailable after three missed heartbeats (90-second timeout).
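
A minimal heartbeat loop consistent with that contract (30-second interval, so three misses equal the 90-second timeout); the payload fields echo the description above, with placeholder values standing in for live agent readings:

Heartbeat Loop Sketch
import time
import requests

def heartbeat_loop(base_url: str, provider_id: str, interval_s: int = 30) -> None:
    while True:
        payload = {
            "provider_id": provider_id,
            "running_jobs": 0,              # placeholders: the real agent
            "available_vram_gb": 40,        # reads these from the local
            "scm_rate_drift_pct": 0.0,      # backend runner
            "attestation_refreshed_at": int(time.time()),
        }
        try:
            requests.post(f"{base_url}/heartbeat", json=payload, timeout=10)
        except requests.RequestException:
            pass  # tolerated: directory flags us only after three misses
        time.sleep(interval_s)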

"Hardware diversity is strength. The abstraction layer turns chaos into composability, heterogeneity into opportunity."
Provider Network Design Principles

Privacy-preserving execution leverages attestation and encrypted inputs. Providers with SGX or TPM capabilities generate remote attestation quotes during attach; the control plane verifies quotes against manufacturer root keys before approving privacy-tier workloads. Encrypted job inputs decrypt inside secure enclaves, ensuring operators never observe plaintext data or intermediate activations.

The relay mechanism enables NAT traversal for home providers. Providers behind firewalls establish persistent WebSocket connections to relay servers; the control plane routes job submissions via relay endpoints. Bi-directional tunneling supports both job dispatch and real-time log streaming without requiring public IPs or port forwarding.
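
A sketch of the provider side of that tunnel using the websockets library; the relay URL and the message framing are assumptions for illustration:

Relay Tunnel Sketch
import asyncio
import json
import websockets

async def relay_tunnel(relay_url: str, provider_id: str) -> None:
    # Persistent outbound connection: works behind NAT, no inbound ports
    async with websockets.connect(f"{relay_url}/providers/{provider_id}") as ws:
        async for message in ws:
            envelope = json.loads(message)
            if envelope["type"] == "job_dispatch":
                # Acknowledge and stream logs back over the same socket
                await ws.send(json.dumps({
                    "type": "log",
                    "job_id": envelope["job_id"],
                    "line": "job accepted",
                }))

asyncio.run(relay_tunnel("wss://relay.vracu.network", "gpu-a100-01"))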

MIG PARTITIONING
NVIDIA A100/H100 MIG (Multi-Instance GPU) support enables fine-grained multi-tenancy. A single A100 partitions into seven 1g.10gb instances or two 3g.40gb instances. The attach agent configures profiles before benchmarking; the scheduler treats each partition as an independent unit.

Operational Resilience

Circuit Breakers, Chaos Engineering, and Multi-Region Coordination

Production infrastructure demands resilience mechanisms beyond optimistic execution paths. The system implements circuit breakers, capacity guards, regional isolation, and chaos injection to maintain SLAs under adversarial conditions.

The primary price breaker monitors clearing prices against configured mint prices. When a bucket clears below mint_price - epsilon, the breaker opens, halting new demand configuration until price recovery. This prevents cascading under-pricing that could destabilize provider economics.

The capacity breaker tracks filled SCM across recent buckets (configurable window, default 10 buckets). If filled capacity falls below (demand + reserve) × (1 - buffer_pct / 100), the breaker trips, signaling insufficient supply to meet committed demand. The scheduler rejects new reservations until capacity recovers.

Breaker Logic
# Primary price breaker check
threshold = mint_price_micro_usd - epsilon_micro_usd
if clearing_price < threshold:
    breaker.trip(
        name="primary_price",
        reason=f"Clearing {clearing_price} below floor {threshold}"
    )
    raise ConflictError("Breaker open: price floor violated")

# Capacity breaker check
recent_buckets = db.query_buckets(limit=breaker.window_size)
avg_filled = mean([b.filled_micro_scm for b in recent_buckets])
required = (demand + reserve) * (1.0 - buffer_pct / 100.0)
if avg_filled < required:
    breaker.trip(
        name="capacity",
        reason=f"Avg filled {avg_filled} below required {required}"
    )

Regional isolation supports active-passive multi-region deployments. Control plane instances operate in three modes: NORMAL (full read-write), ISOLATED (local writes queued, remote reads blocked), and MERGE_REPLAY (reconciling queued writes post-outage).

During regional failures, operators invoke POST /region/isolate, transitioning the affected region to isolated mode. Local writes persist to SQLite; remote API calls receive 503 Service Unavailable. Recovery initiates via POST /region/merge, which replays queued writes against primary state, resolving conflicts via last-write-wins timestamp comparison.
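
Operationally the failover reduces to two REST calls. A sketch with the requests library (the JSON body and region name are assumptions):

Regional Failover Sketch
import requests

CONTROL = "https://control.vracu.network"

# 1. Fence the failing region: local writes queue, remote reads get 503
requests.post(f"{CONTROL}/region/isolate",
              json={"region": "eu-west"}, timeout=10)

# ... outage resolved ...

# 2. Replay queued writes; conflicts resolve last-write-wins by timestamp
resp = requests.post(f"{CONTROL}/region/merge",
                     json={"region": "eu-west"}, timeout=30)
print(resp.json())  # e.g. counts of replayed vs. conflicted writes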

CHAOS ENGINEERING
Controlled fault injection validates resilience claims. Chaos scenarios include: provider disappearance mid-job, bucket clearing with zero supply, settlement signature failures, escrow insufficiency, and regional partition. Automated suites run nightly against staging environments.

The resilience controller monitors job health and triggers automatic reallocation. Jobs exceeding SLA thresholds (95th percentile latency, failure rate > 5%) receive priority reallocation to higher-tier providers. The controller maintains a reallocate queue; operators approve/reject proposals via POST /resilience/approve/{job_id}.

Observability surfaces breaker state, queue depths, and regional health via Prometheus metrics and Grafana dashboards. Critical alerts fire on: breaker open > 5 minutes, queue depth > 1000 tasks, missed heartbeats > 10% of fleet, settlement failures > 1% of volume.

Payment Infrastructure

Stripe Integration, Ledger Accounting, and Fiat On-Ramps

While on-chain settlement handles provider payouts and protocol fees, enterprise users require fiat on-ramps. The payment stack bridges traditional finance via Stripe webhooks, double-entry ledger accounting, and ACU wallet provisioning.

The Stripe service listens for checkout.session.completed webhooks, validating HMAC signatures before processing. Successful checkouts trigger: credit issuance to user wallets, ledger debit/credit pairs (fiat → ACU), and escrow deposits for immediate job execution.

Stripe Webhook Handler
# Validate webhook signature
event = stripe.Webhook.construct_event(
    payload, sig_header, endpoint_secret
)

if event['type'] == 'checkout.session.completed':
    session = event['data']['object']

    # Extract metadata
    user_id = session.metadata['user_id']
    usd_cents = session.amount_total
    acu_micro = int(usd_cents * ACU_PER_CENT_MICRO)

    # Issue credits to ACU wallet
    wallet.deposit(user_id, acu_micro)

    # Record double-entry ledger transaction
    ledger.record_transaction(
        debit_account="fiat:stripe",
        credit_account=f"acu_wallet:{user_id}",
        amount_micro_acu=acu_micro,
        metadata={"stripe_session": session.id}
    )

The ledger service implements immutable double-entry accounting. Every transaction creates two ledger entries, one debit and one credit of equal magnitude, so the books always net to zero. Account types include: fiat:stripe (external fiat inflows), acu_wallet:* (user balances), escrow:* (job deposits), protocol:treasury (burned fees), and provider:* (payout accounts).

Monthly reconciliation queries aggregate ledger entries, verifying: Σ(debits) = Σ(credits), user wallet balances match sum of deposits minus escrow, and protocol treasury equals cumulative burns. Discrepancies trigger alerts and halt settlements pending manual review.
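
The first of those checks is a pure fold over the ledger. A sketch over illustrative rows, where each entry is a (debit_account, credit_account, amount) triple and the account names follow the scheme above:

Reconciliation Sketch
def reconcile(entries):
    """entries: list of (debit_account, credit_account, amount_micro_acu)."""
    debits, credits = {}, {}
    for debit_acct, credit_acct, amount in entries:
        debits[debit_acct] = debits.get(debit_acct, 0) + amount
        credits[credit_acct] = credits.get(credit_acct, 0) + amount

    # Invariant: total debits equal total credits
    assert sum(debits.values()) == sum(credits.values()), "ledger imbalance"

    # Treasury accrual should equal cumulative burns
    return credits.get("protocol:treasury", 0)

burned = reconcile([
    ("fiat:stripe", "acu_wallet:alice", 10_000_000),
    ("acu_wallet:alice", "escrow:job-42", 10_000_000),
    ("escrow:job-42", "protocol:treasury", 2_000_000),
])
print(f"treasury accrual: {burned} micro-ACU")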

ACU WALLET
User wallets abstract blockchain complexity. The ACUWallet service manages Arbitrum keypairs, signs escrow deposits, and monitors on-chain balances. Users interact via simple deposit/withdraw APIs; wallet service handles gas estimation, nonce management, and transaction retry.

Escrow orchestration coordinates off-chain credits with on-chain deposits. When users initiate jobs, the payment processor: debits ACU wallets, credits escrow accounts, invokes MirrorMintPool.depositForJob, and persists transaction hashes for audit trails.

Provider payouts reverse the flow: settlement receipts trigger wallet credits, escrow debits, and optional fiat conversions. Providers configure payout rails (on-chain ACU, Stripe transfers, wire) via PATCH /providers/{id}/payout. The payout service batches settlements daily, minimizing gas costs via Merkle batching.
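
The batching step itself is a standard Merkle fold: hash each (provider, amount) payout into a leaf, combine pairwise to a single root, and publish one root per daily batch. The leaf encoding below is an assumption for illustration:

Payout Batching Sketch
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(payouts):
    """payouts: list of (provider_address, amount_micro_acu)."""
    level = [_h(f"{addr}:{amount}".encode()) for addr, amount in payouts]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the odd leaf
        level = [_h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([("gpu-a100-01", 7_200_000), ("gpu-h100-07", 3_100_000)])
print(root.hex())  # published on-chain; providers claim with inclusion proofs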

"Money is a ledger. The question is whether that ledger is controlled by centralized gatekeepers or mathematical consensus."
Payment Architecture Philosophy

Developer SDK

Abstractions for Seamless Integration

The Phase 4 SDK encapsulates reservation loops, metering, settlement, and receipt verification behind Pythonic interfaces. Machine learning engineers integrate distributed compute with minimal infrastructure knowledge.

The ControlPlaneClient provides authenticated HTTP transport. Retry logic handles transient failures (exponential backoff, jittered delays); rate limit detection (HTTP 429) triggers automatic back-pressure. SSL context support enables custom certificate validation for private deployments.
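
The retry discipline amounts to exponential backoff with jitter, deferring to the server's Retry-After header on HTTP 429. A sketch of that logic (this helper is illustrative, not the SDK's actual transport class):

Retry With Back-Pressure
import random
import time
import requests

def request_with_retry(method: str, url: str, max_attempts: int = 5, **kwargs):
    """Exponential backoff with full jitter; honors 429 back-pressure."""
    for attempt in range(max_attempts):
        try:
            resp = requests.request(method, url, timeout=30, **kwargs)
            if resp.status_code == 429:
                # Server-signaled back-pressure wins over our own schedule
                delay = float(resp.headers.get("Retry-After", 2 ** attempt))
            elif resp.status_code >= 500:
                delay = random.uniform(0, 2 ** attempt)  # transient failure
            else:
                return resp
        except requests.RequestException:
            delay = random.uniform(0, 2 ** attempt)
        time.sleep(delay)
    raise RuntimeError(f"{method} {url} failed after {max_attempts} attempts")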

SDK Reservation Loop
import os

from phase4_sdk import ControlPlaneClient, ReservationLoop

# Initialize client with API credentials
client = ControlPlaneClient(
    base_url="https://control.vracu.network",
    api_key=os.getenv("VRACU_API_KEY")
)

# Configure reservation parameters
loop = ReservationLoop(
    client=client,
    required_scm_minutes=1000,
    min_vram_gb=40,
    preferred_interconnect=["nvlink", "pcie"]
)

# Execute reservation → allocation → metering → settlement
result = loop.execute(
    job_id="train-gpt-neo-2.7b",
    workload_fn=lambda provider: train_model(provider)
)

# Receipt includes cryptographic proof
print(f"Settlement receipt: {result.receipt}")
print(f"Ed25519 signature: {result.signature_primary}")
print(f"PQ envelope: {result.signature_secondary}")

The ReservationLoop orchestrates multi-phase workflows. Phase 1 submits supply offers to the oracle. Phase 2 waits for bucket finalization. Phase 3 invokes the scheduler, receiving provider allocation. Phase 4 executes workloads, streaming meter slices to the control plane. Phase 5 polls settlement status, retrieving signed receipts upon job completion.

Integration examples demonstrate Ray, Modal, and Kubernetes adapters. The Ray integration submits jobs through the Ray Jobs API (JobSubmissionClient.submit_job), tailing logs for metering signals; a sketch follows below. The Modal integration wraps Modal functions with VR-ACU reservation context, transparently routing compute through the provider network. The Kubernetes integration generates pod specs with GPU resource requests, applying VR-ACU annotations for cost attribution.
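
A minimal sketch of that Ray path; the head address, entrypoint, and runtime environment are illustrative, while submit_job and get_job_logs are the Jobs API calls in recent Ray releases:

Ray Jobs Sketch
from ray.job_submission import JobSubmissionClient

# Connect to the Ray head provisioned by the provider backend
client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    entrypoint="python train.py",
    runtime_env={"working_dir": "./", "pip": ["torch"]},
    entrypoint_num_gpus=1,  # schedule the entrypoint onto one GPU
)

# Fetch logs, which carry the metering signals
print(client.get_job_logs(job_id))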

ENTERPRISE FEATURES
Enterprise SDK extensions include: budget guardrails (reject jobs exceeding cost ceilings), cost forecasting (estimate job expenses pre-execution), provider allowlists (restrict to compliant hardware), and audit trail export (CSV/JSON dumps of all reservations, settlements, and receipts).

The CLI tool exposes SDK functionality via terminal commands. vracu reserve initiates reservations, vracu meter ingests manual slices, vracu settle forces settlement, and vracu receipt verifies signatures. Shell completion scripts support bash, zsh, and fish.

Observability

Metrics, Logging, and Distributed Tracing

Production observability leverages Prometheus metrics, structured logging, and distributed tracing to surface system health, performance bottlenecks, and failure modes.

Prometheus metrics export from GET /metrics endpoints across all services. Control plane metrics include: queue depth (async tasks pending), HTTP latency histograms (p50/p95/p99), breaker state (binary open/closed), oracle finalization durations, and settlement batch sizes.

Provider metrics expose: GPU utilization percentages, VRAM allocated/free, job counts (running/queued/failed), heartbeat intervals, and attestation refresh timestamps. Grafana dashboards aggregate fleet-wide statistics, alerting on: utilization < 60% (underutilized), failures > 5% (reliability degradation), heartbeat gaps > 90s (connectivity issues).

Metrics Export
from prometheus_client import Gauge, Histogram

# Prometheus metric definitions
vracu_queue_depth = Gauge(
    'vracu_queue_depth',
    'Pending async tasks in control plane queue'
)

vracu_http_duration = Histogram(
    'vracu_http_duration_seconds',
    'HTTP request duration',
    ['method', 'path', 'status']
)

vracu_breaker_state = Gauge(
    'vracu_breaker_state',
    'Circuit breaker state (0=closed, 1=open)',
    ['name']
)

vracu_settlement_batch_size = Histogram(
    'vracu_settlement_batch_size',
    'Number of jobs per settlement batch'
)

Structured logging emits JSON lines to stdout, ingested by log aggregators (Loki, Elasticsearch). Log entries include: trace IDs (for correlation), log levels (DEBUG/INFO/WARN/ERROR), component names, and contextual metadata (job IDs, provider IDs, transaction hashes).

Distributed tracing via OpenTelemetry instruments HTTP handlers, database queries, and blockchain transactions. Trace spans propagate across service boundaries via W3C Trace Context headers. Jaeger UI visualizes request flows, surfacing latency waterfalls and failure attribution.
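
A minimal tracing sketch with the OpenTelemetry Python SDK; the console exporter stands in for the Jaeger/OTLP pipeline, and the span names are illustrative:

Tracing Sketch
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("vracu.settlement")

# Parent span wraps the settlement path; children mark the hot steps
with tracer.start_as_current_span("settle_job") as span:
    span.set_attribute("job.id", "job-42")
    with tracer.start_as_current_span("fetch_meter_slices"):
        pass  # database query
    with tracer.start_as_current_span("submit_onchain_tx"):
        pass  # blockchain transaction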

SLO DASHBOARDS
Service Level Objectives (SLOs) define success criteria: reservation p95 < 5s, settlement p99 < 30s, provider heartbeat uptime > 99.5%, payment webhook processing < 1s. Dashboards display error budgets (remaining allowable failures per 30-day window) and burn rate trends.

Alerting rules fire on SLO violations, breaker openings, and anomaly detection. PagerDuty integration routes critical alerts to on-call engineers; Slack webhooks notify teams of warnings. Alert fatigue mitigation groups correlated alerts (multiple breakers from same root cause) into single incidents.

As we stand at the threshold of a new era in distributed computing, the implications extend far beyond technical specifications. This infrastructure represents a reimagining of power dynamics in the digital age. No longer must innovators genuflect before the altar of cloud providers. No longer must privacy be sacrificed for performance.

The distributed GPU network is more than infrastructure—it's a manifesto written in code, a declaration of independence from digital feudalism. Each node that joins the network is a vote for decentralization. Each transaction is a small revolution. Each computed result is proof that another world is not only possible but already being built.

Yet challenges remain. The network must scale without compromising its principles. It must remain accessible while resisting capture by special interests. It must evolve while maintaining backward compatibility. These are not merely technical challenges but philosophical ones that will shape the network's future.

"The best time to plant a tree was twenty years ago. The second best time is now. The same holds true for decentralized infrastructure."
Anonymous Node Operator

Looking forward, the trajectory is clear. As artificial intelligence becomes increasingly central to human endeavor, the infrastructure supporting it must reflect our highest values: transparency, equity, privacy, and freedom. The distributed GPU network is not the end of this journey but perhaps its most promising beginning.

In the end, this is a story about choice. The choice to build rather than complain. The choice to collaborate rather than compete. The choice to open source the future rather than patent it. These choices, multiplied across thousands of contributors and millions of computations, constitute nothing less than a peaceful revolution in how we organize computational power.

The revolution will not be centralized.