Technological Review of our Backend

Distributed Systems · Cryptography · Architecture

Futures, Privacy & Adapters:
The Three Pillars

A confidential compute exchange built on three foundational pillars—forward reservations for guaranteed capacity, privacy primitives for hardware-attested security, and adapters for workload abstraction—with enabling infrastructure that makes it all work.

FEATURED INFRASTRUCTURE
╔═══════════════════════════════════════╗
║  FORWARD CONTRACTS FOR GPU COMPUTE    ║
║  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━  ║
║  RC_RESERVE · ESCROW · SETTLEMENT     ║
╚═══════════════════════════════════════╝
SECTION 01

Futures Adapter

Forward Reservations, Secondary Markets, and the Economics of Guaranteed Compute Capacity

The futures adapter is an autonomous microservice that mints, settles, and trades forward reservations (RC_Reserve) for GPU compute capacity. It runs alongside the broader platform but is implemented as a strictly additive adapter: it exposes its own FastAPI service, persists to its own SQLite (or cloud RDBMS) schema, and integrates with the existing UI purely via HTTP proxied through platform-server.py.

The adapter treats every provider as a short forward counterparty. Buyers lock compute supply by paying a split fee—commit fee paid immediately to providers, usage fee escrowed until jobs actually burn the reservation—mirroring oil futures logic where spot supply is guaranteed in the future at a fixed price. The service implements zero-collateral architecture: no platform treasury, no provider collateral; solvency emerges from algorithmic capacity ceilings, deterministic curve pricing, and strict ledger debits/credits.

"Like oil futures, compute futures guarantee spot supply at a fixed price—except the commodity is GPU cycles, the delivery is algorithmic, and settlement happens in ACU tokens."
— Futures Adapter
1.1 · Five Core Functions
Function 01

Market Discovery

Compute per-provider, per-tenor curves (capacity, utilization, lock price, fee split) from telemetry and utilization data.

Function 02

Purchase Lifecycle

Accept user quotes, verify capacity/price, debit ACU balances, mint RCs, and escrow usage fees.

Function 03

Execution Coverage

On job completion events, allocate usage against outstanding RCs, credit providers, and fall back to spot pricing when futures inventory is exhausted.

Function 04

Secondary Trades

Allow RC splits/transfers, create order-book style listings, and expire unused capacity with γ-based refunds.

Function 05

Observability & Verification

Persist detailed tables (stats, reserves, allocations, listings, trades) and ship a sophisticated testing harness that simulates telemetry and user flows at scale.

rc_reserves · rc_stats · futures_listings
futures_trades · job_allocations · job_allocation_details
1.2 · System Architecture

Context Diagram

  ┌─────────────────────┐        HTTPS via platform-server       ┌────────────────────────┐
  │   Browser UI        │  <──────────────────────────────────── │   Flask Proxy          │
  │   (vracu-platform)  │          /api/futures/*                │   (platform-server)    │
  └─────────────────────┘                                        └───────────┬────────────┘
             │                                                                │
             │                                                                │
             ▼                                                                ▼
  ┌──────────────────────────────────────────────────────────────────────────────────────────┐
  │                        FastAPI (services/futures_adapter/app.py)                          │
  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
  │  │   api.py    │  │ service.py  │  │ capacity.py │  │ pricing.py  │  │  ledger.py  │     │
  │  │   Router    │─▶│   Service   │─▶│  Forecast   │─▶│   Curves    │─▶│   Debits    │     │
  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘     │
  │                           │                                                    │          │
  │                           │              SQLAlchemy ORM                        │          │
  │                           └────────────────────────────────────────────────────┘          │
  └──────────────────────────────────────────────────────────────────────────────────────────┘
                                                    │
                                                    ▼
                              ┌──────────────────────────────────────────┐
                              │         Futures DB (SQLite/Postgres)      │
                              │  ┌──────────────┐  ┌──────────────────┐  │
                              │  │ rc_reserves  │  │ account_balances │  │
                              │  ├──────────────┤  ├──────────────────┤  │
                              │  │ rc_stats     │  │ futures_listings │  │
                              │  ├──────────────┤  ├──────────────────┤  │
                              │  │ job_allocs   │  │ futures_trades   │  │
                              │  └──────────────┘  └──────────────────┘  │
                              └──────────────────────────────────────────┘
                
1.3 · Module Inventory
Module · Purpose
config.py · Environment variables: provider IDs, pricing parameters, reserved fractions, default spot price, ledger backend selection
db.py · SQLAlchemy session + ORM declarations for all tables (RCReserveORM, RCStatsORM, AccountBalanceORM, FuturesListingORM, etc.)
models.py · Dataclasses representing domain entities returned to service/API layers
schemas.py · Pydantic models for FastAPI request/response validation
capacity.py · Deterministic capacity forecasting: translates per-tenor days to SCU capacity via reserved fraction and reliability floor
pricing.py · Pricing curves: term premium, utilization premium, commit/usage split; produces lock price per tenor
ledger.py · Balance-mutation abstraction; default uses internal account_balances, optional HTTP backend for payments integration
service.py · Heart of the system: market stats, quote/purchase flow, job coverage, expiry, secondary market operations
secondary.py · Thin wrappers to orchestrate listing/trade creation
job_hook.py · CLI utility to call /jobs/apply for completed jobs (control plane bridge)
1.4 · Data Model & Schemas

Entity Relationship Diagram

   ┌───────────────────────┐                    ┌───────────────────────┐
   │   account_balances    │                    │       rc_stats        │
   │───────────────────────│                    │───────────────────────│
   │ account_id (PK)       │                    │ provider_id, tenor    │
   │ balance               │                    │ lock_price, fees      │
   │ updated_at            │                    │ capacity, utilization │
   └───────────────────────┘                    └───────────────────────┘

   ┌───────────────────────────────────────────────────────────────────────┐
   │                          rc_reserves (PK: rc_id)                       │
   │───────────────────────────────────────────────────────────────────────│
   │ owner_id │ provider_id │ gpu_profile │ region │ tenor │ expiry_ts    │
   │ max_scu  │ used_scu    │ fee_comm    │ fee_usage │ lock_price       │
   │ escrow_acu │ status (ACTIVE/FULLY_USED/EXPIRED) │ parent_rc_id      │
   └─────────────────────────────────┬─────────────────────────────────────┘
                                     │
              ┌──────────────────────┼──────────────────────┐
              │                      │                      │
              ▼                      ▼                      ▼
   ┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐
   │  futures_listings   │  │ job_alloc_details   │  │   futures_trades    │
   │─────────────────────│  │─────────────────────│  │─────────────────────│
   │ listing_id (PK)     │  │ job_id (FK)         │  │ trade_id (PK)       │
   │ rc_id (FK)          │  │ rc_id (FK)          │  │ listing_id (FK)     │
   │ owner_id            │  │ alloc_scu           │  │ rc_id_buyer (FK)    │
   │ scu_available       │  │ fee_usage           │  │ buyer_id, seller_id │
   │ ask_price_acu/scu   │  │ payout_acu          │  │ scu, price          │
   │ status (OPEN/FILLED)│  └─────────────────────┘  └─────────────────────┘
   └─────────────────────┘
              ▲                      ▲
              │                      │
   ┌──────────┴──────────────────────┴──────────┐
   │            job_allocations (PK: job_id)     │
   │─────────────────────────────────────────────│
   │ user_id │ provider_id │ total_scu │ spot_* │
   └─────────────────────────────────────────────┘
                
1.5 · Primary Flows
Flow 1: Quote → Purchase
The user requests a quote with GPU profile, region, tenor, and quantity. The adapter recomputes stats for candidate providers, sorts offers by lock price, and assembles allocations. When the user confirms the purchase, the adapter validates price drift, debits ACU, credits the commit fee to the provider, and mints an RC with the escrowed usage fee.

Quote/Purchase Sequence

  User        Proxy        Adapter(API)       FuturesService           Ledger            DB
   │            │              │                    │                    │                │
   │──POST /quote──────────────▶                    │                    │                │
   │            │              │──market()─────────▶│                    │                │
   │            │              │                    │──compute_stats()──▶│                │
   │            │              │                    │   for each provider│                │
   │            │              │                    │◀──────────────────────────read stats─┤
   │            │              │◀──QuoteResponse────│                    │                │
   │◀─────allocations, total_cost, partial──────────│                    │                │
   │                           │                    │                    │                │
   │──POST /purchase───────────▶                    │                    │                │
   │            │              │──purchase()───────▶│                    │                │
   │            │              │                    │──validate drift───▶│                │
   │            │              │                    │──debit(owner)─────▶│                │
   │            │              │                    │◀──────────────────ok│                │
   │            │              │                    │──credit(provider)─▶│  (commit fee)  │
   │            │              │                    │──INSERT RC_Reserve─────────────────▶│
   │            │              │                    │         (escrow_acu = usage fee)    │
   │◀─────PurchaseResponse {reservations[], cost}───│                    │                │
   │                           │                    │                    │                │
                
Pseudocode · Quote Algorithm
function quote(gpu_profile, region, tenor, quantity, provider_ids):
    offers = []
    for pid in provider_ids:
        stat = compute_stats(pid, tenor)
        capacity_avail = remaining_capacity(pid, tenor, stat.notional)
        if capacity_avail > 0:
            offers.append(Offer(pid, stat, capacity_avail))
    
    sort offers by offer.stat.lock_price  # cheapest first
    allocations = []
    filled = 0
    
    for offer in offers:
        take = min(offer.capacity_avail, quantity - filled)
        if take <= 0: break
        allocations.append({
            provider_id: offer.pid,
            scu: take,
            lock_price: offer.stat.lock_price,
            fee_comm: offer.stat.fee_comm,
            fee_usage: offer.stat.fee_usage
        })
        filled += take
    
    total_cost = sum(alloc.scu * alloc.lock_price for alloc in allocations)
    return allocations, total_cost, filled < quantity
                
Flow 2: Job Completion & Coverage
When a job completes, the control plane posts to /jobs/apply. The adapter fetches RCs for the user/provider pair, ordered by expiry (earliest first), and allocates SCU from each RC until the job is covered or the RCs are exhausted. Any remaining usage is billed at the spot rate.

Job Coverage Algorithm

┌─────────────────────────────┐
│    Control Plane POST       │
│    /jobs/apply              │
│    {job_id, user, provider, │
│     scu_used, spot_rate}    │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  Fetch RCs for user/provider│
│  ORDER BY expiry_ts ASC     │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  For each RC:               │
│    alloc = min(remaining,   │
│              rc.available)  │
│    rc.used_scu += alloc     │
│    rc.escrow -= fee_usage   │
│    credit provider          │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  If remaining > 0:          │
│    Bill spot to user        │
│    Credit provider          │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  Insert job_allocation      │
│  Insert detail records      │
│  Return coverage summary    │
└─────────────────────────────┘
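The coverage loop in the flowchart above can be restated as a runnable sketch. The dict-based RC records, the in-memory ledger, and the single `fee_usage` rate are simplifying assumptions; the real service reads per-RC fees and balances through the ORM and ledger.py.

```python
def apply_job(rcs, scu_used, fee_usage, spot_rate, ledger, provider):
    """Allocate a completed job's SCU against RCs (earliest expiry first),
    then bill any shortfall at the spot rate. Returns (details, spot_scu)."""
    remaining = scu_used
    details = []
    for rc in sorted(rcs, key=lambda r: r["expiry_ts"]):
        alloc = min(remaining, rc["max_scu"] - rc["used_scu"])
        if alloc <= 0:
            continue
        rc["used_scu"] += alloc
        payout = alloc * fee_usage
        rc["escrow_acu"] -= payout                            # usage fee leaves escrow...
        ledger[provider] = ledger.get(provider, 0) + payout   # ...and goes to the provider
        details.append((rc["rc_id"], alloc))
        remaining -= alloc
        if remaining <= 0:
            break
    spot_scu = max(0, remaining)  # any uncovered portion bills at the spot rate
    ledger[provider] = ledger.get(provider, 0) + spot_scu * spot_rate
    return details, spot_scu
```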
                    

RC Reserve Lifecycle

         ┌──────────┐
         │   MINT   │
         │ (purchase│
         │  action) │
         └────┬─────┘
              │
              ▼
       ┌──────────────┐
       │    ACTIVE    │◀──────────┐
       │              │           │
       │  max_scu=250 │   secondary
       │  used_scu=0  │   trades add
       │  escrow=full │   child RCs
       └──────┬───────┘           │
              │                   │
       job allocations            │
       consume SCU                │
              │                   │
              ▼                   │
       ┌──────────────┐           │
       │  FULLY_USED  │───────────┘
       │              │   (split)
       │  used >= max │
       │  escrow ~ 0  │
       └──────┬───────┘
              │
       expiry reached
              │
              ▼
       ┌──────────────┐
       │   EXPIRED    │
       │              │
       │  γ refund    │
       │  to user     │
       │  breakage to │
       │  provider    │
       └──────────────┘
                    
1.6 · Expiry & γ Refund Function

During the expiry sweep, RCs whose expiry timestamp has passed are processed. The adapter computes remaining SCU (max − used) and remaining escrow (unused usage fees). Utilization is calculated as used/max, then fed into the γ function—a linear interpolation that returns 0 at 0% utilization and caps at 0.7 from 90% utilization upward.

User refund equals remaining_escrow × γ(utilization). Provider breakage equals remaining_escrow − refund. This incentivizes users to maximize utilization while compensating providers for reserved capacity. The RC status transitions to EXPIRED and escrow zeroes out.

Gamma Function & Expiry Logic

  γ (gamma)
    │
 0.7├────────────────────────────●━━━━━━━━━━  (90%+ utilization → max refund)
    │                          ╱
    │                        ╱
    │                      ╱
    │                    ╱
    │                  ╱
    │                ╱
    │              ╱
    │            ╱
    │          ╱
    │        ╱
    │      ╱
    │    ╱
    │  ╱
 0.0├●───────────────────────────────────────  (0% utilization → no refund)
    └─────────────────────────────────────────▶
    0%                   90%              100%   Utilization

  ┌─────────────────────────────────────────────────────────────────────┐
  │  Expiry Calculation:                                                 │
  │                                                                      │
  │    remaining_escrow = rc.escrow_acu                                  │
  │    utilization = rc.used_scu / rc.max_scu                            │
  │    gamma = 0.7 × min(1.0, utilization / 0.9)                         │
  │                                                                      │
  │    refund_user = remaining_escrow × gamma                            │
  │    provider_breakage = remaining_escrow - refund_user                │
  │                                                                      │
  │    ledger.credit(owner, refund_user)                                 │
  │    ledger.credit(provider, provider_breakage)                        │
  │    rc.status = EXPIRED                                               │
  │    rc.escrow_acu = 0                                                 │
  └─────────────────────────────────────────────────────────────────────┘
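The expiry box above translates directly into code; `settle_expiry` is a hypothetical helper name wrapping the same arithmetic.

```python
def gamma(utilization: float) -> float:
    """Linear refund curve: 0.0 at 0% utilization, capped at 0.7 from 90% up."""
    return 0.7 * min(1.0, utilization / 0.9)

def settle_expiry(remaining_escrow: float, used_scu: float, max_scu: float):
    """Split leftover escrow between user refund and provider breakage."""
    g = gamma(used_scu / max_scu)
    refund = remaining_escrow * g
    return refund, remaining_escrow - refund
```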
                
1.7 · Secondary Market Flow

Listing Creation & Trade Execution

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                           LISTING CREATION                                   │
  └─────────────────────────────────────────────────────────────────────────────┘

  User A (Seller)                    Adapter                         Database
       │                               │                                │
       │──POST /listings───────────────▶                                │
       │   {rc_id, scu_to_sell, ask}   │                                │
       │                               │──validate owner == rc.owner───▶│
       │                               │──check remaining >= scu────────▶│
       │                               │──INSERT listing────────────────▶│
       │◀──────────ListingResponse─────│                                │
       │                               │                                │

  ┌─────────────────────────────────────────────────────────────────────────────┐
  │                           TRADE EXECUTION                                    │
  └─────────────────────────────────────────────────────────────────────────────┘

  User B (Buyer)       Adapter                    Ledger               Database
       │                 │                          │                     │
       │──POST /trades──▶│                          │                     │
       │  {listing_id,   │──fetch OPEN listing─────────────────────────▶ │
       │   scu_to_buy}   │                          │                     │
       │                 │──verify buyer ≠ seller   │                     │
       │                 │──verify scu available    │                     │
       │                 │                          │                     │
       │                 │──debit(buyer, total)────▶│                     │
       │                 │──credit(seller, total)──▶│                     │
       │                 │                          │                     │
       │                 │──reduce seller_rc.max_scu───────────────────▶ │
       │                 │──create buyer_rc (child)────────────────────▶ │
       │                 │──update listing.scu_available───────────────▶ │
       │                 │──INSERT trade record────────────────────────▶ │
       │                 │                          │                     │
       │◀──TradeResponse─│                          │                     │
       │   {buyer_rc_id} │                          │                     │
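The trade-execution sequence above can be sketched with in-memory stand-ins for the ORM rows. Escrow and fee bookkeeping for the child RC are omitted, and `execute_trade` is an illustrative name, not the adapter's actual function.

```python
def execute_trade(listing, seller_rc, balances, buyer_id, scu_to_buy):
    """Fill part of an OPEN listing: move ACU buyer→seller, carve a child RC."""
    assert listing["status"] == "OPEN" and scu_to_buy <= listing["scu_available"]
    assert buyer_id != listing["owner_id"], "buyer must differ from seller"
    total = scu_to_buy * listing["ask_price_acu_per_scu"]
    balances[buyer_id] -= total
    balances[listing["owner_id"]] += total
    seller_rc["max_scu"] -= scu_to_buy   # capacity leaves the parent RC...
    buyer_rc = {                         # ...and is minted as a child RC
        "owner_id": buyer_id,
        "parent_rc_id": seller_rc["rc_id"],
        "max_scu": scu_to_buy,
        "used_scu": 0,
        "status": "ACTIVE",
    }
    listing["scu_available"] -= scu_to_buy
    if listing["scu_available"] == 0:
        listing["status"] = "FILLED"
    return buyer_rc
```

With the workflow numbers from the end-to-end scenario (a 60 SCU listing at 15 ACU/SCU, a 40 SCU fill), the buyer pays 600 ACU and the listing stays OPEN with 20 SCU available.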
                
1.8 · Pricing & Capacity Algorithms
Capacity Model

capacity.py

forecast_capacity = observed_scu_per_hour × 24 × tenor_days
max_notional = forecast × reserved_fraction × reliability_floor
remaining = max(0, max_notional - current_notional)
Pricing Model

pricing.py

term_premium = term_premium_max × (1 - exp(-tenor_days / term_tau_days))
util_premium = util_slope × max(0, utilization - util_target)
lock_price = spot × (1 + term_premium) × (1 + util_premium)
fee_commit = lock_price × commit_fraction[tenor]
fee_usage = lock_price - fee_commit
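The two models above are executable as written. The parameter defaults below (term_premium_max, term_tau_days, util_slope, util_target, commit_fraction) are illustrative stand-ins for the values config.py supplies.

```python
from math import exp

def remaining_capacity(observed_scu_per_hour, tenor_days,
                       reserved_fraction, reliability_floor, current_notional):
    """capacity.py model: forecast SCU over the tenor, apply safety factors."""
    forecast = observed_scu_per_hour * 24 * tenor_days
    max_notional = forecast * reserved_fraction * reliability_floor
    return max(0.0, max_notional - current_notional)

def lock_price(spot, tenor_days, utilization, *, term_premium_max=0.25,
               term_tau_days=60.0, util_slope=0.5, util_target=0.6,
               commit_fraction=0.25):
    """pricing.py model: term + utilization premia, then commit/usage split."""
    term_premium = term_premium_max * (1 - exp(-tenor_days / term_tau_days))
    util_premium = util_slope * max(0.0, utilization - util_target)
    price = spot * (1 + term_premium) * (1 + util_premium)
    fee_commit = price * commit_fraction
    return price, fee_commit, price - fee_commit
```

Note the invariants: fee_commit + fee_usage always reassembles the lock price, and longer tenors carry a strictly higher term premium.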
1.9 · API Surface
Endpoint · Method · Description
/health · GET · Readiness check; returns {"status": "ok"}
/market · GET · List of RCStats: provider, tenor, lock price, fees, capacity, utilization
/providers/{id}/curve · GET · Tenor-wise curve for provider (UI "Curve" modal)
/quote · POST · Request quote for GPU profile, region, tenor, quantity
/purchase · POST · Finalize reservations along allocations; returns minted RCs
/portfolio/{user_id} · GET · Current RCs for user (UI "My Portfolio" tab)
/positions/{rc_id} · GET · RC-level detail with job allocations
/listings · GET/POST · Secondary market listing feed / create listing
/trades · POST · Execute listing purchase
/jobs/apply · POST · Job completion hook; returns allocation summary
/jobs/{job_id}/allocations · GET · Detailed coverage info for a job
/expire · POST · Manual expiry sweep (cron/batch)
1.10 · End-to-End Scenario
EXAMPLE WORKFLOW
01
Quote: User requests 250 SCU of H100 compute on provider α, tenor T90D. Adapter calculates remaining capacity (115,000 SCU), yields lock price 11.10 ACU/SCU with fee split (commit 2.78, usage 8.32).
02
Purchase: User confirms. Adapter debits 695 ACU (commit) + 2,080 ACU (escrow) = 2,775 ACU total,* immediately credits the provider the 695 ACU commit fee, and mints RC rc-123 with max_scu=250.
03
Job Execution: Job completes using 180 SCU. Adapter allocates from rc-123, reduces escrow by 180 × 8.32, credits provider usage fee. RC remains ACTIVE with 70 SCU left.
04
Secondary Listing: User lists 60 SCU remainder at 15 ACU/SCU ask price.
05
Trade: Second user buys 40 SCU, paying 600 ACU. Child RC minted for buyer; listing remains with 20 SCU available.
06
Expiry: After 90 days, the RC has 10 SCU unused; utilization = 96%. Since 96% exceeds the 90% cap, γ = 0.7, so the user gets 70% of the leftover escrow and the provider keeps 30% as breakage.
* Actual split depends on configured commit_fraction for T90D tenor
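A quick sanity check of the step 01 arithmetic: the fee split must reassemble the lock price, and the commit and escrow totals must sum to the buyer's full debit.

```python
# Figures from step 01: 250 SCU at lock price 11.10 ACU/SCU, split 2.78 / 8.32.
scu, lock, fee_comm, fee_usage = 250, 11.10, 2.78, 8.32

assert round(fee_comm + fee_usage, 2) == lock         # split reassembles the lock price
commit_total = round(scu * fee_comm, 2)               # 695.0 ACU paid to provider at purchase
escrow_total = round(scu * fee_usage, 2)              # 2080.0 ACU held in escrow
assert round(commit_total + escrow_total, 2) == round(scu * lock, 2)  # 2775.0 ACU total debit
```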
1.11 · SCM/ACU Calibration & Hardware Normalization

SCM (Standard Compute Minutes) is not "one minute on any GPU"—it's one minute on a reference machine with calibrated GFLOPS and bandwidth figures. Different GPUs deliver different SCM/min scores via benchmarking. The scheduler and futures adapter convert between SCM and actual runtime per hardware using those scores.

When a provider onboards, the provider_agent collects hardware attestation plus micro-benchmarks (GEMM FP16/FP32 GFLOPS, memory bandwidth, interconnect throughput). ACURateCalibrator normalizes each metric against reference values and applies weights to produce acurate_scm_per_min: how many standardized compute minutes that hardware delivers per wall-clock minute.

Calibration Pipeline

  ┌──────────────────────────────────────────────────────────────────────────────┐
  │                        PROVIDER ONBOARDING                                    │
  └──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
  ┌──────────────────────────────────────────────────────────────────────────────┐
  │  provider_agent/microbench/                                                   │
  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
  │  │ GEMM FP16   │  │ GEMM FP32   │  │ Mem BW      │  │ Interconnect│          │
  │  │ GFLOPS      │  │ GFLOPS      │  │ GB/s        │  │ Latency     │          │
  │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘          │
  └─────────┼────────────────┼────────────────┼────────────────┼─────────────────┘
            │                │                │                │
            └────────────────┴────────────────┴────────────────┘
                                    │
                                    ▼
  ┌──────────────────────────────────────────────────────────────────────────────┐
  │  ACURateCalibrator (provider_agent/calibration/calibrator.py)                 │
  │                                                                               │
  │    weights = { FP16: 55%, FP32: 15%, MemBW: 15%, Interconnect: 10%,          │
  │                Stability: 5% penalty }                                        │
  │                                                                               │
  │    acurate_scm_per_min = weighted_score × reference_normalization             │
  │                                                                               │
  │    H200 → ~1.4 SCM/min  (faster than reference)                               │
  │    H100 → ~1.0 SCM/min  (reference baseline)                                  │
  │    A100 → ~0.7 SCM/min  (slower than reference)                               │
  └──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
  ┌──────────────────────────────────────────────────────────────────────────────┐
  │  control_plane/services.py::record_attestation                                │
  │  ┌────────────────────────────────────────────────────────────────────────┐  │
  │  │  provider_attestations table                                            │  │
  │  │  ───────────────────────────────────────────────────────────────────── │  │
  │  │  provider_id │ hardware_spec │ acurate_scm_per_min │ attestation_ts    │  │
  │  └────────────────────────────────────────────────────────────────────────┘  │
  └──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
  ┌──────────────────────────────────────────────────────────────────────────────┐
  │  FUTURES ADAPTER                                                              │
  │                                                                               │
  │  Capacity model operates on SCU totals—already normalized to SCM.             │
  │  When a job requests 100 SCM, the control plane divides by each provider's   │
  │  acurate_scm_per_min to determine actual runtime. Futures contracts settle   │
  │  in ACU tokens at the SCM-normalized rate.                                   │
  └──────────────────────────────────────────────────────────────────────────────┘
                
"When you buy 100 SCM, you're buying enough standardized compute that—no matter if your job lands on a V100 or H200—the control plane ensures you receive 100 SCM's worth of throughput. Faster GPUs finish sooner; pricing ties to calibrated performance."
— Hardware Normalization
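A hedged sketch of the weighted normalization, using the weights from the calibration diagram. The reference benchmark figures, field names, and the renormalization step are illustrative assumptions, not the calibrator's actual tables or API.

```python
# Illustrative H100-class reference figures (assumed, not from the calibrator).
REFERENCE = {"fp16_gflops": 990_000, "fp32_gflops": 67_000,
             "mem_bw_gbs": 3_350, "interconnect_gbs": 900}
WEIGHTS = {"fp16_gflops": 0.55, "fp32_gflops": 0.15,
           "mem_bw_gbs": 0.15, "interconnect_gbs": 0.10}

def acurate_scm_per_min(bench: dict, stability_penalty: float = 0.0) -> float:
    """Weighted score vs reference hardware; 1.0 means reference-equivalent."""
    score = sum(w * (bench[k] / REFERENCE[k]) for k, w in WEIGHTS.items())
    score /= sum(WEIGHTS.values())            # renormalize the 95% of weight used
    return score * (1.0 - stability_penalty)  # stability enters as a penalty
```

At 1.4 SCM/min, a 100 SCM job is expected to take roughly 100 / 1.4 ≈ 71 wall-clock minutes, consistent with the runtime conversion the control plane performs.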
1.12 · Testing & Load Infrastructure
Load Harness

futures_adapter/testing/

Comprehensive testing infrastructure supports realistic multi-provider simulations, stress testing, and UI replay for demonstrations.

Fixtures
providers.yaml (13 providers)
GPU specs & attestation
Telemetry per tenor
Generators
run_load.py (asyncio)
curve_simulator.py
provider_catalog.py
Outputs
purchases.jsonl
jobs.jsonl, trades.jsonl
run_summary.json
Operational Note
The harness supports thousands of synthetic users and hundreds of concurrent workers. Example 15-minute runs produce >1,400 reservations, >5,900 trades, and >138,000 job applications. JSONL artifacts document flows end-to-end for audit and replay.
· · ·
SECTION 02

Privacy Design Principles

PRIMARY PILLAR · Attesters, Brokers, Proofs, Revocation, and Inter-Rank Security

The privacy subsystem lives under vracu-launcher/launcher/privacy/ and is composed of cooperating modules: attesters.py validates hardware claims; service.py orchestrates authorization and on-demand key issuance; key_broker.py implements various key distribution strategies; revocation.py stores revocation state and background watchers; proof.py loads optional verifiers; interrank.py produces per-rank crypto bundles; and loader.py constructs the system based on configuration. This modular architecture ensures each component handles a specific security domain while sharing the same foundation—typed, deterministic Python code built for confidential compute workloads.

AttestationResult is a frozen dataclass containing valid (bool) and claims (dict). This simple shape allows attesters to return rich structured claims when verification succeeds or a human-readable error when it fails. PrivacyAuthorization stores claims (per-attester claim maps), attestation (flattened dictionary), optional dek bytes, and optional session metadata.

Privacy Module Architecture

launcher/privacy/
├── __init__.py
├── attesters.py          ← Attester protocol + implementations (NVIDIA, TDX, SNP)
│   ├── Attester (Protocol)
│   ├── NvidiaCcOnAttester
│   ├── TdxAttester
│   └── SnpAttester
├── service.py            ← PrivacyGate orchestration
│   ├── PrivacyGate
│   ├── authorize_job()
│   └── issue_session_dek()
├── key_broker.py         ← Key distribution strategies
│   ├── KeyBroker (abstract)
│   ├── SessionKeyBroker (abstract)
│   ├── KeyBrokerAwsKms
│   ├── KeyBrokerVaultTransit
│   ├── KeyBrokerStatic
│   ├── SplitKeyBroker
│   └── HttpSplitKeyShareClient
├── revocation.py         ← Revocation registry + watcher
│   ├── RevocationRegistry
│   ├── RevocationWatcher
│   └── SessionInvalidationPipeline
├── proof.py              ← Proof verifier plugins
│   ├── load_proof_verifier()
│   └── run_proof_verifier()
├── interrank.py          ← Inter-rank cryptography
│   ├── InterRankCryptoConfig
│   └── build()
├── loader.py             ← Configuration-driven construction
│   └── build_privacy_components()
└── errors.py             ← Custom exceptions
    ├── PrivacyViolation
    ├── PrivacyInitializationError
    └── ProofVerificationError
                
2.1 Architectural Orientation

The modules cooperate to validate hardware claims, issue cryptographic keys, track sessions, and enforce revocation policies, with each module responsible for a single security domain.

2.2 Core Data Structures

In addition to AttestationResult and PrivacyAuthorization (introduced above), RevocationDelta encapsulates newly revoked attestation hashes and session IDs, and SessionState structures track session IDs, salts, attestation hashes, step counters, tokens, and broker state.

"Privacy is not a feature to be added; it is the foundation upon which legitimate distributed computing must be built."
— Privacy Subsystem
2.3 Attester Protocol

Attester is a typing.Protocol with verify(evidence, challenge) returning AttestationResult. Implementations must bind evidence to the challenge (job ID). The protocol enables static type checking and encourages consistent error handling across attesters. Because the protocol returns AttestationResult, attesters never raise exceptions for expected validation failures; they return valid=False with an informative error message.

NVIDIA CC-On

NvidiaCcOnAttester

Enforces SPDM certificate chain validation, challenge binding, CC-On mode requirements, and payload signature verification.

Evidence: spdm_chain, gpu_report
Validates: nonce == challenge
Requires: cc_mode == "CC_ON"
Signature: RSA PSS / ECDSA
Returns: vendor, product_id, measurement
Intel TDX

TdxAttester

Handles Intel Trust Domain Extensions quotes with report, signature, and cert_chain validation.

Evidence: tdx_quote
Validates: nonce, mr_enclave, mr_signer
Signature: SHA-384 verification
Returns: vendor, challenge, attributes
AMD SEV-SNP

SnpAttester

Validates SEV-SNP attestation reports with policy enforcement and VCEK chain verification.

Evidence: snp_report
Validates: nonce, policy
Certificate: PEM, chain verification
Returns: vendor, policy, platform_version
2.4 PrivacyGate Authorization Flow

PrivacyGate.authorize_job iterates over configured attesters. Evidences are accessed by name; missing evidence triggers PrivacyViolation. Each attester's verify method is called with challenge=job_id. If valid is false, PrivacyViolation includes the attester name and error message. Claims are stored in a map keyed by attester name and flattened into flat_claims.

PrivacyGate Authorization Flow

PrivacyGate.authorize_job(job_id, evidence_map)
    │
    ├──▶ FOR attester_name IN configured_attesters:
    │       │
    │       ├──▶ evidence = evidence_map.get(attester_name)
    │       │       │
    │       │       └── IF NOT evidence:
    │       │               RAISE PrivacyViolation("missing evidence")
    │       │
    │       ├──▶ result = attester.verify(evidence, challenge=job_id)
    │       │       │
    │       │       └── IF NOT result.valid:
    │       │               RAISE PrivacyViolation(attester_name, result.error)
    │       │
    │       └──▶ claims[attester_name] = result.claims
    │
    ├──▶ flat_claims = flatten(claims)
    │
    ├──▶ attestation_hash = compute_attestation_hash(flat_claims)
    │
    ├──▶ IF revocation_registry.is_attestation_revoked(attestation_hash):
    │       RAISE PrivacyViolation("attestation revoked")
    │
    ├──▶ IF broker IS SessionKeyBroker:
    │       │
    │       ├──▶ session = broker.create_session(job_id, attestation_hash)
    │       │
    │       └──▶ revocation_registry.track_session(session)
    │
    └──▶ RETURN PrivacyAuthorization(claims, flat_claims, dek, session)
                
2.5 Key Broker Taxonomy

key_broker.py defines a class hierarchy: KeyBroker (abstract), SessionKeyBroker (abstract subclass), KeyBrokerAwsKms, KeyBrokerVaultTransit, KeyBrokerStatic, SplitKeyBroker, and HttpSplitKeyShareClient. Each broker exposes asynchronous methods (release, or create_session/issue for session brokers). Retry logic is handled by an exponential_backoff helper that accepts attempts, initial, maximum, and exceptions parameters.
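A minimal sketch of such a retry helper; the real signature in key_broker.py may differ:

```python
import asyncio

async def exponential_backoff(fn, *, attempts=3, initial=0.1, maximum=2.0,
                              exceptions=(Exception,)):
    """Retry an async callable, doubling the delay between attempts up to `maximum`."""
    delay = initial
    for attempt in range(attempts):
        try:
            return await fn()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            await asyncio.sleep(delay)
            delay = min(delay * 2, maximum)
```

Callers wrap transient operations (KMS calls, Vault HTTP posts) and let permanent errors propagate after the final attempt.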

AWS KMS

KeyBrokerAwsKms

Generates data keys using AWS KMS with encryption context binding to job ID and attestation hash.

Method: generate_data_key
Context: job_id + attestation_hash
Retry: exponential_backoff
Key Size: 256-bit
Vault Transit

KeyBrokerVaultTransit

Posts to Vault's transit endpoint for key generation with context binding.

Endpoint: /v1/transit/datakey
Headers: X-Vault-Token
Response: base64-encoded key
Key Size: 256-bit
Static

KeyBrokerStatic

Deterministic key derivation using HKDF for development and testing.

Method: HKDF (HMAC-SHA256)
Info: job_id:attestation_hash
Salt: configurable
Use: development only
Split-Key

SplitKeyBroker

Composes a primary broker with a remote share client for threshold key issuance. Enforces monotonic step increments, proof verification, and session-state management.

Primary: KeyBrokerAwsKms | KeyBrokerVaultTransit
Remote: HttpSplitKeyShareClient
Combination: XOR(primary_share, remote_share)
Final Key: HKDF(combined, salt=session.salt, info=f"{session_id}:{step}")
Session State: session_id, attestation_hash, salt, last_step, threshold
2.6 Split-Key Issue Flow
Split-Key Issue Flow Pseudo Code
ASYNC FUNCTION issue(job_id, session, attestation, step, proof):
    # Verify attestation hash matches session
    verify_attestation_hash(session["attestation_hash"], attestation)
    
    # Ensure monotonic step progression
    ensure_step_monotonic(session["last_step"], step)
    
    # Run proof verifier if required
    IF require_proof:
        proof_context = run_proof_verifier(job_id, session_id, step, attestation)
    
    # Get share from primary broker (KMS/Vault)
    primary_share = AWAIT primary_broker.release(job_id, attestation, step)
    
    # Get share from remote aggregator
    remote_response = AWAIT splitkey_client.issue_share(session_token, step, proof_context)
    remote_share = base64_decode(remote_response["share_b64"])
    
    # Combine shares via XOR
    final_key_material = xor_bytes(primary_share, remote_share)
    
    # Derive final DEK using HKDF
    dek = hkdf(
        final_key_material, 
        salt=session["salt"], 
        info=f"{session_id}:{step}"
    )
    
    # Update session state
    update_session_state(session, remote_response, step)
    
    RETURN dek, session
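The XOR combination and HKDF derivation in the flow above can be made concrete with the standard library. The share values and session fields here are hypothetical stand-ins, and the HKDF is a single-block RFC 5869 extract-and-expand, which suffices for a 256-bit DEK:

```python
import hashlib
import hmac
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Combine two equal-length shares; neither share alone reveals the key."""
    if len(a) != len(b):
        raise ValueError("share length mismatch")
    return bytes(x ^ y for x, y in zip(a, b))

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Single-block RFC 5869 HKDF (valid for length <= 32 with SHA-256)."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()                      # extract
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()[:length]  # expand

# Hypothetical stand-ins for the broker outputs and session state:
primary_share = os.urandom(32)   # from KMS/Vault release()
remote_share = os.urandom(32)    # from the remote share aggregator
session_salt = os.urandom(16)
dek = hkdf_sha256(xor_bytes(primary_share, remote_share),
                  salt=session_salt, info=b"session-123:7")
```

Binding `info` to `{session_id}:{step}` means every rotation step yields a distinct DEK even when the underlying shares repeat.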
                
2.7 Revocation Registry

RevocationRegistry maintains sets of revoked attestation hashes and session IDs, plus a dictionary of active sessions. A threading.Lock serialises access. The update method normalises its inputs, determines which entries are new, updates the sets, attaches revoked_at timestamps to tracked sessions, and bumps version and updated_at.

Revocation Pipeline

┌─────────────────────────────────────────────────────────────────────────────┐
│                        REVOCATION FEED SOURCE                                │
│                   (HTTP endpoint / local file / control plane)               │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         REVOCATION WATCHER                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  _run() loop:                                                        │    │
│  │    1. sleep(poll_interval)                                           │    │
│  │    2. _fetch_payload() ─▶ HTTP GET or file read                      │    │
│  │    3. _apply_payload() ─▶ validate + update registry                 │    │
│  │    4. on_update() callback ─▶ trigger invalidation pipeline          │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        REVOCATION REGISTRY                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  _revoked_attestations: Set[str]                                     │    │
│  │  _revoked_sessions: Set[str]                                         │    │
│  │  _tracked_sessions: Dict[str, SessionData]                           │    │
│  │  _version: int                                                       │    │
│  │  _updated_at: datetime                                               │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    SESSION INVALIDATION PIPELINE                             │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  handle(delta: RevocationDelta):                                     │    │
│  │    FOR session_id IN delta.session_ids:                              │    │
│  │        job_id = registry.get_job_id(session_id)                      │    │
│  │        cancel_job(job_id, reason="revoked_session")                  │    │
│  │        IF stop_job: stop_job(job_id, execution_metadata)             │    │
│  │    FOR hash IN delta.attestation_hashes:                             │    │
│  │        sessions = registry.get_sessions_by_attestation(hash)         │    │
│  │        FOR session IN sessions: cancel_job(...)                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
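A simplified, thread-safe sketch of the registry's update path (tracked-session bookkeeping omitted):

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Set, Tuple

@dataclass(frozen=True)
class RevocationDelta:
    """Newly revoked entries discovered by a single update."""
    attestation_hashes: Tuple[str, ...]
    session_ids: Tuple[str, ...]

class RevocationRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._revoked_attestations: Set[str] = set()
        self._revoked_sessions: Set[str] = set()
        self._version = 0
        self._updated_at = None

    def update(self, attestation_hashes: Iterable[str],
               session_ids: Iterable[str]) -> RevocationDelta:
        with self._lock:
            # Only entries not already revoked appear in the delta.
            new_hashes = set(attestation_hashes) - self._revoked_attestations
            new_sessions = set(session_ids) - self._revoked_sessions
            self._revoked_attestations |= new_hashes
            self._revoked_sessions |= new_sessions
            self._version += 1
            self._updated_at = datetime.now(timezone.utc)
            return RevocationDelta(tuple(sorted(new_hashes)),
                                   tuple(sorted(new_sessions)))

    def is_attestation_revoked(self, attestation_hash: str) -> bool:
        with self._lock:
            return attestation_hash in self._revoked_attestations
```

Returning only the delta lets the invalidation pipeline cancel exactly the jobs affected by a feed update rather than re-scanning the full revocation sets.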
                
2.8 Inter-Rank Cryptography

interrank.py delivers cryptographic material for multi-rank workloads. InterRankCryptoConfig accepts algorithm (aes-gcm), key size, nonce size, pad multiple, and Gaussian noise settings. build(world_size) generates handshake ID, per-rank keys, nonces, tags (SHA-256 over handshake ID + rank + key), and zipped contexts.

Inter-Rank Security
Inter-rank cryptography produces environment variables (INTER_RANK_KEY_{rank}, INTER_RANK_NONCE_{rank}) and base64 payload for metadata. This module ensures distributed jobs can encrypt gradients or data exchanged across ranks, enabling secure aggregation in federated training scenarios.
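The build flow can be sketched as follows; the exact byte layout of the tag input and the payload schema are assumptions based on the description above:

```python
import base64
import hashlib
import json
import os

def build_interrank_material(world_size: int, key_size: int = 32, nonce_size: int = 12):
    """Sketch of build(world_size): per-rank keys, nonces, and SHA-256 tags."""
    handshake_id = os.urandom(16).hex()
    env, contexts = {}, []
    for rank in range(world_size):
        key, nonce = os.urandom(key_size), os.urandom(nonce_size)
        # Tag binds handshake ID, rank, and key, per the description above.
        tag = hashlib.sha256(handshake_id.encode() + str(rank).encode() + key).hexdigest()
        env[f"INTER_RANK_KEY_{rank}"] = base64.b64encode(key).decode()
        env[f"INTER_RANK_NONCE_{rank}"] = base64.b64encode(nonce).decode()
        contexts.append({"rank": rank, "tag": tag})
    payload = base64.b64encode(
        json.dumps({"handshake_id": handshake_id, "contexts": contexts}).encode()
    ).decode()
    return env, payload
```

Each rank receives only its own key and nonce via environment variables, while the base64 payload gives the control plane a verifiable record of the handshake.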
2.9 · Full Attestation Handshake Sequence

Sidecar ↔ Launcher ↔ Attester Handshake

  Sidecar              Launcher API            PrivacyGate           Attesters              Key Broker
     │                      │                      │                    │                       │
     │──POST /v1/jobs/{id}/attestation─────────────▶                    │                       │
     │   {evidences: {...}}  │                      │                    │                       │
     │                      │──authorize_job()─────▶│                    │                       │
     │                      │                      │──verify(nvidia)────▶│                       │
     │                      │                      │◀───AttestResult────│                       │
     │                      │                      │──verify(tdx)───────▶│                       │
     │                      │                      │◀───AttestResult────│                       │
     │                      │                      │                    │                       │
     │                      │                      │──check_revocation()│                       │
     │                      │                      │   (registry lookup)│                       │
     │                      │                      │                    │                       │
     │                      │                      │──create_session()──────────────────────────▶
     │                      │                      │◀──────────session + dek────────────────────│
     │                      │◀─PrivacyAuthorization│                    │                       │
     │◀──200 {session_id, dek_hint}────────────────│                    │                       │
     │                      │                      │                    │                       │
     │                      │                      │                    │                       │
     │══════════════════════│══ JOB EXECUTION ═════│════════════════════│═══════════════════════│
     │                      │                      │                    │                       │
     │──POST /v1/jobs/{id}/rotation────────────────▶                    │                       │
     │   {step: N, proof}   │                      │                    │                       │
     │                      │──issue_session_dek()─▶                    │                       │
     │                      │                      │──verify_step()     │                       │
     │                      │                      │──run_proof()       │                       │
     │                      │                      │──issue()───────────────────────────────────▶
     │                      │                      │◀──────────new_dek──────────────────────────│
     │◀──200 {dek_hint, next_step}─────────────────│                    │                       │
     │                      │                      │                    │                       │
                
2.10 · Key Broker Selection Decision Tree

Broker Selection Logic

settings.privacy.key_broker
          │
          ▼
  ┌─────────────────┐
  │   "aws_kms"?    │──YES──▶ KeyBrokerAwsKms
  └────────┬────────┘           │
           │NO                  │ generate_data_key()
           ▼                    │ encryption_context
  ┌─────────────────┐           │
  │ "vault_transit"?│──YES──▶ KeyBrokerVaultTransit
  └────────┬────────┘           │
           │NO                  │ POST /transit/datakey
           ▼                    │
  ┌─────────────────┐           │
  │    "static"?    │──YES──▶ KeyBrokerStatic
  └────────┬────────┘           │
           │NO                  │ HKDF derivation
           ▼                    │
  ┌─────────────────┐           │
  │   "split_key"?  │──YES──▶ SplitKeyBroker
  └────────┬────────┘           │
           │NO                  ├─▶ primary: KMS/Vault
           ▼                    └─▶ remote: HTTP share
  ┌─────────────────┐
  │      ERROR      │
  │  InvalidConf    │
  └─────────────────┘
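The decision tree maps naturally onto a factory function. The broker classes below are empty stand-ins for the real implementations, and the constructor call is illustrative:

```python
class InvalidConf(Exception):
    """Raised when settings.privacy.key_broker names no known broker."""

# Empty stand-ins for the real broker classes:
class KeyBrokerAwsKms: ...
class KeyBrokerVaultTransit: ...
class KeyBrokerStatic: ...
class SplitKeyBroker: ...

_BROKERS = {
    "aws_kms": KeyBrokerAwsKms,
    "vault_transit": KeyBrokerVaultTransit,
    "static": KeyBrokerStatic,
    "split_key": SplitKeyBroker,
}

def build_key_broker(name: str):
    """Mirror of the decision tree: match a name or fall through to InvalidConf."""
    try:
        return _BROKERS[name]()
    except KeyError:
        raise InvalidConf(f"unknown key_broker: {name!r}") from None
```

Failing loudly on an unknown name mirrors the ERROR leaf of the tree and prevents a misconfigured deployment from silently falling back to the static broker.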
                    

Split-Key Threshold Crypto

┌─────────────────────────────┐
│     SPLIT-KEY ISSUANCE      │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  Primary Broker (KMS/Vault) │
│  ┌───────────────────────┐  │
│  │  share_A = release()  │  │
│  │  (256-bit)            │  │
│  └───────────────────────┘  │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  Remote Share Aggregator    │
│  ┌───────────────────────┐  │
│  │  share_B = issue()    │  │
│  │  (256-bit)            │  │
│  └───────────────────────┘  │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│        XOR COMBINATION      │
│ combined = share_A ⊕ share_B│
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│          HKDF DERIVE        │
│  DEK = HKDF(combined,       │
│        salt=session.salt,   │
│        info=session:step)   │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│    256-bit AES-GCM DEK      │
│    (per-step rotation)      │
└─────────────────────────────┘
                    
2.11 · Revocation Cascade Flow

Detection → Invalidation → Cleanup

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                    REVOCATION CASCADE                                        │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
  │    DETECTION    │───▶│    REGISTRY     │───▶│  INVALIDATION   │───▶│     CLEANUP     │
  └────────┬────────┘    └────────┬────────┘    └────────┬────────┘    └────────┬────────┘
           │                      │                      │                      │
           ▼                      ▼                      ▼                      ▼
  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
  │RevocationWatcher│    │ registry.update │    │ pipeline.handle │    │ Driver.stop()   │
  │ polls source    │    │ (attestations,  │    │ (delta)         │    │ Kubernetes      │
  │ every N seconds │    │  sessions)      │    │                 │    │ pod delete      │
  └────────┬────────┘    └────────┬────────┘    └────────┬────────┘    └────────┬────────┘
           │                      │                      │                      │
           │                      │                      │                      │
  ┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐    ┌────────▼────────┐
  │ HTTP GET        │    │ Set operations  │    │ Lookup job_id   │    │ Mark job        │
  │ /revocations    │    │ - add hashes    │    │ from session    │    │ CANCELLED       │
  │                 │    │ - add sessions  │    │                 │    │                 │
  │ or File read    │    │ - version++     │    │ Cancel job      │    │ Emit metrics    │
  │ revoked.json    │    │ - timestamp     │    │ with reason     │    │ & logs          │
  └────────┬────────┘    └────────┬────────┘    └────────┬────────┘    └────────┬────────┘
           │                      │                      │                      │
           │                      │                      │                      │
           ▼                      ▼                      ▼                      ▼
  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │  Timeline: Detection (0s) → Registry Update (10ms) → Job Cancel (50ms) → Pod Delete (1s)   │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘
                
Component      Key File       Responsibility
Attesters      attesters.py   Validate vendor evidence (NVIDIA, TDX, SNP)
Privacy Gate   service.py     Orchestrate attestation checks, sessions, DEKs
Key Broker     key_broker.py  Issue secrets via KMS, Vault, static, split-key
Revocation     revocation.py  Track and apply revocation payloads
Proofs         proof.py       Execute optional proof verifiers
Inter-rank     interrank.py   Derive per-rank crypto materials
Invalidation   revocation.py  Cancel workloads when sessions revoked
· · ·
SECTION 03

Adapter Design Principles

PRIMARY PILLAR · Training, Inference, Quantization, Rendering, Composite, Loader

Adapters convert user intent into the normalised ResourceProfile and ExecutionPlan dataclasses defined in launcher/adapters/base.py. Every job submitted through the API traverses an adapter before it touches persistence. This guarantees that driver-facing specifications (image, command, env, volumes), placement inputs (GPU count, VRAM, interconnect, features), telemetry metadata, and IO descriptors are encoded in a predictable format.

launcher/adapters/loader.py exposes the _ADAPTER_REGISTRY mapping of adapter names to classes. load_adapter(name) raises KeyError if the name is absent, preventing silent fallbacks. register_adapter(name, cls) allows runtime extension. Adapters are instantiated on demand so constructor parameters can be supplied by features or tests.

Adapter Flow Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                            USER PAYLOAD                                      │
│  {                                                                           │
│    "adapter": "training",                                                    │
│    "spec": {                                                                 │
│      "command": ["torchrun", "train.py"],                                    │
│      "num_gpus": 4,                                                          │
│      "min_vram_gb": 48,                                                      │
│      ...                                                                     │
│    }                                                                         │
│  }                                                                           │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         ADAPTER LOADER                                       │
│  load_adapter("training") ──▶ TrainingAdapter                                │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      ADAPTER.PREPARE(spec)                                   │
│                                                                              │
│  ┌──────────────────────────┐    ┌──────────────────────────┐               │
│  │     ResourceProfile      │    │      ExecutionPlan       │               │
│  │  ────────────────────    │    │  ────────────────────    │               │
│  │  num_gpus: 4             │    │  image: "ghcr.io/..."    │               │
│  │  min_vram_gb: 48         │    │  command: ("torchrun",   │               │
│  │ interconnect: ("nvlink",)│    │           "train.py")    │               │
│  │  scm_minutes: 60         │    │  env: {"ADAPTER": "..."} │               │
│  │  features: ("cuda>=12.1",│    │  strategy: "ddp"         │               │
│  │             "nccl")      │    │  io: {...}               │               │
│  └──────────────────────────┘    └──────────────────────────┘               │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         PERSISTENCE                                          │
│  Job(spec={...}, profile={...}, plan={...}, status=PENDING)                  │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    PLACEMENT / DRIVER / TELEMETRY                            │
│  profile ──▶ placement decisions                                             │
│  plan ──▶ driver.launch()                                                    │
│  adapter.map_metrics() ──▶ telemetry aggregator                              │
└─────────────────────────────────────────────────────────────────────────────┘
                
"The adapter abstraction transforms infrastructure heterogeneity from a liability into an asset—the same workload definition executes across bare metal, cloud VMs, and edge devices."
— Adapter Subsystem
3.1 ResourceProfile Anatomy

ResourceProfile is a frozen dataclass capturing: num_gpus (integer GPU count), min_vram_gb (minimum VRAM per GPU), interconnect (tuple of required interconnects), scm_minutes (scheduled compute minutes for billing), and features (hardware/software feature flags like "cuda>=12.1", "nccl", "nvenc").

3.2 ExecutionPlan Anatomy

ExecutionPlan includes: image (container image), command (tuple of arguments), env (environment variables), volumes, strategy ("ddp", "service", "single", "tiling", "composite"), rendezvous, io descriptor, metadata, and Kubernetes-specific fields (labels, annotations, service_account, restart_policy, replicas, service_ports, probes, autoscaling).
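A sketch of the two dataclasses, restricted to the fields discussed here; the full definitions (volumes, rendezvous, IO descriptors, Kubernetes fields) live in launcher/adapters/base.py:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass(frozen=True)
class ResourceProfile:
    """Placement-facing profile (subset of fields)."""
    num_gpus: int = 1
    min_vram_gb: int = 16
    interconnect: Tuple[str, ...] = ()
    scm_minutes: int = 60
    features: Tuple[str, ...] = ()

@dataclass(frozen=True)
class ExecutionPlan:
    """Driver-facing plan (subset of fields)."""
    image: str = ""
    command: Tuple[str, ...] = ()
    env: Dict[str, str] = field(default_factory=dict)
    strategy: str = "single"

# Example values mirroring the training flow diagram:
profile = ResourceProfile(num_gpus=4, min_vram_gb=48,
                          interconnect=("nvlink",), features=("cuda>=12.1", "nccl"))
plan = ExecutionPlan(image="ghcr.io/example/train:v2",
                     command=("torchrun", "train.py"), strategy="ddp")
```

Freezing both dataclasses and using tuples for sequence fields keeps the adapter output immutable between preparation, persistence, and placement.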

Training

TrainingAdapter

Prepares distributed training jobs with multi-GPU coordination strategies. Sets defaults for DDP/FSDP rendezvous, volume mounts for datasets, and priority metadata.

Defaults: 1 GPU, 40 GB VRAM, NVLink, 60 SCM minutes
Features: ("cuda>=12.1", "nccl")
Strategy: "ddp"
IO Mode: checkpoint
Metrics: step, loss, throughput
Inference

InferenceAdapter

Targets long-running model serving with health probes, autoscaling, and load balancer exposure.

Defaults: 1 GPU, 16 GB VRAM, PCIe
Strategy: "service"
Probes: readiness, liveness
Autoscaling: HPA support
Metrics: latency_p95_ms, QPS, error_rate
Quantization

QuantizationAdapter

Handles model compression with PTQ and QAT modes, different resource profiles per mode.

PTQ: 1 GPU, 45 SCM, strategy="single"
QAT: 2 GPUs, 120 SCM, strategy="ddp"
Output: onnx, tensorrt
Metrics: step, loss, accuracy
Rendering

RenderingAdapter

Manages visual workloads with frames, tiles, resolution, and NVENC requirements.

Inputs: frames, tiles_per_frame, resolution
Features: ("nvenc",) when required
Strategy: "tiling"
Metrics: frames_rendered, tiles_completed
Composite

CompositeAdapter

Chains multiple adapter stages sequentially. Validates stages, merges resource profiles, generates orchestration script.

Input: stages[] with name, adapter, spec
Merges: max GPUs, max VRAM, sum SCM minutes, union features
Environment: COMPOSITE_STAGE_COUNT, STAGE_n_NAME
Metrics: stage, step, loss, throughput
Adapter Registry and Plugin System
from typing import Dict

# Core registry with built-in adapters
_ADAPTER_REGISTRY: Dict[str, AdapterFactory] = {
    "training": TrainingAdapter,
    "inference": InferenceAdapter,
    "render": RenderingAdapter,
    "quant": QuantizationAdapter,
    "composite": CompositeAdapter,
}

# Third-party registration (zero core changes)
def register_adapter(name: str, factory: AdapterFactory) -> None:
    _ADAPTER_REGISTRY[name.lower()] = factory

# Load adapter by name
def load_adapter(name: str, **options) -> Adapter:
    if name.lower() not in _ADAPTER_REGISTRY:
        raise KeyError(f"Unknown adapter: {name}")
    return _ADAPTER_REGISTRY[name.lower()](**options)

# Example: Custom fine-tuning adapter
class FineTuneAdapter(Adapter):
    def prepare(self, job_spec):
        profile = ResourceProfile(
            num_gpus=job_spec.get("num_gpus", 1),
            min_vram_gb=40,
            features=("cuda>=12.1", "peft", "bitsandbytes")
        )
        plan = ExecutionPlan(...)
        return profile, plan

# Register without platform redeployment
register_adapter("finetune", FineTuneAdapter)
                

Adapter-to-Driver Mapping

┌─────────────────────────────────────────────────────────────────────────────┐
│                         ADAPTER PREPARE                                      │
│                                                                              │
│  TrainingAdapter.prepare(spec)                                               │
│      │                                                                       │
│      ├──▶ ResourceProfile                                                    │
│      │       num_gpus: 4                                                     │
│      │       min_vram_gb: 48                                                 │
│      │       interconnect: ("nvlink",)                                       │
│      │       features: ("cuda>=12.1", "nccl")                                │
│      │                                                                       │
│      └──▶ ExecutionPlan                                                      │
│              image: "ghcr.io/example/train:v2"                               │
│              command: ("torchrun", "--nproc_per_node=4", "train.py")         │
│              strategy: "ddp"                                                 │
│              env: {"ADAPTER": "training", "WORLD_SIZE": "4"}                 │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       PLACEMENT PLANNER                                      │
│                                                                              │
│  ResourceProfile ──▶ MultiProviderPlacementPlanner                           │
│      │                                                                       │
│      ├── GPU count ──▶ rank distribution                                     │
│      ├── interconnect ──▶ provider selection                                 │
│      └── features ──▶ capability matching                                    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         KUBERNETES DRIVER                                    │
│                                                                              │
│  ExecutionPlan ──▶ driver.launch()                                           │
│      │                                                                       │
│      ├── image ──▶ container spec                                            │
│      ├── command ──▶ container args                                          │
│      ├── env ──▶ environment variables                                       │
│      ├── strategy ──▶ Job vs Deployment                                      │
│      └── probes ──▶ readiness/liveness                                       │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      TELEMETRY AGGREGATOR                                    │
│                                                                              │
│  Adapter.map_metrics(raw_data) ──▶ normalized metrics                        │
│      │                                                                       │
│      └── {"step": 100, "loss": 0.42, "throughput": 1234.5}                   │
└─────────────────────────────────────────────────────────────────────────────┘
                
Adapter        Key Fields                       Strategy     Metrics
Training       num_gpus, strategy, datasets     ddp          step, loss, throughput
Inference      replicas, autoscaling, probes    service      latency_p95_ms, QPS, error_rate
Quantization   mode, output_format, precision   single/ddp   step, loss, accuracy
Rendering      frames, tiles, resolution        tiling       frames_rendered, tiles_completed
Composite      stages[]                         composite    stage, step, loss
· · ·

Enabling Infrastructure

The systems that power Futures, Privacy & Adapters

SECTION 04

Configuration Design Principles

ENABLING INFRASTRUCTURE · Typed Surfaces and Operational Controls

Pydantic settings hierarchies, environment variable mapping, feature toggles, and configuration validation patterns that enable type-safe deployments across all infrastructure layers.

launcher/config/settings.py defines LauncherSettings, a Pydantic model that aggregates subordinate models: APISettings, FeatureSettings, PrivacySettings, DriverSettings, ObservabilitySettings, SecuritySettings, ControlPlaneSettings, and StorageSettings. Each submodel pulls values from environment variables prefixed with LAUNCHER_. Defaults are sensible but conservative: the API binds to 0.0.0.0:8080, rate limits default to 100/minute, observability exports JSON logs, and privacy requires at least one attester.

The API application entrypoint invokes load_settings(), storing the resulting object in FastAPI's state. Middlewares, routers, and background tasks receive the same object via dependencies. Workers reuse the same config by calling load_settings_cached when booting. Because Pydantic caches environment values, repeated loads remain deterministic.
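The loading pattern can be illustrated with a stdlib stand-in for the Pydantic models. The LAUNCHER_API_HOST/LAUNCHER_API_PORT variable names are assumptions for illustration; the real mapping is defined by the Pydantic submodels:

```python
import os
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class APISettings:
    host: str = "0.0.0.0"
    port: int = 8080

@dataclass(frozen=True)
class LauncherSettings:
    api: APISettings = APISettings()

def load_settings() -> LauncherSettings:
    # Map LAUNCHER_-prefixed environment variables onto typed fields.
    return LauncherSettings(api=APISettings(
        host=os.environ.get("LAUNCHER_API_HOST", "0.0.0.0"),
        port=int(os.environ.get("LAUNCHER_API_PORT", "8080")),
    ))

@lru_cache(maxsize=1)
def load_settings_cached() -> LauncherSettings:
    # Workers call this so every component shares one settings object.
    return load_settings()
```

The cached variant gives workers the same single-object semantics the API gets from storing the settings in FastAPI's state.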

4.1 Launcher Settings Hierarchy

LauncherSettings aggregates all subordinate configuration models into a single typed surface. Each submodel pulls values from environment variables prefixed with LAUNCHER_. The settings module includes helper constructors (load_settings()) and caches to ensure a single configuration object is reused across the process.

Configuration Hierarchy

LauncherSettings
├── APISettings
│   ├── host: str = "0.0.0.0"
│   ├── port: int = 8080
│   └── workers: int = 4
├── FeatureSettings
│   ├── enable_multi_provider_jobs: bool
│   ├── enable_revocation_watcher: bool
│   ├── enable_revocation_stop: bool
│   ├── enable_policy_engine: bool
│   ├── enable_artifact_encryption: bool
│   ├── enable_session_replay_protection: bool
│   └── enable_composite_jobs: bool
├── PrivacySettings
│   ├── attesters: List[str]
│   ├── key_broker: str
│   ├── split_key_threshold: int
│   ├── proof_verifier: str | None
│   └── revocation: RevocationSettings
├── DriverSettings
│   ├── backend: "kubernetes" | "simulation"
│   ├── namespace: str
│   └── service_account: str
├── ObservabilitySettings
│   ├── log_format: str = "json"
│   ├── traces_enabled: bool
│   └── metrics_enabled: bool
├── SecuritySettings
│   ├── rate_limit: str = "100/minute"
│   ├── admin_token: str
│   └── trusted_origins: List[str]
├── ControlPlaneSettings
│   ├── enabled: bool
│   ├── base_url: str
│   ├── api_key: str
│   └── timeout_seconds: float
└── StorageSettings
    ├── artifact_path: str
    ├── encryption_key: str
    └── s3_bucket: str | None
                
4.2 Configuration Loading Semantics

The API application entrypoint invokes load_settings(), storing the resulting object in FastAPI's state. Middlewares, routers, and background tasks receive the same object via dependencies defined in launcher/api/dependencies.py. Workers reuse the same config by calling load_settings_cached when booting from launcher/worker/main.py.

4.3 Feature Toggles and Experimentation Flags

FeatureSettings toggles govern major behaviours: enable_multi_provider_jobs, enable_revocation_watcher, enable_revocation_stop, enable_policy_engine, enable_artifact_encryption, enable_session_replay_protection, and enable_composite_jobs. Each flag is consulted at multiple call sites. Tests rely on these toggles to simulate different deployment profiles.
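The call-site pattern is straightforward: consult the flag, wire the behaviour. A toy sketch restricted to two of the flags (the real FeatureSettings model carries the full list above):

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class FeatureSettings:
    # Two of the toggles described above; the real model has more.
    enable_revocation_watcher: bool = False
    enable_composite_jobs: bool = False

def start_background_tasks(features: FeatureSettings) -> List[str]:
    """Gate optional subsystems on their toggles at the call site."""
    tasks = []
    if features.enable_revocation_watcher:
        tasks.append("revocation_watcher")
    return tasks
```

Tests can construct FeatureSettings directly to simulate a deployment profile without touching the environment.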

Privacy Configuration

PrivacySettings

Describes attesters, key broker, optional split-key parameters, proof verifier module references, and revocation configuration.

attesters: List[str]
key_broker: "static" | "aws_kms" | "vault_transit" | "split_key"
split_key_threshold: int = 2
split_key_endpoint: str | None
split_key_participants: List[str] | None
proof_verifier: str | None
revocation.enabled: bool
revocation.source_url: str | None
revocation.poll_interval_seconds: float = 60.0
Driver Settings

DriverSettings

Specifies default driver backend, container registry overrides, and namespace names.

backend: "kubernetes" | "simulation"
namespace: str
service_account: str
artifact_encryption: bool
Control Plane

ControlPlaneSettings

Exposes connection parameters for control plane integration.

enabled: bool
base_url: str
api_key: str
timeout_seconds: float
reservation_timeout: float
Observability

ObservabilitySettings

Controls logging format, tracing, and metrics exposition.

log_format: "json" | "text"
traces_enabled: bool
otlp_endpoint: str | None
metrics_enabled: bool
Sidecar Configuration

SidecarSettings

Maps environment variables for provider-side runtime including attestation, rotation, and TLS parameters.

SIDECAR_JOB_ID · SIDECAR_PROVIDER_ID · SIDECAR_LAUNCHER_URL · SIDECAR_TOKEN
SIDECAR_CHALLENGE · SIDECAR_ROTATION_SECRET · SIDECAR_ROTATION_DUE_AT
SIDECAR_TLS_REQUIRED · SIDECAR_ARTIFACT_PATH · SIDECAR_STEP_SIGNAL_PATH
SIDECAR_ROTATION_GRACE_SECONDS · SIDECAR_ROTATION_RETRY_DELAY
4.4 Privacy and Key Broker Configuration

PrivacySettings describe attesters (attesters list), key broker (key_broker string), optional split-key parameters (split_key_threshold, split_key_endpoint, split_key_participants), proof verifier module references (proof_verifier), and revocation configuration. The revocation section includes enabled, source_url, source_path, poll_interval_seconds, timeout_seconds, and TLS settings.

4.5 Driver and Artifact Settings

DriverSettings specify default driver backend ("kubernetes" or "simulation"), container registry overrides, namespace names, service account names, artifact encryption toggles, and file system locations for staging. These settings impact packager behaviour and driver manifests.

Configuration Propagation Flow

Environment Variables / Secrets Manager
         │
         ├─────────────────────────────────────────────────────────────────────┐
         │                                                                     │
         ▼                                                                     ▼
┌─────────────────────────┐                                   ┌─────────────────────────┐
│ launcher/config/        │                                   │ sidecar/config/         │
│   settings.py           │                                   │   settings.py           │
│                         │                                   │                         │
│ load_settings() ───────────────────────────────────────────▶│ SidecarSettings         │
│   │                     │                                   │   │                     │
│   ├─▶ APISettings       │                                   │   ├─▶ job_id            │
│   ├─▶ PrivacySettings   │                                   │   ├─▶ launcher_url      │
│   ├─▶ DriverSettings    │                                   │   ├─▶ rotation_secret   │
│   └─▶ ...               │                                   │   └─▶ tls_required      │
└─────────┬───────────────┘                                   └─────────────────────────┘
          │
          ├─▶ create_app() ──▶ FastAPI state, middlewares, routers
          │
          ├─▶ LauncherService ──▶ driver registry, artifact storage
          │
          └─▶ JobProcessor ──▶ multi-provider strategies, control-plane hooks

┌─────────────────────────┐                                   ┌─────────────────────────┐
│ control_plane/config.py │                                   │ payments/stripe_service │
│   ServiceConfig         │                                   │   /config.py            │
│                         │                                   │                         │
│   ├─▶ database          │                                   │   ├─▶ stripe_api_key    │
│   ├─▶ oracle            │                                   │   ├─▶ webhook_secret    │
│   ├─▶ scheduler         │                                   │   ├─▶ queue_url         │
│   ├─▶ metering          │                                   │   └─▶ ledger_db_url     │
│   ├─▶ settlement        │                                   │                         │
│   ├─▶ chain             │                                   └─────────────────────────┘
│   ├─▶ resilience        │
│   └─▶ governance        │
└─────────────────────────┘
                
4.6 Control Plane ServiceConfig

control_plane/config.py's ServiceConfig aggregates numerous sub-configs: database, oracle, scheduler, metering, settlement, chain, resilience, regional, signing, governance, enterprise, optimizer, and attestation. Each sub-config defines typed fields with defaults and validation.

4.7 Payments Configuration

payments/stripe_service/config.py defines a Settings model with stripe_api_key, stripe_endpoint_secret, webhook_rate_limit, queue_url, queue_batch_size, queue_wait_seconds, region_name, ledger_database_url, currency, enable_queue, and log_level.

Validation and Fail-Fast Strategy
Configuration validation happens at start-up. Attesters must reference evidence names that match the configuration. Key brokers check their required fields: AWS KMS insists on key_id, Vault requires transit_path and token, and split-key requires consistent participant lists. PrivacyGate raises ValueError if no attesters are configured. These checks prevent misconfigured deployments from running with degraded security.
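The fail-fast pattern reduces to a required-field table consulted at start-up. The field names below mirror the checks described in the text; the function itself, and the assumed rule that the threshold cannot exceed the participant count, are an illustrative sketch rather than the shipped validator:

```python
def validate_key_broker(broker: str, config: dict) -> None:
    """Raise ValueError at start-up when a key broker is misconfigured."""
    required = {
        "aws_kms": ("key_id",),
        "vault_transit": ("transit_path", "token"),
        "split_key": ("split_key_participants",),
    }
    for field in required.get(broker, ()):
        if not config.get(field):
            raise ValueError(f"key broker '{broker}' requires '{field}'")
    if broker == "split_key":
        participants = config["split_key_participants"]
        # Assumed consistency rule: the threshold cannot exceed the
        # number of participants holding key shares.
        if config.get("split_key_threshold", 2) > len(participants):
            raise ValueError("split_key_threshold exceeds participant count")
```

Running such checks before the first request means a deployment with a missing key_id never serves traffic with degraded security.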
"Typed configuration surfaces power every subsystem. Fail-fast validation, observability toggles, and extensive documentation ensure configuration remains a pillar rather than an afterthought."
— Configuration System
4.8 · Settings Propagation Flow

API → Worker → Driver Configuration Chain

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │                              CONFIGURATION PROPAGATION                                       │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘

  Environment                 Application                   Runtime Components
  Variables                   Startup                       (Injected Settings)
      │                          │                                │
      │                          │                                │
      ▼                          ▼                                ▼
  ┌──────────────┐          ┌──────────────┐              ┌──────────────────────────────────┐
  │ LAUNCHER_*   │─────────▶│ load_settings│─────────────▶│ FastAPI State                    │
  │ PRIVACY_*    │          │   ()         │              │ ├─ app.state.settings            │
  │ DRIVER_*     │          │              │              │ └─ Dependency injection          │
  │ CONTROL_*    │          │ Pydantic     │              │                                  │
  └──────────────┘          │ validation   │              │ LauncherService                  │
                            └──────┬───────┘              │ ├─ adapter_options               │
                                   │                      │ ├─ policy_engine                 │
                                   │                      │ └─ quota_enforcer                │
                                   │                      │                                  │
                                   │                      │ PrivacyGate                      │
                                   ▼                      │ ├─ attesters[]                   │
  ┌──────────────┐          ┌──────────────┐              │ ├─ key_broker                    │
  │ AWS Secrets  │─────────▶│ Secret       │              │ └─ revocation_registry           │
  │ Manager      │          │ Resolution   │              │                                  │
  │ /ssm/params  │          │              │              │ JobProcessor (Worker)            │
  └──────────────┘          └──────┬───────┘              │ ├─ driver_backend                │
                                   │                      │ ├─ telemetry_aggregator          │
                                   │                      │ └─ placement_strategy            │
                                   ▼                      │                                  │
  ┌──────────────┐          ┌──────────────┐              │ KubernetesDriver                 │
  │ Config File  │─────────▶│ File Loader  │              │ ├─ namespace                     │
  │ launcher.yaml│          │ (optional)   │              │ ├─ service_account               │
  └──────────────┘          └──────────────┘              │ └─ image_pull_secrets            │
                                                          └──────────────────────────────────┘
                
4.9 · Feature Toggle Decision Tree

Runtime Feature Evaluation

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │                              FEATURE TOGGLE EVALUATION                                       │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘

                                    ┌─────────────────────┐
                                    │   Job Submission    │
                                    │   (POST /v1/jobs)   │
                                    └──────────┬──────────┘
                                               │
                   ┌───────────────────────────┼───────────────────────────┐
                   │                           │                           │
                   ▼                           ▼                           ▼
  ┌─────────────────────────────┐ ┌─────────────────────────────┐ ┌─────────────────────────────┐
  │ enable_multi_provider_jobs? │ │ enable_revocation_watcher?  │ │ enable_artifact_encryption? │
  └──────────────┬──────────────┘ └──────────────┬──────────────┘ └──────────────┬──────────────┘
                 │                               │                               │
         ┌───────┴───────┐               ┌───────┴───────┐               ┌───────┴───────┐
         │YES            │NO             │YES            │NO             │YES            │NO
         ▼               ▼               ▼               ▼               ▼               ▼
  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
  │ Multi-rank  │ │ Single      │ │ Start       │ │ Skip        │ │ Encrypt     │ │ Plain       │
  │ placement   │ │ provider    │ │ watcher     │ │ watcher     │ │ artifacts   │ │ artifacts   │
  │ strategy    │ │ only        │ │ background  │ │             │ │ w/ DEK      │ │             │
  └─────────────┘ └─────────────┘ └──────┬──────┘ └─────────────┘ └─────────────┘ └─────────────┘
                                         │
                                         ▼
                               ┌─────────────────────┐
                               │ enable_revocation_  │
                               │ stop?               │
                               └──────────┬──────────┘
                                          │
                                  ┌───────┴───────┐
                                  │YES            │NO
                                  ▼               ▼
                           ┌─────────────┐ ┌─────────────┐
                           │ Stop jobs   │ │ Cancel only │
                           │ on revoke   │ │ (no pod     │
                           │ (pod delete)│ │ delete)     │
                           └─────────────┘ └─────────────┘
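The branches of the decision tree above can be walked in a few lines. The flag names come from FeatureSettings; the returned labels are illustrative, not identifiers from the codebase:

```python
def evaluate_toggles(features: dict) -> dict:
    """Map FeatureSettings flags to the behaviour each branch selects."""
    decisions = {
        "placement": "multi_rank" if features.get("enable_multi_provider_jobs") else "single_provider",
        "artifacts": "encrypted_dek" if features.get("enable_artifact_encryption") else "plain",
        "watcher": "running" if features.get("enable_revocation_watcher") else "skipped",
    }
    if decisions["watcher"] == "running":
        # enable_revocation_stop is only consulted when the watcher runs.
        decisions["on_revoke"] = (
            "stop_pods" if features.get("enable_revocation_stop") else "cancel_only"
        )
    return decisions
```

Note the nesting: revocation stop behaviour is only a live question once the watcher branch has been taken, exactly as the tree shows.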
                
Launcher Settings Loader
from pydantic_settings import BaseSettings  # Pydantic v2; in v1: from pydantic import BaseSettings

class PrivacyRevocationSettings(BaseSettings):
    enabled: bool = False
    source_url: str | None = None
    source_path: str | None = None
    poll_interval_seconds: float = 60.0
    timeout_seconds: float = 10.0

class PrivacySettings(BaseSettings):
    attesters: list[str]
    key_broker: str = "static"
    split_key_enabled: bool = False
    split_key_threshold: int = 2
    split_key_endpoint: str | None = None
    proof_verifier: str | None = None
    revocation: PrivacyRevocationSettings = PrivacyRevocationSettings()

class LauncherSettings(BaseSettings):
    api: APISettings = APISettings()
    features: FeatureSettings = FeatureSettings()
    privacy: PrivacySettings
    driver: DriverSettings = DriverSettings()
    observability: ObservabilitySettings = ObservabilitySettings()
    security: SecuritySettings = SecuritySettings()
    control_plane: ControlPlaneSettings = ControlPlaneSettings()
    storage: StorageSettings = StorageSettings()

# Environment variables prefixed with LAUNCHER_ populate these fields
settings = LauncherSettings()
                
Configuration Block   Key File                            Purpose
LauncherSettings      launcher/config/settings.py         Aggregates all launcher configuration
SidecarSettings       sidecar/config/settings.py          Provider-side runtime configuration
ServiceConfig         control_plane/config.py             Control plane services configuration
Settings              payments/stripe_service/config.py   Payments and Stripe integration
SECTION 05

Overture: System Design Principles

Mapping the Contours of a Confidential Compute Exchange

Directory semantics, dependency flows, persistence layers, and the high-level control cycle that orchestrates jobs from SDK submission to on-chain settlement.


The repository at GPU-LAYER-DECENTRALISED resembles a densely populated city whose districts map directly to production responsibilities. vracu-launcher/ houses application-facing APIs, worker orchestration, adapters, privacy layers, packagers, and placement planners. sidecar/ contains the runtime that executes on provider hosts, including attestation producers, rotation loops, TLS handlers, and configuration loaders. control_plane/ embeds the economic and scheduling core. payments/ owns Stripe ingestion, ledger persistence, SQS queues, and Arbitrum wallet integration.

contracts/ stores Solidity code (notably ConversionRouter.sol) plus supporting ABIs and Foundry configurations. phase4_sdk/ provides client libraries and Typer CLI wrappers. The alien-* families cover provider onboarding, resilience, and observability control planes. Operational scripts live under deployment/, ops/, and infra/. Validation evidence resides in numerous Markdown and PDF files, ranging from system overviews to production evidence reports. Each directory includes __init__.py or configuration files, a sign that this structure is not incidental: it is the result of iterative deployments.

Repository Structure Overview

GPU-LAYER/
├── vracu-launcher/
│   ├── launcher/api/                 ← FastAPI routes, dependencies, middleware
│   ├── launcher/config/              ← Pydantic settings, secrets wiring
│   ├── launcher/adapters/            ← Training, inference, quantization, rendering, composite
│   ├── launcher/worker/              ← JobProcessor, queue consumers, multi-provider logic
│   ├── launcher/privacy/             ← Attesters, key brokers, revocation, proof plugins
│   ├── launcher/placement/           ← Strategy compilation, rank commitments, planners
│   ├── launcher/packager/            ← Artifact materialisation, encryption utilities
│   └── launcher/observability/       ← Metrics, SLOs, telemetry aggregator
├── sidecar/
│   ├── runtime/                      ← Attestation handshake, rotation loops, TLS writers
│   └── attestation/                  ← Evidence producers for NVIDIA CC-On, TDX, SNP
├── control_plane/                    ← API server, services, oracle, scheduler, governance
├── payments/                         ← Ledger DAO, Stripe routers, SQS queue workers, ACU wallet
├── contracts/                        ← ConversionRouter.sol, ABIs, Foundry config
├── phase4_sdk/                       ← ControlPlaneClient, Typer CLI, configuration helpers
├── alien-*                           ← Directory API, provider node, observability, resilience
├── provider_agent/                   ← MIG manager, diagnostics collectors
├── deployment/, ops/, infra/         ← Helm charts, Terraform, Kyverno policies, scripts
└── docs/, PDFs, Markdown reports     ← Architecture briefs, evidence, governance policies
                
5.1 Repository Structure and Directory Semantics

The directory structure illustrates disciplined engineering: source, infrastructure, operations, and validation evidence co-reside, ready for continuous inspection. The launcher tree concentrates request handling and orchestration, the sidecar tree isolates provider-host execution, and the control plane carries the economic and scheduling core, so each production responsibility maps cleanly onto a single top-level directory.

5.2 Source Relationships and Dependency Direction

Dependencies flow from outer surfaces to inner utilities. FastAPI entry points in launcher/api/app.py import Pydantic settings from launcher/config/settings.py, which in turn relies on helper modules in launcher/utils. Workers leverage adapters, strategies, and placement logic. Privacy components depend on cryptographic helpers and attester definitions. Control-plane clients mirror SDK structures, ensuring the launcher and external clients use the same typed requests.

Dependency Flow Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           EXTERNAL CLIENTS                                   │
│                    phase4_sdk/client.py  ←→  Typer CLI                       │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │ HTTP/JSON
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         LAUNCHER API LAYER                                   │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐         │
│  │ launcher/api/   │───▶│ launcher/core/  │───▶│ launcher/utils/ │         │
│  │   app.py        │    │   service.py    │    │   coerce.py     │         │
│  │   routes/       │    │                 │    │   retry.py      │         │
│  └────────┬────────┘    └────────┬────────┘    └─────────────────┘         │
│           │                      │                                          │
│           ▼                      ▼                                          │
│  ┌─────────────────┐    ┌─────────────────┐                                 │
│  │ launcher/       │    │ launcher/       │                                 │
│  │   adapters/     │    │   privacy/      │                                 │
│  │   loader.py     │    │   service.py    │                                 │
│  └─────────────────┘    └─────────────────┘                                 │
└─────────────────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         WORKER / DRIVER LAYER                                │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐         │
│  │ launcher/worker │───▶│ launcher/driver │───▶│ sidecar/runtime │         │
│  │   processor.py  │    │   kubernetes.py │    │   main.py       │         │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘         │
└─────────────────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       CONTROL PLANE / PAYMENTS                               │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐         │
│  │ control_plane/  │◀──▶│ payments/       │◀──▶│ contracts/      │         │
│  │   api_server.py │    │   ledger/       │    │   Router.sol    │         │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘         │
└─────────────────────────────────────────────────────────────────────────────┘
                
5.3 Persistence Layer Overview

Persistence is handled primarily via SQLAlchemy models in launcher/db/models.py, payments/ledger/models.py, and alien-directory-api/directory_api/db. SQLite databases are committed as artifacts to prove real test runs. Migrations exist for the directory API (Alembic) and the ledger (SQLAlchemy metadata declarations). The control plane optionally uses Postgres, while the ledger DAO can operate on Postgres or SQLite.
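The shape of the launcher's job persistence can be sketched against bare sqlite3. The actual models are SQLAlchemy classes in launcher/db/models.py; the column names and the sample spec below are assumptions drawn from the table list (jobs, privacy_sessions, artifacts):

```python
import json
import sqlite3

# In-memory stand-in for the launcher's jobs table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL, spec TEXT NOT NULL)"
)
# Job specs are stored as JSON blobs alongside a status column that
# tracks lifecycle transitions.
conn.execute(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    ("job-001", "PENDING", json.dumps({"profile": {"gpu": "a100"}})),
)
(status,) = conn.execute(
    "SELECT status FROM jobs WHERE id = ?", ("job-001",)
).fetchone()
```

The same schema runs unchanged on SQLite for tests and on Postgres in production, which is what lets committed .db artifacts serve as test evidence.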

Launcher DB

Job Persistence

SQLAlchemy models store job specs, status transitions, privacy handshakes, and telemetry snapshots.

Location: launcher/db/models.py
Tables: jobs, privacy_sessions, artifacts
Backend: SQLite / Postgres
Payments DB

Ledger Persistence

Payment credits, provider payouts, and Stripe event records with full audit trail.

Location: payments/ledger/models.py
Tables: payment_credits, payouts
Backend: SQLite / Postgres
Directory DB

Provider Registry

Provider metadata, join tokens, heartbeats, and capability manifests.

Location: alien-directory-api/db
Tables: providers, tokens, heartbeats
Migrations: Alembic
5.4 Entrypoints from Client to Launcher

Clients use phase4_sdk/client.py or CLI equivalents to submit workloads. The SDK constructs HTTP requests with JSON payloads, attaches X-API-Key, and handles retries. On the launcher side, launcher/api/routes/jobs.py exposes POST /v1/jobs, which validates request bodies against Pydantic models, chooses the appropriate adapter via launcher/adapters/loader.py, persists job specs, and enqueues work onto asynchronous queues.
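The request shape can be sketched with urllib alone. The real SDK lives in phase4_sdk/client.py; the X-API-Key header and the POST /v1/jobs path come from the text, while the payload fields are illustrative:

```python
import json
import urllib.request

def build_submit_request(base_url: str, api_key: str, job_spec: dict) -> urllib.request.Request:
    # Constructs, but does not send, the POST /v1/jobs request the SDK
    # would issue, with the X-API-Key header attached as described.
    return urllib.request.Request(
        url=f"{base_url}/v1/jobs",
        data=json.dumps(job_spec).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )

req = build_submit_request("http://localhost:8080", "demo-key", {"adapter": "training"})
```

Keeping request construction separate from transmission is also what makes the SDK's retry handling straightforward: a failed attempt can rebuild and resend the same request object.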

Entrypoint Architecture
Background workers boot via launcher/worker/main.py, establishing database sessions and instantiating JobProcessor. The processor then executes the canonical lifecycle: load job, compile strategy, plan placement, enforce privacy, materialize artifacts, and launch via driver. Each step corresponds to specific files: _process_privacy (service.py), _apply_rank_attestation (placement/rank_attestation.py), materialize_artifacts (packager/materialize.py), and driver orchestration.
5.5 Control Flow and Exception Surfaces

JobProcessor handles exceptions deliberately. A failed privacy handshake raises PrivacyViolation, recorded via structured logs and metrics. Placement mismatches log PlacementError, while control-plane reservation failures mark jobs as ABORTED. A consistent exception architecture means the same types appear across API responses, worker logs, and observability dashboards.

Exception Handling Architecture

JobProcessor.process()
    │
    ├──▶ _process_privacy()
    │         │
    │         ├── PrivacyViolation ──▶ job.status = FAILED
    │         │                        record_attestation_failure()
    │         │
    │         └── Success ──▶ continue
    │
    ├──▶ _apply_placement()
    │         │
    │         ├── PlacementError ──▶ log warning
    │         │                      attempt fallback
    │         │
    │         └── Success ──▶ continue
    │
    ├──▶ _reserve_capacity()
    │         │
    │         ├── ControlPlaneError ──▶ job.status = ABORTED
    │         │                         reason = "reservation_failed"
    │         │
    │         └── Success ──▶ continue
    │
    └──▶ driver.launch()
              │
              ├── DriverError ──▶ job.status = FAILED
              │                   driver.cleanup()
              │
              └── Success ──▶ job.status = RUNNING
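The mapping from exception surface to recorded outcome can be sketched directly from the diagram. The exception class names appear in the text; the reason strings other than "reservation_failed", and the fallback label, are illustrative:

```python
class PrivacyViolation(Exception): pass
class PlacementError(Exception): pass
class ControlPlaneError(Exception): pass
class DriverError(Exception): pass

def terminal_status(exc: Exception) -> tuple[str, str]:
    """Return the (status, reason) the processor records for each surface."""
    if isinstance(exc, PrivacyViolation):
        return ("FAILED", "privacy_violation")
    if isinstance(exc, ControlPlaneError):
        return ("ABORTED", "reservation_failed")
    if isinstance(exc, DriverError):
        return ("FAILED", "driver_error")
    if isinstance(exc, PlacementError):
        # Placement errors are logged and a fallback is attempted
        # rather than terminating the job outright.
        return ("RETRYING", "placement_fallback")
    raise exc  # unknown failures propagate unchanged
```

Because the same exception types flow through API responses, worker logs, and dashboards, this one mapping is the single source of truth for terminal job states.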
                
5.6 Inter-Service Communication Narrative

Once a job passes privacy, the launcher may publish demand to the control plane. The control plane API server authenticates via X-API-Key, parses JSON into typed dataclasses (ReservationRequest, DemandConfig), and interacts with ControlPlaneContext. The context references PriceIndexOracleService, VRACUScheduler, MeteringService, ResilienceGuards, and optionally ArbitrumContracts.

5.7 Provider Journey and Sidecar Interaction

Providers register with the directory API through join tokens issued by operators or the control plane. Onboarding includes publishing provider metadata, hardware capabilities, and heartbeats. When a job is assigned, the sidecar runtime fetches its configuration, contacts the launcher to submit attestation, and retrieves rotation secrets and TLS artifacts. The sidecar's rotation loop ensures DEKs are refreshed before expiry.

Provider Onboarding

┌─────────────┐
│ Join Token  │
│   Issued    │
└──────┬──────┘
       ▼
┌─────────────┐
│  Provider   │
│  Register   │
└──────┬──────┘
       ▼
┌─────────────┐
│  Heartbeat  │
│    Loop     │
└──────┬──────┘
       ▼
┌─────────────┐
│   Active    │
│  Provider   │
└─────────────┘
                    

Sidecar Lifecycle

┌─────────────┐
│   Config    │
│   Loaded    │
└──────┬──────┘
       ▼
┌─────────────┐
│ Attestation │
│  Handshake  │
└──────┬──────┘
       ▼
┌─────────────┐
│  Rotation   │
│    Loop     │
└──────┬──────┘
       ▼
┌─────────────┐
│  Workload   │
│  Execute    │
└─────────────┘
                    
"The directory structure illustrates disciplined engineering where source, infrastructure, operations, and validation co-reside, ready for continuous inspection."
— Launcher Subsystem
5.8 Observability Assets and Documentation Corpus

Observability is multi-pronged. launcher/observability/slo.py exports Prometheus metrics that feed dashboards. payments/stripe_service/metrics.py tracks queue lengths, webhook failures, and ledger states. control_plane/metrics.py monitors HTTP requests, queue depths, and scheduler backlogs. alien-observability offers provider-side exporters.
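What these exporters emit can be pictured without the prometheus_client dependency: a monotonic counter rendered in the text exposition format scrapers expect. The metric name below is illustrative, not one of the shipped series:

```python
class Counter:
    """Tiny stand-in for a Prometheus counter."""

    def __init__(self, name: str, help_text: str):
        self.name, self.help_text, self.value = name, help_text, 0.0

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def expose(self) -> str:
        # Prometheus text exposition format: HELP, TYPE, then the sample.
        return (f"# HELP {self.name} {self.help_text}\n"
                f"# TYPE {self.name} counter\n"
                f"{self.name} {self.value}")

webhook_failures = Counter("stripe_webhook_failures_total", "Failed Stripe webhook deliveries")
webhook_failures.inc()
```

Each subsystem's metrics module registers counters and gauges like this against a shared registry, which the dashboards then scrape.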

5.9 High-Level Control Cycle
Control Cycle Pseudo Code
FUNCTION lifecycle(job_id, provider_endpoint):
    WITH db_session() AS session:
        job = session.get(Job, job_id)
        spec = dict(job.spec or {})
        profile = ResourceProfile(**spec["profile"])
        plan = ExecutionPlan(**spec["plan"])

        IF strategy:
            compiled = strategy.compile(profile, plan)
            spec["strategy"] = strategy_payload(compiled)
            IF settings.features.enable_multi_provider_jobs AND placement:
                placement_map = placement.place(provider_endpoint.capabilities, compiled)
                spec["placement"] = serialise(placement_map)

        record_status(job, JobStatus.PROFILING)
        record_status(job, JobStatus.ALLOCATING)

        IF control_plane_client AND spec.policy:
            publish_demand(job_id, spec.policy, offers)
            reservation = reserve_capacity(job_id, provider_endpoint, profile, spec.policy, offers)
            IF reservation IS False:
                abort(job, JobStatus.ABORTED, reason="reservation_failed")
                RETURN
            ELSE IF reservation IS dict:
                spec.setdefault("control_plane", {}).setdefault("reservation", {}).update(reservation)

        abort_reason = process_privacy(session, job, spec)
        IF abort_reason:
            RETURN

        apply_rank_attestation(job, provider_endpoint, spec, profile, plan, compiled)
        materialize_artifacts(job_id, plan)

        launch_result = driver.launch(job_id, LaunchSpec(profile, plan, distribution), provider_endpoint)
        record_status(job, launch_result.status)
        job.spec = spec
        flag_modified(job, "spec")
        session.add(job)

    IF reservation:
        complete_reservation(reservation)
    RETURN {"job_id": job_id, "status": launch_result.status}
                
5.10 Evidence Trail and Observability Bindings

Telemetry flows through multiple channels: Prometheus metrics via launcher/observability/slo.py, structured logs via the standard logging module, tracing via OpenTelemetry, and ledger reports via payments/stripe_service/metrics.py. Evidence artifacts prove that integration tests, staging runs, and production cutovers were executed and recorded.

5.11 Scenario: From SDK to Execution

Consider a developer using the SDK: they configure adapters, invoke phase4_sdk commands, and verify deployments. That path traverses launcher/api/app.py, exercises the worker, executes the sidecar runtime, interacts with the control plane, updates the ledger, and culminates in verifiable on-chain events. This scenario underscores how repository artefacts, documentation, and code interlock to provide a reproducible journey.

End-to-End Job Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                              SDK / CLI                                       │
│                         phase4 sdk submit-job                                │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          LAUNCHER API                                        │
│  POST /v1/jobs ──▶ validate ──▶ adapter.prepare() ──▶ persist ──▶ enqueue   │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          JOB PROCESSOR                                       │
│  dequeue ──▶ privacy ──▶ placement ──▶ control_plane ──▶ driver.launch()    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         SIDECAR RUNTIME                                      │
│  attest ──▶ handshake ──▶ rotate_keys ──▶ execute_workload ──▶ complete     │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONTROL PLANE / PAYMENTS                                  │
│  metering ──▶ settlement ──▶ ledger_update ──▶ on_chain_mint                 │
└─────────────────────────────────────────────────────────────────────────────┘
                
5.12 CLI and Script Ecosystem

Beyond the main services, the repository houses CLI tools for local demos, orchestrated end-to-end runs, staging validation, and production verification. Scripts under tools/ and scripts/ manage migrations, benchmarking, and provider onboarding. This tooling ensures that engineers can reproduce complex scenarios with a single command.

5.13 Documentation Corpus

Documentation spans Markdown, HTML, and PDF. Architecture diagrams, UI flows, security controls, and domain-specific analysis files demonstrate analytical depth. Many Markdown files follow naming patterns signifying milestones. This corpus bridges the gap between code and operational proof.

Component       Key Path             Responsibility
Launcher API    launcher/api/        FastAPI routes, job submission, attestation endpoints
Privacy Gate    launcher/privacy/    Attesters, key brokers, revocation, proofs
Adapters        launcher/adapters/   Training, inference, quantization, rendering, composite
Control Plane   control_plane/       Oracle, scheduler, metering, governance
Payments        payments/            Ledger DAO, Stripe webhooks, ACU wallet
Contracts       contracts/           ConversionRouter.sol, ABIs, Foundry config
Sidecar         sidecar/             Runtime, attestation, rotation, TLS
SDK             phase4_sdk/          ControlPlaneClient, Typer CLI
SECTION 06

Launcher Orchestration

API Surface, Service Logic, Worker Lifecycle, Placement

FastAPI wiring, job submission flows, worker processing, placement strategies, and the complete lifecycle from request to execution.

ENABLING INFRASTRUCTURE · API Surface, Service Logic, Worker Lifecycle, Placement

create_app constructs a FastAPI instance titled "VR-ACU Launcher". It loads LauncherSettings, invokes configure_observability to register logging and tracing middleware, instantiates SlowAPI's Limiter with settings.security.rate_limit, attaches rate-limit exception handlers, and registers startup/shutdown callbacks that start and stop a RevocationWatcher when privacy revocation is enabled.

During application creation the launcher builds privacy components via build_privacy_components(settings.privacy). Successful initialisation yields a PrivacyGate optionally coupled with RevocationRegistry and SessionInvalidationPipeline. Failures raise PrivacyInitializationError when confidential compute is required; otherwise the app logs a warning and continues.
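
The required-vs-optional failure policy can be sketched as follows. The class bodies and settings keys (`revocation_enabled`, `require_confidential_compute`) are hypothetical stand-ins; only the names PrivacyGate, RevocationRegistry, and PrivacyInitializationError come from the codebase description.

```python
import logging

class PrivacyInitializationError(RuntimeError):
    """Raised when confidential compute is required but privacy setup fails."""

class PrivacyGate:
    """Hypothetical stand-in for the real PrivacyGate."""

class RevocationRegistry:
    """Hypothetical stand-in for the real RevocationRegistry."""

def build_privacy_components(privacy_settings: dict):
    """Hard-fail when confidential compute is mandatory, otherwise degrade
    gracefully with a warning (sketch of the behaviour described above)."""
    try:
        gate = PrivacyGate()
        registry = RevocationRegistry() if privacy_settings.get("revocation_enabled") else None
        return gate, registry
    except Exception as exc:
        if privacy_settings.get("require_confidential_compute"):
            raise PrivacyInitializationError(str(exc)) from exc
        logging.warning("privacy components unavailable: %s", exc)
        return None, None
```

The asymmetry is the point: a job that demands attestation must never run without a gate, while a development deployment can proceed degraded.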

LauncherService initialises adapters using load_adapter, applies adapter-specific options from configuration, constructs BasicProfiler, DefaultPolicy, QuotaEnforcer, and PolicyEngine, and ensures database schema exists. It also stores adapter privacy templates to enrich job specs with expected attestation requirements.

FastAPI Application Wiring

create_app()
    │
    ├──▶ load_settings() ──▶ LauncherSettings
    │
    ├──▶ configure_observability()
    │       ├── logging formatters
    │       ├── OTLP exporters
    │       └── Prometheus metrics
    │
    ├──▶ SlowAPI Limiter(settings.security.rate_limit)
    │
    ├──▶ build_privacy_components(settings.privacy)
    │       ├── PrivacyGate
    │       ├── RevocationRegistry
    │       └── SessionInvalidationPipeline
    │
    ├──▶ LauncherService
    │       ├── load_adapter() for each adapter
    │       ├── BasicProfiler
    │       ├── DefaultPolicy
    │       ├── QuotaEnforcer
    │       └── PolicyEngine
    │
    ├──▶ Register routers
    │       ├── /v1/jobs
    │       ├── /v1/jobs/{job_id}/attestation
    │       ├── /v1/jobs/{job_id}/dek
    │       ├── /v1/jobs/{job_id}/rotation
    │       └── /v1/artifacts/verify
    │
    └──▶ Startup/Shutdown callbacks
            ├── RevocationWatcher.start()
            └── RevocationWatcher.stop()
                
6.1 Job Submission Flow

POST /v1/jobs parses JobSubmitRequest, logs payloads, and calls service.submit_job. The service evaluates policy (PolicyEngine.evaluate), resolves adapter, prepares ResourceProfile and ExecutionPlan, profiles the workload, computes policy constraints, shapes offers, validates quotas (QuotaEnforcer.validate_submission), and persists a Job model with status PENDING.

Job Submission Endpoint
POST /v1/preflight shares the same pipeline but skips persistence, returning predicted profile and plan so clients can preview resource usage. The API response includes job ID, status, policy, offers, adapter name, and privacy configuration.
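
The ordering of the submission pipeline can be sketched with simplified stand-ins. The step sequence follows the description above; the function signatures, dataclass fields, and dependency names here are assumptions for illustration, not the real service API.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: str
    status: str = "PENDING"
    spec: dict = field(default_factory=dict)

def submit_job(request: dict, *, policy, adapters, quota, db) -> Job:
    """Sketch of the submit pipeline order described above."""
    policy.evaluate(request)                      # 1. policy gate (may raise)
    adapter = adapters[request["adapter"]]        # 2. resolve adapter by name
    profile = adapter.profile(request)            # 3. ResourceProfile analogue
    plan = adapter.plan(request, profile)         # 4. ExecutionPlan analogue
    quota.validate_submission(request, profile)   # 5. quota check (may raise)
    job = Job(job_id=request["job_id"],
              spec={"profile": profile, "plan": plan})
    db.append(job)                                # 6. persist with status PENDING
    return job
```

A preflight variant would run steps 1-5 and return the profile and plan without step 6, which is exactly the persistence-skipping behaviour described for POST /v1/preflight.
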
6.2 Queue Integration and Background Workers

After submission the API queues background work using enqueue_background_job(job_id) backed by Redis/RQ. Worker processes launched via launcher/worker/main.py consume these jobs. They instantiate JobProcessor, configure drivers, telemetry aggregators, placement planners, control plane clients, and privacy gate references.
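
The enqueue/consume split can be illustrated with an in-memory stand-in for the queue. The real enqueue_background_job is backed by Redis/RQ; the deque below exists only to show the handoff between the API process and a worker loop.

```python
from collections import deque

# Hypothetical in-memory stand-in for the Redis/RQ queue described above.
job_queue = deque()

def enqueue_background_job(job_id: str) -> None:
    """API side: hand the job off so the HTTP response can return immediately."""
    job_queue.append(job_id)

def worker_loop(process, max_jobs: int) -> list:
    """Worker side (launcher/worker/main.py analogue): drain and process jobs."""
    results = []
    for _ in range(max_jobs):
        if not job_queue:
            break
        job_id = job_queue.popleft()
        results.append(process(job_id))   # JobProcessor.process(job_id)
    return results
```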

Job Lifecycle State Machine

                              ┌─────────────────────────────────────────────┐
                              │                                             │
                              ▼                                             │
┌─────────┐    ┌──────────┐    ┌───────────┐    ┌──────────┐    ┌─────────┐│
│ PENDING │───▶│PROFILING │───▶│ALLOCATING │───▶│LAUNCHING │───▶│ RUNNING ││
└─────────┘    └──────────┘    └───────────┘    └──────────┘    └────┬────┘│
     │              │               │                │                │     │
     │              │               │                │                │     │
     ▼              ▼               ▼                ▼                ▼     │
┌─────────┐    ┌─────────┐    ┌─────────┐      ┌─────────┐      ┌─────────┐│
│ ABORTED │    │ FAILED  │    │ ABORTED │      │ FAILED  │      │COMPLETED││
└─────────┘    └─────────┘    └─────────┘      └─────────┘      └─────────┘│
                                                                     │      │
                                                                     └──────┘
                                                                   (revocation)

State Transitions:
  PENDING ──▶ PROFILING    : worker picks up job
  PROFILING ──▶ ALLOCATING : profile computed successfully
  ALLOCATING ──▶ LAUNCHING : control plane reservation acquired
  LAUNCHING ──▶ RUNNING    : driver.launch() succeeded
  RUNNING ──▶ COMPLETED    : workload finished successfully
  RUNNING ──▶ ABORTED      : revocation triggered
  * ──▶ FAILED             : exception during processing
  * ──▶ ABORTED            : reservation failed / policy denied
                
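
The transition rules above can be captured as a small validation table. This is a sketch derived directly from the diagram; the launcher's actual JobStatus model may encode the rules differently.

```python
from enum import Enum

class JobStatus(str, Enum):
    PENDING = "PENDING"
    PROFILING = "PROFILING"
    ALLOCATING = "ALLOCATING"
    LAUNCHING = "LAUNCHING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    ABORTED = "ABORTED"

# Happy-path edges from the diagram; FAILED and ABORTED are reachable from
# any non-terminal state, so they are handled as wildcards below.
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.PROFILING},
    JobStatus.PROFILING: {JobStatus.ALLOCATING},
    JobStatus.ALLOCATING: {JobStatus.LAUNCHING},
    JobStatus.LAUNCHING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.COMPLETED},
}

TERMINAL = {JobStatus.COMPLETED, JobStatus.FAILED, JobStatus.ABORTED}

def can_transition(src: JobStatus, dst: JobStatus) -> bool:
    if src in TERMINAL:
        return False
    if dst in (JobStatus.FAILED, JobStatus.ABORTED):
        return True  # * ──▶ FAILED / ABORTED
    return dst in TRANSITIONS.get(src, set())
```
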
6.3 JobProcessor Lifecycle

JobProcessor.process(job_id) retrieves the job, extracts profile and plan, compiles strategies, optionally applies placement, coordinates privacy, materialises artifacts, and launches drivers. It publishes events to EventSink, updates job statuses throughout (PROFILING, ALLOCATING, LAUNCHING, RUNNING), interacts with control-plane reservations, and records metrics.

API Endpoints

Job Routes

Core job management endpoints for submission, inspection, and lifecycle control.

POST   /v1/jobs
GET    /v1/jobs
GET    /v1/jobs/{job_id}
GET    /v1/jobs/{job_id}/status
GET    /v1/jobs/{job_id}/logs
DELETE /v1/jobs/{job_id}

Privacy Routes

Attestation

Endpoints for privacy handshake, key rotation, and session management.

POST /v1/jobs/{id}/attestation
POST /v1/jobs/{id}/dek
POST /v1/jobs/{id}/rotation
GET  /v1/jobs/{id}/privacy

Artifact Routes

Verification

Merkle proof verification and artifact streaming endpoints.

POST /v1/artifacts/verify
GET  /v1/jobs/{id}/artifacts
GET  /v1/jobs/{id}/artifacts/{aid}

6.4 Placement, Strategy, and Rank Attestation

JobProcessor integrates MultiProviderPlacementPlanner when multi-provider jobs are enabled. Strategy compilation uses adapter-specific logic to determine rank layouts. launcher/placement/rank_attestation.py commits environment variables and annotations that providers later echo, enabling the launcher to verify strategies were executed as planned.
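
The commit-and-echo idea behind rank attestation can be sketched as a content hash over the planned layout. The real scheme in launcher/placement/rank_attestation.py may use different fields and encodings; this only illustrates the verification property.

```python
import hashlib
import json

def commit_strategy(placement: dict) -> str:
    """Launcher side: derive a commitment the provider later echoes back."""
    canonical = json.dumps(placement, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def verify_echo(placement: dict, echoed_commitment: str) -> bool:
    """Launcher side: confirm the provider executed the strategy as planned."""
    return commit_strategy(placement) == echoed_commitment
```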

JobProcessor Core Logic
ASYNC FUNCTION process(job_id):
    WITH db_session() AS session:
        job = session.get(Job, job_id)
        spec = dict(job.spec or {})
        
        # Extract profile and plan from spec
        profile = ResourceProfile(**spec["profile"])
        plan = ExecutionPlan(**spec["plan"])
        
        # Compile strategy if multi-provider enabled
        IF settings.features.enable_multi_provider_jobs:
            compiled = strategy.compile(profile, plan)
            placement_map = placement_planner.place(provider.capabilities, compiled)
            spec["placement"] = serialise(placement_map)
        
        # Update status and process privacy
        record_status(job, JobStatus.PROFILING)
        record_status(job, JobStatus.ALLOCATING)
        
        # Control plane reservation
        IF control_plane_client:
            reservation = AWAIT reserve_capacity(job_id, profile)
            IF NOT reservation:
                abort(job, reason="reservation_failed")
                RETURN
        
        # Privacy handshake
        abort_reason = AWAIT process_privacy(session, job, spec)
        IF abort_reason:
            RETURN
        
        # Materialize artifacts and launch
        AWAIT materialize_artifacts(job_id, plan)
        launch_result = AWAIT driver.launch(job_id, LaunchSpec(profile, plan))
        
        record_status(job, launch_result.status)
        
    RETURN {"job_id": job_id, "status": launch_result.status}
                
"Launcher orchestration stitches together API inputs, adapter preparation, control-plane negotiation, privacy enforcement, placement, driver execution, and telemetry."
— Launcher Orchestration
SECTION 07

Sidecar Runtime

Handshake, Rotation, TLS, Telemetry

Provider-side execution runtime handling attestation handshakes, DEK rotation, TLS certificate management, and secure workload execution.

ENABLING INFRASTRUCTURE · Handshake, Rotation, TLS, Telemetry

The sidecar runtime executes on provider hosts alongside GPU workloads. It orchestrates attestation, key rotation, TLS handling, artifact downloads, and workload execution. It interacts with the launcher via HTTP endpoints (/attestation, /dek, /rotation) and uses configuration from SidecarSettings. Logging is handled through structured context in provider logs.

SidecarSettings supplies job ID, provider ID, launcher URLs, bearer token, attestation challenge, rotation deadlines, proof URI, TLS requirements, artifact path, step signal path, and optional command overrides. The runtime stores settings, initialises AttestationProvider, builds a LauncherClient with HTTPX, and prepares asynchronous tasks (rotation loop, step watcher).

Sidecar Control Flow

SidecarRuntime.run()
    │
    ├──▶ _perform_handshake()
    │       │
    │       ├──▶ AttestationProvider.produce(challenge)
    │       │       ├── NVIDIA CC-On evidence
    │       │       ├── Intel TDX quote
    │       │       └── AMD SEV-SNP report
    │       │
    │       ├──▶ LauncherClient.attest(evidence)
    │       │       └── POST /v1/jobs/{job_id}/attestation
    │       │
    │       ├──▶ _write_certificate_bundle(tls_certs)
    │       │       ├── ca.pem
    │       │       ├── cert.pem
    │       │       └── key.pem
    │       │
    │       ├──▶ _write_dek_file(dek_bytes)
    │       │
    │       └──▶ _determine_command()
    │
    ├──▶ Start rotation loop (if rotation_due_at provided)
    │       └── asyncio.create_task(_rotation_loop())
    │
    ├──▶ Start step watcher loop (if step_signal_path configured)
    │       └── asyncio.create_task(_step_watcher_loop())
    │
    ├──▶ _execute(command)
    │       ├── asyncio.create_subprocess_exec()
    │       ├── stream stdout/stderr
    │       └── monitor return code
    │
    └──▶ stop()
            ├── cancel rotation task
            ├── cancel step watcher task
            ├── _zeroize_dek_file()
            └── cleanup resources
                
7.1 Attestation Provider

AttestationProvider.produce(challenge) assembles evidence for configured attesters (NVIDIA CC-On, TDX, SEV-SNP). It may gather SPDM certificate chains, GPU reports, quote structures, and certificates depending on hardware. Errors raise RuntimeError, preventing the handshake from proceeding with stale evidence.

7.2 Handshake Sequence

The runtime acquires an async lock to ensure single handshake execution. It collects evidence, posts attestation, verifies responses include rotation_secret, attestation_hash, optional TLS bundle, and optional workload overrides. It writes TLS certificates, persists rotation secret, stores attestation hash, calculates rotation due timestamps, and optionally downloads artifacts.

Rotation Loop Implementation
ASYNC FUNCTION _rotation_loop():
    WHILE NOT stop_event.is_set():
        due_at = session.rotation_due_epoch
        IF NOT due_at:
            AWAIT sleep(default_interval)
            CONTINUE
        
        # Calculate sleep duration with grace period
        now = current_time()
        delay = max(0, due_at - now - settings.rotation_grace_seconds)
        AWAIT sleep(delay)
        
        TRY:
            AWAIT _rotate_key(reason="timer")
        EXCEPT HTTPError AS exc:
            logger.warning("rotation failed", error=str(exc))
            AWAIT sleep(settings.rotation_retry_delay)
        ELSE:
            logger.info("rotation completed", step=session.last_step)

ASYNC FUNCTION _rotate_key(reason):
    # Increment step counter
    step = session.last_step + 1
    
    # Build request payload
    payload = {
        "session_token": session.token,
        "step": step,
        "proof_uri": settings.proof_uri,
        "attestation_hash": session.attestation_hash
    }
    
    # Request new DEK from launcher
    response = AWAIT launcher_client.request_dek(payload)
    
    # Decode and write new key
    dek_bytes = base64_decode(response["dek_b64"])
    _write_dek_file(dek_bytes)
    
    # Update session state
    session.last_step = step
    session.rotation_secret = response.get("rotation_secret")
    session.rotation_due_at = response.get("due_at")
    
    # Acknowledge rotation
    AWAIT launcher_client.acknowledge_rotation(step)
                

Handshake Flow

Sidecar              Launcher
   │                    │
   │──▶ POST /attest ──▶│
   │    {evidence}      │
   │                    │
   │◀── response ◀──────│
   │    {dek, certs,    │
   │     rotation_due}  │
   │                    │
   │──▶ write certs ────│
   │──▶ write dek ──────│
   │                    │
                    

Rotation Flow

Sidecar              Launcher
   │                    │
   │──▶ POST /dek ─────▶│
   │    {step, token,   │
   │     proof_uri}     │
   │                    │
   │◀── response ◀──────│
   │    {new_dek,       │
   │     new_due_at}    │
   │                    │
   │──▶ POST /rotation ▶│
   │    {ack, step}     │
   │                    │
                    
7.3 Certificate and TLS Handling

_write_certificate_bundle writes CA, certificate, and key files to the artifact directory with secure permissions. _build_environment adds TLS_CA_PATH, TLS_CERT_PATH, TLS_KEY_PATH. When fingerprint validation is enabled, _load_certificate_fingerprint computes SHA-256 to send with rotation acknowledgements.
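
A minimal sketch of the bundle-writing and fingerprint steps, assuming PEM bytes arrive from the launcher; the real _write_certificate_bundle and _load_certificate_fingerprint may differ in paths and error handling.

```python
import hashlib
import os
from pathlib import Path

def write_certificate_bundle(tls_dir: str, ca: bytes, cert: bytes, key: bytes) -> dict:
    """Write ca.pem/cert.pem/key.pem with owner-only permissions and return
    the environment mapping handed to the workload."""
    root = Path(tls_dir)
    root.mkdir(parents=True, exist_ok=True)
    files = {"TLS_CA_PATH": ("ca.pem", ca),
             "TLS_CERT_PATH": ("cert.pem", cert),
             "TLS_KEY_PATH": ("key.pem", key)}
    env = {}
    for env_name, (name, data) in files.items():
        path = root / name
        path.write_bytes(data)
        os.chmod(path, 0o600)  # the private key especially must not be world-readable
        env[env_name] = str(path)
    return env

def certificate_fingerprint(cert: bytes) -> str:
    """SHA-256 fingerprint sent with rotation acknowledgements."""
    return hashlib.sha256(cert).hexdigest()
```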

7.4 DEK Lifecycle and Zeroization

The runtime stores DEKs in a designated file, used by the workload to decrypt artifacts. After workloads finish or runtime shuts down, _zeroize_dek_file overwrites the file with zeros, flushes, closes, and unlinks it. This ensures no residual secrets remain on disk.
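
The overwrite-flush-unlink sequence can be sketched as below. This is a best-effort pattern: on journaling filesystems and SSDs with wear-levelling, physical erasure is not guaranteed, so treat this as the hygiene step the document describes rather than a forensic guarantee.

```python
import os
from pathlib import Path

def zeroize_dek_file(path: str) -> None:
    """Overwrite the DEK file with zeros, force it to disk, then unlink it."""
    p = Path(path)
    if not p.exists():
        return                      # already cleaned up; zeroization is idempotent
    size = p.stat().st_size
    with open(p, "r+b") as fh:
        fh.write(b"\x00" * size)    # overwrite key material in place
        fh.flush()
        os.fsync(fh.fileno())       # push the zeros to stable storage
    p.unlink()                      # remove the file entry itself
```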

Security Guarantee
The sidecar runtime enforces confidentiality by tying workload execution to attested hardware, controlled key rotation, TLS management, and careful cleanup. Its implementation complements the launcher's privacy gate, ensuring providers participating in the compute exchange honour session lifetimes and cryptographic constraints.
7.5 · Full Handshake/Rotation Sequence

Complete Sidecar Lifecycle

  Provider Host          Sidecar Runtime           Launcher API            Privacy Gate
       │                      │                        │                       │
       │──start container────▶│                        │                       │
       │                      │                        │                       │
       │                      │══ HANDSHAKE PHASE ═════│═══════════════════════│
       │                      │                        │                       │
       │                      │──produce_evidence()    │                       │
       │                      │  ├─ NVIDIA CC-On      │                       │
       │                      │  ├─ Intel TDX quote   │                       │
       │                      │  └─ AMD SNP report    │                       │
       │                      │                        │                       │
       │                      │──POST /attestation────▶│                       │
       │                      │   {evidences, job_id} │──authorize_job()─────▶│
       │                      │                        │◀──PrivacyAuth────────│
       │                      │◀──200 {dek, certs, ───│                       │
       │                      │       rotation_due}    │                       │
       │                      │                        │                       │
       │                      │──write_certs()         │                       │
       │                      │──write_dek()           │                       │
       │                      │                        │                       │
       │                      │══ EXECUTION PHASE ═════│═══════════════════════│
       │                      │                        │                       │
       │                      │──spawn_workload()      │                       │
       │                      │   ├─ env: TLS_*, DEK  │                       │
       │◀─────GPU compute─────│   └─ subprocess       │                       │
       │                      │                        │                       │
       │                      │══ ROTATION LOOP ═══════│═══════════════════════│
       │                      │                        │                       │
       │                      │──[sleep until due_at]  │                       │
       │                      │──POST /rotation───────▶│                       │
       │                      │   {step, token, proof} │──issue_session_dek()─▶│
       │                      │                        │◀──new_dek─────────────│
       │                      │◀──200 {dek, next_due}──│                       │
       │                      │──write_new_dek()       │                       │
       │                      │──zeroize_old_dek()     │                       │
       │                      │   [repeat...]          │                       │
       │                      │                        │                       │
       │                      │══ SHUTDOWN PHASE ══════│═══════════════════════│
       │                      │                        │                       │
       │──workload complete──▶│                        │                       │
       │                      │──cancel_tasks()        │                       │
       │                      │──zeroize_all_keys()    │                       │
       │                      │──cleanup()             │                       │
       │◀──exit 0─────────────│                        │                       │
                
7.6 · TLS Certificate Lifecycle

Certificate Bundle Structure

/artifacts/tls/
├── ca.pem        ← Platform CA
│   └── Issuer: Platform Root
│   └── Subject: Platform CA
│   └── Validity: 10 years
│
├── cert.pem      ← Job Certificate
│   └── Issuer: Platform CA
│   └── Subject: job-{job_id}
│   └── Validity: job duration
│   └── Extensions:
│       └── subjectAltName:
│           └── DNS:*.job.internal
│
└── key.pem       ← Private Key
    └── Algorithm: ECDSA P-256
    └── Permissions: 0600
    └── Usage: TLS client auth
                    

Certificate Flow

┌─────────────────────────────┐
│       LAUNCHER API          │
│  ┌───────────────────────┐  │
│  │ generate_job_cert()   │  │
│  │  ├─ load platform CA  │  │
│  │  ├─ create CSR        │  │
│  │  ├─ sign with CA key  │  │
│  │  └─ bundle response   │  │
│  └───────────────────────┘  │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│      SIDECAR RUNTIME        │
│  ┌───────────────────────┐  │
│  │ _write_cert_bundle()  │  │
│  │  ├─ mkdir -p /tls     │  │
│  │  ├─ write ca.pem      │  │
│  │  ├─ write cert.pem    │  │
│  │  └─ write key.pem     │  │
│  │      chmod 0600       │  │
│  └───────────────────────┘  │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│        WORKLOAD             │
│  ┌───────────────────────┐  │
│  │ TLS_CA_PATH=/tls/ca   │  │
│  │ TLS_CERT_PATH=/tls/.. │  │
│  │ TLS_KEY_PATH=/tls/..  │  │
│  │                       │  │
│  │ mTLS connections to   │  │
│  │ other ranks / storage │  │
│  └───────────────────────┘  │
└─────────────────────────────┘
                    
7.7 · DEK Rotation State Machine

Data Encryption Key Lifecycle

                                    ┌─────────────────────────────────────────────────────────┐
                                    │                    DEK STATE MACHINE                     │
                                    └─────────────────────────────────────────────────────────┘

       ┌──────────────┐                                                       ┌──────────────┐
       │   INITIAL    │                                                       │   ZEROIZED   │
       │   (no DEK)   │                                                       │  (cleaned)   │
       └──────┬───────┘                                                       └──────▲───────┘
              │                                                                      │
              │ handshake_complete                                      shutdown OR error
              │                                                                      │
              ▼                                                                      │
       ┌──────────────┐         rotation_due          ┌──────────────┐              │
       │    ACTIVE    │──────────────────────────────▶│   ROTATING   │              │
       │   step = 0   │                               │  step = N+1  │              │
       │  dek = k₀    │◀──────────────────────────────│  new_dek     │              │
       └──────┬───────┘         rotation_ack          └──────┬───────┘              │
              │                                              │                      │
              │                                              │                      │
              │                    ┌─────────────────────────┘                      │
              │                    │                                                │
              │                    ▼                                                │
              │             ┌──────────────┐                                        │
              │             │   ACTIVE     │                                        │
              │             │  step = N+1  │                                        │
              │             │  dek = k_{N+1}│───────────────────────────────────────┘
              │             └──────────────┘
              │                    │
              │                    │ rotation_due (repeat)
              │                    ▼
              │             ┌ ─ ─ ─ ─ ─ ─ ┐
              └────────────▶   ROTATING    ─────▶ ...
                            └ ─ ─ ─ ─ ─ ─ ┘

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │  State Transitions:                                                                          │
  │    INITIAL → ACTIVE:      handshake returns first DEK (step=0)                               │
  │    ACTIVE → ROTATING:     rotation_due_at reached, request new DEK                           │
  │    ROTATING → ACTIVE:     new DEK received and written, step incremented                     │
  │    ACTIVE → ZEROIZED:     shutdown signal OR workload complete OR error                      │
  │                                                                                              │
  │  Invariants:                                                                                 │
  │    - Only one DEK active at a time                                                           │
  │    - Steps monotonically increase                                                            │
  │    - Old DEK overwritten before new DEK written (atomic swap)                                │
  │    - Zeroization always occurs on exit                                                       │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘
                
7.8 · Attestation Evidence Chain

Hardware → Evidence → Verification

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │                              ATTESTATION EVIDENCE CHAIN                                      │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │  HARDWARE LAYER                                                                              │
  │  ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐                      │
  │  │    NVIDIA GPU     │   │   Intel CPU       │   │    AMD CPU        │                      │
  │  │  ┌─────────────┐  │   │  ┌─────────────┐  │   │  ┌─────────────┐  │                      │
  │  │  │ CC-On Mode  │  │   │  │  TDX Module │  │   │  │ SEV-SNP PSP │  │                      │
  │  │  │ SPDM Engine │  │   │  │  TD-VMCALL  │  │   │  │ Guest Req   │  │                      │
  │  │  └─────────────┘  │   │  └─────────────┘  │   │  └─────────────┘  │                      │
  │  └─────────┬─────────┘   └─────────┬─────────┘   └─────────┬─────────┘                      │
  └────────────┼─────────────────────────┼─────────────────────────┼────────────────────────────┘
               │                         │                         │
               ▼                         ▼                         ▼
  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │  EVIDENCE PRODUCTION                                                                         │
  │  ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐                      │
  │  │ NVIDIA Evidence   │   │   TDX Quote       │   │   SNP Report      │                      │
  │  │ ├─ SPDM Cert Chain│   │ ├─ TD Report      │   │ ├─ Guest Report   │                      │
  │  │ ├─ GPU Report     │   │ ├─ Signature      │   │ ├─ Signature      │                      │
  │  │ ├─ Measurements   │   │ ├─ Cert Chain     │   │ ├─ VCEK Cert      │                      │
  │  │ └─ Nonce Binding  │   │ └─ Challenge Hash │   │ └─ Challenge Hash │                      │
  │  └───────────────────┘   └───────────────────┘   └───────────────────┘                      │
  └────────────┬─────────────────────────┬─────────────────────────┬────────────────────────────┘
               │                         │                         │
               └─────────────────────────┼─────────────────────────┘
                                         │
                                         ▼
  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │  VERIFICATION LAYER (Launcher Attesters)                                                     │
  │                                                                                              │
  │  ┌───────────────────────────────────────────────────────────────────────────────────────┐  │
  │  │  NvidiaCcOnAttester.verify()                                                           │  │
  │  │    ├─ Validate SPDM certificate chain against NVIDIA root                              │  │
  │  │    ├─ Verify GPU report signature                                                      │  │
  │  │    ├─ Check measurements against known-good baseline                                   │  │
  │  │    ├─ Validate nonce matches challenge (job_id binding)                                │  │
  │  │    └─ Extract claims: gpu_model, driver_version, cc_mode                               │  │
  │  └───────────────────────────────────────────────────────────────────────────────────────┘  │
  │  ┌───────────────────────────────────────────────────────────────────────────────────────┐  │
  │  │  TdxAttester.verify()                                                                  │  │
  │  │    ├─ Verify quote signature against Intel attestation service                         │  │
  │  │    ├─ Validate TD report body                                                          │  │
  │  │    ├─ Check MRTD/MRCONFIGID measurements                                               │  │
  │  │    └─ Extract claims: mrenclave, mrsigner, tcb_level                                   │  │
  │  └───────────────────────────────────────────────────────────────────────────────────────┘  │
  │  ┌───────────────────────────────────────────────────────────────────────────────────────┐  │
  │  │  SnpAttester.verify()                                                                  │  │
  │  │    ├─ Validate VCEK certificate chain against AMD root                                 │  │
  │  │    ├─ Verify attestation report signature                                              │  │
  │  │    ├─ Check launch measurement                                                         │  │
  │  │    └─ Extract claims: guest_svn, policy, platform_info                                 │  │
  │  └───────────────────────────────────────────────────────────────────────────────────────┘  │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘
                
SECTION 08

Control Plane

Oracle Pricing, DRF Scheduling, Governance

Economic core with price index oracles, dominant resource fairness scheduling, metering slices, and governance timelocks.

ENABLING INFRASTRUCTURE · Context, Scheduler, Oracle, Governance

ControlPlaneContext.from_config constructs services based on ServiceConfig. It initialises database connections, price index oracle, scheduler, metering service, settlement router, optional Arbitrum contracts, resilience guards, regional coordinator, health monitor, governance, key transparency, enterprise services, and fleet optimiser. Each component references its respective module.

ControlPlaneHTTPServer embeds context, API key, and metrics collector. RequestHandler enforces X-API-Key for all routes except /health. GET serves health data, metrics, governance proposals, key transparency roots, provider lists, policy snapshots, ledger pools, resilience status, regional status, fleet join tokens, and dashboards. POST handles demand configuration, supply submissions, bucket finalisation, reservations, metering acknowledgements, and governance proposals.

Control Plane Service Architecture

ControlPlaneContext.from_config(ServiceConfig)
    │
    ├──▶ Database connections (Postgres / SQLite)
    │
    ├──▶ PriceIndexOracleService
    │       ├── record_supply_offers()
    │       ├── configure_demand()
    │       └── calculate_clearing_price()
    │
    ├──▶ VRACUScheduler
    │       ├── Dominant Resource Fairness
    │       ├── Attained-service scoring
    │       └── MIG slice support
    │
    ├──▶ MeteringService
    │       ├── record_slice()
    │       ├── SHA-256 idempotency
    │       └── duplicate detection
    │
    ├──▶ SettlementRouter
    │       ├── SCM-weighted TWAP
    │       ├── Ceiling rounding to micro-ACU
    │       └── Hold fraction for disputes
    │
    ├──▶ ArbitrumContracts (optional)
    │       ├── ConversionRouter interface
    │       └── Web3 transaction signing
    │
    ├──▶ ResilienceGuards
    │       ├── Outstanding exposure limits
    │       ├── Price floor enforcement
    │       └── Credit utilisation monitoring
    │
    ├──▶ RegionalCoordinator
    │       ├── Heartbeat timeout handling
    │       ├── Provisional receipts
    │       └── Replay mode support
    │
    ├──▶ GovernanceService
    │       ├── Proposal lifecycle
    │       ├── Timelock enforcement
    │       └── Multi-role approvals
    │
    └──▶ KeyTransparencyLog
            ├── Merkle tree per week
            ├── Inclusion proofs
            └── Provider key verification
                
8.1 Price Index Oracle

PriceIndexOracleService records supply offers keyed by demand buckets. Demand configuration defines windows, reserve requirements, and price bounds. The oracle calculates clearing prices, returning BucketResult objects with allocations. Surge multipliers apply when utilization exceeds 95%, scaling linearly from 1.0x at 95% to 1.5x at 100%.
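
The surge rule above is a simple linear interpolation, which can be written directly (a sketch of the stated policy, not the oracle's actual implementation):

```python
def surge_multiplier(utilization: float) -> float:
    """Linear surge: 1.0x at or below 95% utilization, rising to 1.5x at 100%."""
    if utilization <= 0.95:
        return 1.0
    # Interpolate between (0.95, 1.0x) and (1.00, 1.5x).
    return 1.0 + (min(utilization, 1.0) - 0.95) / 0.05 * 0.5
```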

8.2 · Scheduler

VRACUScheduler implements Dominant Resource Fairness with attained-service scoring. Each provider maintains a running total of attained service minutes; new allocations favor providers with lower historical utilization. NVIDIA MIG support extends the scheduler's capacity model, treating each MIG slice as an independent scheduling unit.

Metering

MeteringService

Provides strict idempotency via SHA-256 content hashing. Each slice contains job ID, bucket ID, sequence number, SCM delta, and price index.

Idempotency: SHA-256 hash
Duplicates: succeed silently
Conflicts: rejection with reason
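
The three outcomes (recorded, silent duplicate, rejected conflict) can be sketched with stdlib hashing; the interface is illustrative, not the real record_slice signature:

```python
import hashlib
import json

class MeteringService:
    """Sketch of strict idempotency via SHA-256 content hashing.

    Slice identity is the SHA-256 of its canonical JSON encoding. A
    replayed slice with identical content succeeds silently; a slice
    that reuses a (job_id, sequence) key with different content is
    rejected. Field names follow the description above.
    """

    def __init__(self):
        self._seen: dict[str, str] = {}   # "job:seq" -> content hash

    def record_slice(self, slice_: dict) -> str:
        key = f"{slice_['job_id']}:{slice_['sequence']}"
        digest = hashlib.sha256(
            json.dumps(slice_, sort_keys=True, separators=(",", ":")).encode()
        ).hexdigest()
        if key in self._seen:
            if self._seen[key] == digest:
                return "duplicate"   # idempotent replay: succeed silently
            return "conflict"        # same key, different content: reject
        self._seen[key] = digest
        return "recorded"
```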
Settlement

SettlementRouter

Computes SCM-weighted TWAP across slices. Burn amounts use ceiling rounding to micro-ACU units.

Formula: Σ(minutes × price) / Σ(minutes)
Rounding: ceiling to micro-ACU
Hold fraction: 0.0-1.0 for disputes
Signatures

DualSignatureService

Produces cryptographic attestations with Ed25519 primary and post-quantum secondary signatures.

Primary: Ed25519 (32-byte seed)
Secondary: SHAKE-256 → Dilithium3
Covers: JCS-canonicalized JSON
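
The canonicalization and seed-derivation steps can be sketched with the stdlib. Actual Ed25519 and Dilithium3 signing would go through external crypto libraries and is omitted; note that `json.dumps` with sorted keys approximates JCS (RFC 8785) for simple payloads but is not a full implementation:

```python
import hashlib
import json

def canonicalize(payload: dict) -> bytes:
    """Approximate JCS canonical form: lexicographically sorted keys,
    no insignificant whitespace, UTF-8 encoding."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def derive_secondary_seed(primary_seed: bytes) -> bytes:
    """Expand the 32-byte Ed25519 seed into secondary keygen entropy
    with SHAKE-256, per the split described above (seed length for the
    post-quantum key is an assumption of this sketch)."""
    assert len(primary_seed) == 32
    return hashlib.shake_256(primary_seed).digest(32)
```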

Reservation Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                          RESERVATION REQUEST                                 │
│  {job_id, tenant_id, gpu_profile, scm_minutes, policy_metadata}             │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         RESILIENCE GUARDS                                    │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  IF outstanding_exposure > max_exposure:                             │    │
│  │      RETURN ReservationDeclined(reason="resilience_guard")           │    │
│  │  IF price < price_floor:                                             │    │
│  │      RETURN ReservationDeclined(reason="price_floor")                │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           SCHEDULER                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  allocation = scheduler.allocate(request)                            │    │
│  │  IF allocation IS None:                                              │    │
│  │      RETURN ReservationDeclined(reason="insufficient_capacity")      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          PERSISTENCE                                         │
│  database.save_reservation(allocation)                                       │
│  metrics.record_reservation(allocation)                                      │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     RESERVATION RESPONSE                                     │
│  ReservationAccepted(reservation_id, provider_id, expiry)                    │
└─────────────────────────────────────────────────────────────────────────────┘
                
"In distributed systems, the most elegant solution is often the one that admits its own limitations and builds safeguards around them."
— Control Plane
8.3 · Oracle Pricing Pipeline

Price Discovery & Clearing

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                PRICE INDEX ORACLE PIPELINE                                   │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘

  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  │  SUPPLY SIDE                                                                               │
  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
  │  │ Provider A  │  │ Provider B  │  │ Provider C  │  │ Provider D  │  │ Provider E  │      │
  │  │ H100 x 8    │  │ A100 x 4    │  │ H100 x 2    │  │ A100 x 8    │  │ H200 x 4    │      │
  │  │ $2.50/SCM   │  │ $1.80/SCM   │  │ $2.60/SCM   │  │ $1.75/SCM   │  │ $3.00/SCM   │      │
  │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘      │
  └─────────┼────────────────┼────────────────┼────────────────┼────────────────┼──────────────┘
            │                │                │                │                │
            └────────────────┴────────────────┼────────────────┴────────────────┘
                                              │
                                              ▼
  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  │  ORACLE SERVICE                                                                            │
  │  ┌─────────────────────────────────────────────────────────────────────────────────────┐  │
  │  │  record_supply_offers():                                                             │  │
  │  │    - Bucket by GPU profile + region                                                  │  │
  │  │    - Sort by price ascending                                                         │  │
  │  │    - Build supply curve (cumulative capacity)                                        │  │
  │  └─────────────────────────────────────────────────────────────────────────────────────┘  │
  │                                              │                                             │
  │                                              ▼                                             │
  │  ┌─────────────────────────────────────────────────────────────────────────────────────┐  │
  │  │  calculate_clearing_price(bucket, demand_scm):                                       │  │
  │  │                                                                                      │  │
  │  │    Price                                                                             │  │
  │  │    ▲                                                                                 │  │
  │  │    │                          ╱ Supply Curve                                         │  │
  │  │    │                        ╱                                                        │  │
  │  │  P*├─────────────────────●                    ← Clearing Price                       │  │
  │  │    │                   ╱ │                                                           │  │
  │  │    │                 ╱   │                                                           │  │
  │  │    │               ╱     │                                                           │  │
  │  │    │             ╱       │                                                           │  │
  │  │    └─────────────────────┼──────────────────────▶ Quantity (SCM)                     │  │
  │  │                          Q* (demand)                                                 │  │
  │  │                                                                                      │  │
  │  │    IF utilization > 95%: apply surge_multiplier (1.0x → 1.5x)                       │  │
  │  └─────────────────────────────────────────────────────────────────────────────────────┘  │
  └───────────────────────────────────────────────────────────────────────────────────────────┘
                                              │
                                              ▼
  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  │  BUCKET RESULT                                                                             │
  │  { bucket_id, clearing_price, allocations: [{provider, scm, price}], surge_applied }      │
  └───────────────────────────────────────────────────────────────────────────────────────────┘
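
The clearing step above amounts to a uniform-price walk up the sorted supply curve: fill demand from the cheapest offers and let the marginal offer set the price. A sketch with illustrative (provider, capacity, price) tuples, not the real calculate_clearing_price signature:

```python
def calculate_clearing_price(offers, demand_scm):
    """Walk the ascending supply curve until cumulative capacity covers
    demand; the last offer used sets the uniform clearing price.

    offers: iterable of (provider, capacity_scm, price) tuples.
    Returns (clearing_price, allocations). Illustrative only.
    """
    allocations = []
    remaining = demand_scm
    clearing_price = None
    for provider, capacity, price in sorted(offers, key=lambda o: o[2]):
        if remaining <= 0:
            break
        take = min(capacity, remaining)            # partial fill at the margin
        allocations.append({"provider": provider, "scm": take, "price": price})
        remaining -= take
        clearing_price = price                     # marginal offer so far
    if remaining > 0:
        raise ValueError("insufficient supply for demand bucket")
    return clearing_price, allocations
```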
                
8.4 · DRF Scheduler Algorithm

Dominant Resource Fairness

┌─────────────────────────────┐
│  SCHEDULER STATE            │
│  ─────────────────────────  │
│  providers: [               │
│    {id: A, attained: 120},  │
│    {id: B, attained: 80},   │
│    {id: C, attained: 200},  │
│  ]                          │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  NEW ALLOCATION REQUEST     │
│  {job_id, scm: 100}         │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  SORT BY ATTAINED SERVICE   │
│  (ascending)                │
│  ─────────────────────────  │
│  1. Provider B (80 min)     │
│  2. Provider A (120 min)    │
│  3. Provider C (200 min)    │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  SELECT FIRST WITH CAPACITY │
│  Provider B: capacity ✓     │
│  ─────────────────────────  │
│  B.attained += 100          │
│  B.attained = 180           │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  RETURN ALLOCATION          │
│  {provider: B, scm: 100}    │
└─────────────────────────────┘
                    

MIG Slice Support

┌─────────────────────────────┐
│  PHYSICAL GPU: H100 80GB    │
│  ═══════════════════════════│
│  ┌─────────────────────────┐│
│  │ MIG Instance 0          ││
│  │ Profile: 1g.10gb        ││
│  │ Status: allocated       ││
│  │ Job: job-123            ││
│  └─────────────────────────┘│
│  ┌─────────────────────────┐│
│  │ MIG Instance 1          ││
│  │ Profile: 1g.10gb        ││
│  │ Status: available       ││
│  └─────────────────────────┘│
│  ┌─────────────────────────┐│
│  │ MIG Instance 2          ││
│  │ Profile: 2g.20gb        ││
│  │ Status: available       ││
│  └─────────────────────────┘│
│  ┌─────────────────────────┐│
│  │ MIG Instance 3          ││
│  │ Profile: 4g.40gb        ││
│  │ Status: allocated       ││
│  │ Job: job-456            ││
│  └─────────────────────────┘│
└─────────────────────────────┘

Scheduler treats each MIG
instance as independent
scheduling unit with own:
  - Capacity (VRAM, SMs)
  - Attained service
  - Allocation state
                    
8.5 · Governance Timelock Sequence

Proposal → Queue → Execute

  Proposer             Governance Service          Timelock               Target Contract
      │                       │                       │                        │
      │──propose(action)─────▶│                       │                        │
      │                       │──validate_proposer()  │                        │
      │                       │──check_quorum()       │                        │
      │                       │──create_proposal()    │                        │
      │◀──proposal_id─────────│                       │                        │
      │                       │                       │                        │
      │                       │                       │                        │
  ════│═══════════════════════│═══ VOTING PERIOD ════│════════════════════════│═══
      │                       │                       │                        │
      │──vote(yes/no)────────▶│                       │                        │
      │                       │──record_vote()        │                        │
      │                       │──update_tally()       │                        │
      │                       │                       │                        │
      │                       │                       │                        │
  ════│═══════════════════════│═══ VOTING ENDS ══════│════════════════════════│═══
      │                       │                       │                        │
      │──queue()─────────────▶│                       │                        │
      │                       │──check_passed()       │                        │
      │                       │──schedule_timelock()─▶│                        │
      │                       │                       │──start_delay()         │
      │                       │                       │   (48h minimum)        │
      │                       │                       │                        │
      │                       │                       │                        │
  ════│═══════════════════════│═══ TIMELOCK DELAY ═══│════════════════════════│═══
      │                       │                       │                        │
      │──execute()───────────▶│                       │                        │
      │                       │──verify_timelock()───▶│                        │
      │                       │                       │──check_eta_passed()    │
      │                       │                       │──execute_action()─────▶│
      │                       │                       │                        │──apply()
      │                       │                       │◀──────────success──────│
      │                       │◀──execution_receipt───│                        │
      │◀──tx_hash─────────────│                       │                        │
      │                       │                       │                        │
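
The queue/execute gate in the sequence above reduces to an eta check: queueing a passed proposal stamps it with an eta no sooner than the minimum delay, and execution is refused until that eta has elapsed. A minimal sketch with a 48-hour minimum and voting logic omitted (class and method names are illustrative):

```python
class Timelock:
    """Minimal model of the timelock gate: schedule() stamps an eta,
    can_execute() refuses until the eta has passed. Illustrative only;
    not the real GovernanceService API."""

    MIN_DELAY_S = 48 * 3600   # 48h minimum, per the sequence above

    def __init__(self):
        self._eta: dict[str, float] = {}

    def schedule(self, proposal_id: str, now: float) -> float:
        eta = now + self.MIN_DELAY_S
        self._eta[proposal_id] = eta
        return eta

    def can_execute(self, proposal_id: str, now: float) -> bool:
        eta = self._eta.get(proposal_id)
        return eta is not None and now >= eta
```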
                
SECTION 09

Payments

Stripe Integration, Ledger, ACU Wallet

Payment processing with Stripe webhooks, SQS queue workers, ledger persistence, and Arbitrum-based ACU token integration.

SECTION 09

Payments Infrastructure

ENABLING INFRASTRUCTURE · Stripe, SQS, Ledger, ACU Wallet

LedgerDAO encapsulates SQLAlchemy interactions for payment credits. create_entry inserts PaymentCredit rows with PENDING status. mark_minted, mark_escrowed, mark_failed, and mark_refunded update statuses and append transaction hashes. reserve_credit_for_job atomically assigns a minted credit to a job, raising ValueError when no credit matches the criteria.
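
The atomic select-and-assign can be sketched against stdlib sqlite3 (the real DAO goes through SQLAlchemy; table and column names here are illustrative):

```python
import sqlite3

def reserve_credit_for_job(conn: sqlite3.Connection, job_id: str) -> int:
    """Atomically assign one MINTED, unassigned credit to job_id,
    raising ValueError if none qualifies. Sketch of the DAO method
    described above; schema is illustrative."""
    with conn:  # one transaction: the subquery pick and UPDATE are atomic
        cur = conn.execute(
            """UPDATE payment_credits
               SET job_id = ?, status = 'ESCROWED'
               WHERE id = (SELECT id FROM payment_credits
                           WHERE status = 'MINTED' AND job_id IS NULL
                           ORDER BY id LIMIT 1)""",
            (job_id,),
        )
        if cur.rowcount == 0:
            raise ValueError("no minted credit available for job")
        return cur.rowcount
```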

FastAPI routes verify Stripe signatures via stripe.Webhook.construct_event, normalise events, record idempotency via ledger.record_stripe_event, and enqueue messages on SQS when configured. Payloads capture trainer IDs, GPU profiles, minute counts, wallet addresses, job IDs, regions, price versions, and invoice references.

Payment Flow Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           STRIPE WEBHOOK                                     │
│  checkout.session.completed / payment_intent.succeeded                       │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        WEBHOOK ROUTER                                        │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  1. stripe.Webhook.construct_event(payload, signature, secret)       │    │
│  │  2. _normalize_event() ──▶ extract metadata                          │    │
│  │  3. ledger.record_stripe_event() ──▶ idempotency check               │    │
│  │  4. WebhookQueue.enqueue() ──▶ SQS FIFO                              │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         QUEUE WORKER                                         │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  WHILE True:                                                         │    │
│  │      messages = queue.receive_messages()                             │    │
│  │      FOR message IN messages:                                        │    │
│  │          payload = StripePaymentPayload.parse(message.body)          │    │
│  │          process_payment(payload)                                    │    │
│  │          queue.delete(message)                                       │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         LEDGER DAO                                           │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  create_entry(trainer_id, amount, wallet) ──▶ PENDING                │    │
│  │  mark_minted(payment_id, tx_hash) ──▶ MINTED                         │    │
│  │  reserve_credit_for_job(job_id) ──▶ assign credit                    │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        ACU WALLET                                            │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  tx = wallet.build_mint(recipient, acu_amount, avl_burn)             │    │
│  │  tx_hash = wallet.send(tx) ──▶ Arbitrum RPC                          │    │
│  │  ledger.mark_minted(payment_id, tx_hash)                             │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
                
9.1 · SQS Queue and Worker

WebhookQueue wraps boto3 SQS interactions, supporting FIFO deduplication identifiers. QueueWorker polls messages, deserialises JSON into StripePaymentPayload, invokes payment processing logic, and deletes successfully processed messages. On failure, the worker logs the error and leaves the message in SQS, so it is redelivered after the visibility timeout.
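
The ack-only-on-success loop can be sketched with the queue client injected, so it runs without AWS (the real worker wraps boto3; this interface is illustrative):

```python
class QueueWorker:
    """Sketch of the polling loop described above. A message is
    deleted only after successful processing; a failed message stays
    in the queue and SQS redelivers it after the visibility timeout."""

    def __init__(self, queue, process):
        self.queue = queue      # needs receive_messages() and delete(msg)
        self.process = process  # e.g. parse payload + payment logic

    def drain_once(self) -> int:
        """Handle one batch; returns how many messages succeeded."""
        handled = 0
        for message in self.queue.receive_messages():
            try:
                self.process(message.body)
                self.queue.delete(message)   # ack only on success
                handled += 1
            except Exception:
                # Log in production; here the message simply stays
                # queued for redelivery after the visibility timeout.
                pass
        return handled
```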

Settlement Calculation
# Aggregate metering slices
total_minutes = sum(s.minutes for s in slices)
numerator = sum(s.minutes * s.price for s in slices)

# Compute TWAP (Time-Weighted Average Price)
twap_micro_usd = numerator // total_minutes

# Compute burn with ceiling rounding, in exact integer arithmetic
# (equivalent to math.ceil but immune to float precision loss)
burn_micro_acu = -(-numerator // mint_price_micro_usd)

# Apply hold fraction for dispute buffer
provider_micro_acu = int(burn_micro_acu * (1.0 - hold_fraction))
refund_micro_acu = burn_micro_acu - provider_micro_acu

# Generate canonical receipt (JCS sorted keys)
receipt = {
    "burn_micro_acu": burn_micro_acu,
    "job_id": job_id,
    "provider": provider_address,
    "provider_micro_acu": provider_micro_acu,
    "refund_micro_acu": refund_micro_acu,
    "twap_micro_usd_per_scm": twap_micro_usd
}
                
ACU Wallet Integration
The wallet router constructs ACU mint/burn transactions, interacts with blockchain RPC endpoints, and records transaction hashes in ledger entries. It ensures conversion rates align with the price oracle and that minted ACU tokens correspond to burned AVL amounts, maintaining on-chain/off-chain parity.
9.2 · Ledger Transaction Flow

Credit → Debit → Balance Reconciliation

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │                                    LEDGER TRANSACTION FLOW                                   │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘

  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  │  CREDIT FLOW (Stripe Payment → ACU Tokens)                                                 │
  │                                                                                            │
  │   Stripe              Webhook              LedgerDAO               ACU Wallet             │
  │     │                   │                     │                       │                   │
  │     │──payment.success─▶│                     │                       │                   │
  │     │                   │──create_entry()────▶│                       │                   │
  │     │                   │                     │  status: PENDING      │                   │
  │     │                   │                     │──────────────────────▶│                   │
  │     │                   │                     │                       │──mint_acu()       │
  │     │                   │                     │◀──tx_hash─────────────│                   │
  │     │                   │                     │  status: MINTED       │                   │
  │     │                   │                     │                       │                   │
  └───────────────────────────────────────────────────────────────────────────────────────────┘

  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  │  DEBIT FLOW (Job Execution → Provider Payout)                                              │
  │                                                                                            │
  │   Job Complete          MeteringService          LedgerDAO           Provider Wallet      │
  │     │                        │                      │                      │              │
  │     │──record_slice()───────▶│                      │                      │              │
  │     │                        │──reserve_credit()───▶│                      │              │
  │     │                        │                      │  status: ESCROWED    │              │
  │     │                        │                      │                      │              │
  │     │                        │                      │                      │              │
  │   Settlement             SettlementRouter          │                      │              │
  │     │                        │                      │                      │              │
  │     │──finalize()───────────▶│                      │                      │              │
  │     │                        │──compute_payout()    │                      │              │
  │     │                        │──mark_settled()─────▶│                      │              │
  │     │                        │                      │  status: SETTLED     │              │
  │     │                        │                      │──transfer_acu()─────▶│              │
  │     │                        │                      │                      │──received    │
  │     │                        │                      │                      │              │
  └───────────────────────────────────────────────────────────────────────────────────────────┘

  ┌───────────────────────────────────────────────────────────────────────────────────────────┐
  │  LEDGER STATES                                                                             │
  │                                                                                            │
  │   PENDING ───▶ MINTED ───▶ ESCROWED ───▶ SETTLED                                          │
  │      │           │            │             │                                              │
  │      │           │            │             └──▶ Provider receives ACU                     │
  │      │           │            └──▶ Reserved for job, locked                                │
  │      │           └──▶ On-chain ACU minted, available                                       │
  │      └──▶ Awaiting blockchain confirmation                                                 │
  │                                                                                            │
  │   Alternative paths:                                                                       │
  │   PENDING ───▶ FAILED (blockchain error)                                                   │
  │   ESCROWED ───▶ REFUNDED (job cancelled, dispute won)                                      │
  └───────────────────────────────────────────────────────────────────────────────────────────┘
                
9.3 · SQS Queue Worker Pipeline

FIFO Queue Architecture

┌─────────────────────────────┐
│      STRIPE WEBHOOKS        │
│  ┌───────────────────────┐  │
│  │ checkout.completed    │  │
│  │ payment.succeeded     │  │
│  │ refund.created        │  │
│  └───────────┬───────────┘  │
└──────────────┼──────────────┘
               │
               ▼
┌─────────────────────────────┐
│   SQS FIFO QUEUE            │
│   payments-prod.fifo        │
│  ═══════════════════════════│
│  ┌─────────────────────────┐│
│  │ Message 1 (oldest)      ││
│  │ dedupe: evt_abc123      ││
│  │ group: tenant_001       ││
│  └─────────────────────────┘│
│  ┌─────────────────────────┐│
│  │ Message 2               ││
│  │ dedupe: evt_def456      ││
│  │ group: tenant_002       ││
│  └─────────────────────────┘│
│  ┌─────────────────────────┐│
│  │ Message 3 (newest)      ││
│  │ dedupe: evt_ghi789      ││
│  │ group: tenant_001       ││
│  └─────────────────────────┘│
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│      QUEUE WORKER           │
│  (ECS / Lambda)             │
└─────────────────────────────┘
                    

Worker Processing Loop

┌─────────────────────────────┐
│     QUEUE WORKER LOOP       │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  receive_messages(          │
│    max_messages=10,         │
│    wait_time=20s            │
│  )                          │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  FOR message IN messages:   │
│  ┌─────────────────────────┐│
│  │ payload = parse(body)   ││
│  │ validate_signature()    ││
│  │ idempotency_check()     ││
│  └─────────────────────────┘│
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  TRY:                       │
│    process_payment(payload) │
│    ledger.create_entry()    │
│    wallet.mint_acu()        │
│    queue.delete(message)    │
│  EXCEPT:                    │
│    log_error()              │
│    # message stays in queue │
│    # retry after visibility │
│    # timeout (30s default)  │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│  CONTINUE (next iteration)  │
└─────────────────────────────┘
                    
9.4 · Stripe Webhook Processing

Signature Verification & Event Handling

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │                               STRIPE WEBHOOK PROCESSING                                      │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘

  Stripe                    FastAPI Router                   Handlers                 Queue
    │                            │                              │                       │
    │──POST /webhooks/stripe────▶│                              │                       │
    │   Headers:                 │                              │                       │
    │     Stripe-Signature       │                              │                       │
    │   Body: {event}            │                              │                       │
    │                            │                              │                       │
    │                            │──stripe.Webhook.construct_event()                    │
    │                            │   (payload, signature, secret)                       │
    │                            │                              │                       │
    │                            │──IF InvalidSignature:        │                       │
    │◀──────────400 Bad Req──────│   RETURN 400                 │                       │
    │                            │                              │                       │
    │                            │──_normalize_event()          │                       │
    │                            │   extract: trainer_id,       │                       │
    │                            │            amount,           │                       │
    │                            │            wallet_address,   │                       │
    │                            │            metadata          │                       │
    │                            │                              │                       │
    │                            │──ledger.record_stripe_event()│                       │
    │                            │   (idempotency check)        │                       │
    │                            │                              │                       │
    │                            │──IF event.type == "checkout.session.completed":     │
    │                            │     handler = handle_checkout│                       │
    │                            │──ELIF event.type == "payment_intent.succeeded":     │
    │                            │     handler = handle_payment │                       │
    │                            │──ELIF event.type == "refund.created":               │
    │                            │     handler = handle_refund  │                       │
    │                            │                              │                       │
    │                            │──handler(event)─────────────▶│                       │
    │                            │                              │──build_payload()      │
    │                            │                              │──validate_schema()    │
    │                            │                              │                       │
    │                            │◀─────────payload─────────────│                       │
    │                            │                              │                       │
    │                            │──queue.enqueue(payload)─────────────────────────────▶│
    │                            │                              │                       │
    │◀──────────200 OK───────────│                              │                       │
    │                            │                              │                       │

  ┌─────────────────────────────────────────────────────────────────────────────────────────────┐
  │  Event Types Handled:                                                                        │
  │    checkout.session.completed  →  Initial payment, mint ACU                                  │
  │    payment_intent.succeeded    →  Recurring/additional payment                               │
  │    refund.created              →  Refund request, update ledger                              │
  │    invoice.paid                →  Subscription renewal                                       │
  │    customer.subscription.*     →  Subscription lifecycle                                     │
  └─────────────────────────────────────────────────────────────────────────────────────────────┘
                
SECTION 10

Contracts

On-Chain Settlement

Solidity smart contracts including ConversionRouter for AVL-to-ACU burns, spend limits, and dual-signature governance.

SECTION 10

Smart Contracts

ENABLING INFRASTRUCTURE · ConversionRouter, ABIs, On-Chain Settlement

contracts/ConversionRouter.sol uses OpenZeppelin AccessControl, IERC20, and Pausable. Roles include DEFAULT_ADMIN_ROLE, OPERATOR_ROLE, and PAUSER_ROLE. Constructor stores immutable AVL token, ACU token, price oracle, and burn agent addresses, initialises spend limit, and sets role admins. The contract enforces non-zero addresses to prevent deployment mistakes.

burnAVLForACU (operator-only, when not paused) validates recipient, ACU amount, spend limit, queries AVL required via oracle, transfers AVL from operator to burn agent, calls IAVL.burn, transfers ACU from reserve to recipient, increments acuSpent, and emits ConversionExecuted. Administrative functions adjust reserve address, spend limit, and pause/unpause conversions.

On-Chain Settlement Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                        OPERATOR WALLET                                       │
│  1. approve(ConversionRouter, avlAmount)                                     │
│  2. burnAVLForACU(acuAmount, recipient)                                      │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      CONVERSION ROUTER                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  require(hasRole(OPERATOR_ROLE, msg.sender))                         │    │
│  │  require(recipient != address(0))                                    │    │
│  │  require(acuAmount > 0)                                              │    │
│  │  require(acuSpent + acuAmount <= acuSpendLimit)                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        PRICE ORACLE                                          │
│  avlNeeded = oracle.quoteAVLforACU(acuAmount)                                │
│  require(avlNeeded > 0)                                                      │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         TOKEN OPERATIONS                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  1. avl.transferFrom(operator, burnAgent, avlNeeded)                 │    │
│  │  2. avl.burn(burnAgent, avlNeeded)                                   │    │
│  │  3. acu.transferFrom(reserve, recipient, acuAmount)                  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         STATE UPDATE                                         │
│  acuSpent += acuAmount                                                       │
│  emit ConversionExecuted(operator, recipient, acuAmount, avlNeeded)          │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       RECIPIENT WALLET                                       │
│  ACU tokens received                                                         │
│  Event logs ──▶ Off-chain ledger reconciliation                              │
└─────────────────────────────────────────────────────────────────────────────┘
                
ConversionRouter.sol — Burn Function
function burnAVLForACU(
    uint256 acuAmount,
    address recipient
) external whenNotPaused returns (uint256) {
    require(hasRole(OPERATOR_ROLE, msg.sender), "not operator");
    require(recipient != address(0), "zero recipient");
    require(acuAmount > 0, "zero amount");
    require(acuSpent + acuAmount <= acuSpendLimit, "spend limit");

    uint256 avlNeeded = oracle.quoteAVLforACU(acuAmount);
    require(avlNeeded > 0, "zero quote");

    require(avl.transferFrom(msg.sender, burnAgent, avlNeeded));
    avl.burn(burnAgent, avlNeeded);
    require(acu.transferFrom(reserve, recipient, acuAmount));

    acuSpent += acuAmount;
    emit ConversionExecuted(msg.sender, recipient, acuAmount, avlNeeded);
    return avlNeeded;
}
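The contract's guard clauses can be mirrored off-chain so a doomed transaction is never submitted (a reverted call still costs gas). A minimal Python sketch of the same checks — function and exception names are ours, not the repo's:

```python
# Off-chain preflight mirroring burnAVLForACU's require() sequence.
# Names are illustrative; the on-chain checks remain authoritative.

ZERO_ADDRESS = "0x0000000000000000000000000000000000000000"

class SpendLimitExceeded(Exception):
    pass

def preflight_burn(acu_amount: int, recipient: str,
                   acu_spent: int, acu_spend_limit: int,
                   oracle_quote: int) -> int:
    """Return the AVL required if the conversion would pass on-chain checks."""
    if recipient in ("", ZERO_ADDRESS):
        raise ValueError("zero recipient")
    if acu_amount <= 0:
        raise ValueError("zero amount")
    if acu_spent + acu_amount > acu_spend_limit:   # mirrors spend-limit require
        raise SpendLimitExceeded("spend limit")
    if oracle_quote <= 0:                          # mirrors "zero quote"
        raise ValueError("zero quote")
    return oracle_quote
```

Running this preflight against current `acuSpent` and a fresh oracle quote lets the operator reject a conversion locally with the same semantics the contract would enforce.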
                
State Variables

Contract State

Immutable references and mutable accounting state for spend tracking.

avl: IERC20 (immutable)
acu: IERC20 (immutable)
oracle: IPriceOracle (immutable)
burnAgent: address (immutable)
reserve: address (mutable)
acuSpendLimit: uint256
acuSpent: uint256
Access Control

Role Hierarchy

OpenZeppelin AccessControl with three distinct roles for operations.

DEFAULT_ADMIN_ROLE: role management
OPERATOR_ROLE: burn/mint ops
PAUSER_ROLE: pause/unpause
Role admin: DEFAULT_ADMIN_ROLE
Events

Event Emissions

Events for off-chain indexing and ledger reconciliation.

ConversionExecuted(operator,
  recipient, acuAmount, avlBurned)
SpendLimitUpdated(old, new)
ReserveUpdated(old, new)
SECTION 11

SDK

Client Libraries and CLI

Phase4 SDK with ControlPlaneClient, Typer CLI wrappers, and configuration helpers for external integration.

SECTION 11

SDK and Client Libraries

ENABLING INFRASTRUCTURE · ControlPlaneClient, Typer CLI, Configuration Helpers

ControlPlaneClient wraps urllib.request to interact with control plane endpoints. It stores base URL, API key, timeout, and optional SSL context. _request builds requests, attaches headers (X-API-Key, Content-Type), serialises JSON, parses responses, and raises ControlPlaneError on HTTP failures. _request_with_retry handles HTTP 429 with exponential backoff and jitter.

Methods include register_provider, configure_demand, submit_supply, reserve_capacity, allocate_reservation, finalize_bucket, wait_for_task, and get_task. Typer-based CLI exposes commands mirroring client methods, parsing command-line options, environment variables, and printing JSON responses.
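The 429 handling described above is the standard exponential-backoff-with-jitter pattern. A self-contained sketch — the real _request_with_retry signature may differ:

```python
import random
import time

def request_with_retry(request_fn, max_attempts=5, base_delay=0.5, rng=random):
    """Retry request_fn while it returns HTTP 429, sleeping
    base_delay * 2**attempt plus uniform jitter between attempts.
    request_fn returns (status, body); raises after max_attempts."""
    for attempt in range(max_attempts):
        status, body = request_fn()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break
        # Exponential backoff with jitter to avoid thundering-herd retries.
        delay = base_delay * (2 ** attempt) + rng.uniform(0, base_delay)
        time.sleep(delay)
    raise RuntimeError("rate limited after %d attempts" % max_attempts)
```

The jitter term spreads retries from concurrent clients across time instead of having them all hit the control plane at the same backoff boundary.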

SDK Interaction Surfaces

┌─────────────────────────────────────────────────────────────────────────────┐
│                      DEVELOPER CLI / SCRIPTS                                 │
│                                                                              │
│  $ phase4 sdk configure-demand --tenant team-a --gpu-profile a100x8         │
│  $ phase4 sdk submit-supply --provider prov-123 --capacity 100              │
│  $ phase4 sdk reserve-capacity --job-id job-42 --bucket-id bucket-1         │
│  $ phase4 sdk wait-for-task --task-id task-99 --timeout 300                 │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      CONTROL PLANE CLIENT                                    │
│                                                                              │
│  ControlPlaneClient(base_url, api_key, timeout, ssl_context)                 │
│      │                                                                       │
│      ├── _request(method, path, payload)                                     │
│      │       ├── build urllib.request.Request                                │
│      │       ├── attach X-API-Key header                                     │
│      │       ├── serialize JSON payload                                      │
│      │       └── parse JSON response                                         │
│      │                                                                       │
│      └── _request_with_retry(...)                                            │
│              ├── handle HTTP 429                                             │
│              ├── exponential backoff                                         │
│              └── jitter randomization                                        │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      CONTROL PLANE ENDPOINTS                                 │
│                                                                              │
│  /oracle/*          ← demand, supply, finalize                               │
│  /scheduler/*       ← reservations, allocations                              │
│  /metering/*        ← slices, acknowledgements                               │
│  /settlement/*      ← receipts, burns                                        │
│  /governance/*      ← proposals, execution                                   │
│  /enterprise/*      ← treasury, transfers                                    │
└─────────────────────────────────────────────────────────────────────────────┘
                
SDK Usage Example
import os

from phase4_sdk import ControlPlaneClient

# Configure client
client = ControlPlaneClient(
    base_url="https://control-plane.vracu.net",
    api_key=os.environ["CONTROL_PLANE_API_KEY"],
    timeout=30.0
)

# Submit demand configuration
demand = client.configure_demand(
    tenant="team-vision",
    gpu_profile="a100x8",
    scm_minutes=720,
    reserve_ratio=0.15
)

# Reserve capacity for job
reservation = client.reserve_capacity(
    job_id="job-42",
    bucket_id=demand["bucket_id"],
    required_scm_minutes=60
)

# Poll for task completion
result = client.wait_for_task(
    task_id=reservation["task_id"],
    timeout=300
)

print(f"Reservation: {reservation['reservation_id']}")
print(f"Provider: {result['provider_id']}")
                
CLI Commands
The Typer app defines command groups: providers, oracle, scheduler, metering, settlement, governance, region, enterprise, and payments. Each subcommand maps to a ControlPlaneClient method, converting CLI options to JSON payloads. Commands support JSON output toggles and pretty-printing via rich. phase4 sdk --help enumerates dozens of commands.
SECTION 12

Providers

Directory API and Onboarding

Provider ecosystem including directory API, join tokens, heartbeat monitoring, MIG management, and capability attestation.

SECTION 12

Provider Ecosystem

ENABLING INFRASTRUCTURE · Directory API, Provider Node, MIG Manager, Observability

alien-directory-api/directory_api/app.py builds a FastAPI service exposing /v1 endpoints. Routes include tokens (issuing join tokens), providers (registering providers, fetching profiles, updating metadata), and heartbeats (recording periodic health signals). Tokens include TTLs and scopes, enabling operators to issue time-bound onboarding credentials.

alien-provider-node implements a Typer CLI that consumes join tokens, registers hardware inventory, and reports telemetry. It communicates with the directory API using HTTPX, handles retries, and caches tokens securely. The node collects GPU inventory (count, memory, MIG profiles) and publishes metrics at regular intervals.
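Time-bound join tokens of this kind are commonly signed blobs carrying an expiry and scope. A stdlib-only sketch with HMAC signing — this is an illustrative format, not the directory API's actual token encoding:

```python
import base64
import hashlib
import hmac
import json
import time

def issue_join_token(secret: bytes, scope: str, ttl_s: int, now=None) -> str:
    """Mint a signed, time-bound onboarding token (illustrative format)."""
    now = time.time() if now is None else now
    claims = {"scope": scope, "exp": now + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def redeem_join_token(secret: bytes, token: str, now=None) -> dict:
    """Verify signature and expiry; return the claims or raise ValueError."""
    now = time.time() if now is None else now
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if now > claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Because the TTL lives inside the signed claims, an operator can issue a token, hand it to a provider out of band, and know it becomes worthless after expiry without any revocation round-trip.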

Provider Onboarding Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                      JOIN TOKEN ISSUER                                       │
│  Control Plane / Operator ──▶ POST /fleet/join-tokens                        │
│      {policy, ttl, scope, treasury_account}                                  │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       DIRECTORY API                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  POST /v1/join-tokens ──▶ generate token with TTL                    │    │
│  │  POST /v1/join ──▶ validate token, create provider record            │    │
│  │  POST /v1/join/verify ──▶ verify signature, store public key         │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       PROVIDER NODE                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  vracu-provider init --join-token <TOKEN>                            │    │
│  │      ├── redeem token                                                │    │
│  │      ├── verify signature                                            │    │
│  │      └── store provider ID                                           │    │
│  │                                                                      │    │
│  │  vracu-provider configure-mig --profile 1g.5gb --count 7             │    │
│  │      └── MIGManager.configure() ──▶ nvidia-smi                       │    │
│  │                                                                      │    │
│  │  vracu-provider run                                                  │    │
│  │      ├── heartbeat loop ──▶ POST /providers/{id}/heartbeat           │    │
│  │      ├── telemetry loop ──▶ GPU metrics, temperature                 │    │
│  │      └── attestation loop ──▶ refresh credentials                    │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    OBSERVABILITY AGENTS                                      │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐                    │
│  │  Prometheus   │  │     Loki      │  │     OTLP      │                    │
│  │   Exporter    │  │    Tailer     │  │   Exporter    │                    │
│  └───────────────┘  └───────────────┘  └───────────────┘                    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                   RESILIENCE CONTROLLER                                      │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Monitor heartbeats ──▶ detect offline nodes                         │    │
│  │  Check revocation feeds ──▶ isolate compromised providers            │    │
│  │  Orchestrate failover ──▶ reassign workloads                         │    │
│  │  Notify operators ──▶ Slack / PagerDuty webhooks                     │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
                
Directory API

Provider Registry

Central registry for provider metadata, capabilities, and health status.

POST /v1/join-tokens
POST /v1/join
POST /v1/join/verify
GET /v1/providers/{id}
POST /v1/providers/{id}/heartbeat
MIG Manager

GPU Partitioning

Idempotent MIG configuration using nvidia-smi commands.

_enable_mig()
_destroy_instances()
_create_instances(profile, count)
query() ──▶ MIGStatus
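Idempotent MIG configuration means querying the current partition state first and only issuing destructive nvidia-smi commands when the desired layout differs. A sketch with an injected command runner — the interface is illustrative, not MIGManager's actual one:

```python
def configure_mig(run, profile: str, count: int, current: dict) -> list:
    """Drive the GPU toward the desired MIG layout; return commands issued.
    `run` executes one command string; `current` is the queried state,
    e.g. {"profile": "1g.5gb", "count": 7}. Interface is illustrative."""
    if current.get("profile") == profile and current.get("count") == count:
        return []  # desired state already in place: idempotent no-op
    issued = [
        "nvidia-smi -mig 1",    # enable MIG mode
        "nvidia-smi mig -dci",  # destroy existing compute instances
        "nvidia-smi mig -dgi",  # destroy existing GPU instances
        # create `count` instances of the requested profile, with compute
        "nvidia-smi mig -cgi " + ",".join([profile] * count) + " -C",
    ]
    for cmd in issued:
        run(cmd)
    return issued
```

Injecting `run` keeps the reconciliation logic testable without real hardware; in production it would wrap a subprocess call to nvidia-smi.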
Observability

Telemetry Stack

Multi-protocol exporters for metrics, logs, and traces.

Prometheus: GPU utilization
Loki: structured JSON logs
OTLP: distributed traces
nvidia-smi --query
SECTION 13

Operations

Infrastructure and Deployment

Helm charts, Terraform modules, Kyverno policies, and operational scripts for production deployments.

SECTION 13

Operations and Deployment

ENABLING INFRASTRUCTURE · Helm, Terraform, Kyverno, Scripts

deployment/deploy_payments.sh orchestrates Kubernetes deployments for the payments service. It wraps kubectl and helm commands, applies manifests, waits for rollouts, and verifies service readiness. deployment/helm/ contains charts for launcher, sidecar relay, control plane, payments, and observability stacks. Values files configure image tags, replica counts, secrets references, and service accounts.

Kyverno policies enforce Cosign signatures on all pods labelled vracu-job=true. The policy references a cosign-public-key secret, instructing Kyverno to reject unsigned images. Network policies deny all ingress and egress for pods carrying the vracu-job label, forcing operators to explicitly allow any required traffic.

Deployment Pipeline

┌─────────────────────────────────────────────────────────────────────────────┐
│                          GIT COMMIT                                          │
│  feature branch ──▶ pull request ──▶ code review ──▶ merge                   │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       CI PIPELINE                                            │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  1. Lint (ruff, mypy)                                                │    │
│  │  2. Unit tests (pytest)                                              │    │
│  │  3. Integration tests                                                │    │
│  │  4. Security scan (Trivy, Anchore)                                   │    │
│  │  5. Build container images                                           │    │
│  │  6. Sign images (Cosign)                                             │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    INFRASTRUCTURE                                            │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Terraform plan ──▶ review ──▶ apply                                 │    │
│  │      ├── VPCs, subnets, security groups                              │    │
│  │      ├── EKS clusters, node groups                                   │    │
│  │      ├── RDS instances, SQS queues                                   │    │
│  │      └── Secrets Manager, KMS keys                                   │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     HELM DEPLOYMENT                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  helm upgrade --install launcher deployment/helm/launcher            │    │
│  │  helm upgrade --install control-plane deployment/helm/control-plane  │    │
│  │  helm upgrade --install payments deployment/helm/payments            │    │
│  │  helm upgrade --install observability deployment/helm/observability  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                    ADMISSION CONTROL                                         │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Kyverno policies:                                                   │    │
│  │      ├── require-signed-images (Cosign verification)                 │    │
│  │      ├── deny-privileged-containers                                  │    │
│  │      └── require-resource-limits                                     │    │
│  │                                                                      │    │
│  │  Network policies:                                                   │    │
│  │      ├── deny-all-ingress (vracu-job pods)                           │    │
│  │      └── deny-all-egress (vracu-job pods)                            │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                   POST-DEPLOY VALIDATION                                     │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  scripts/validate_production.py                                      │    │
│  │      ├── health checks                                               │    │
│  │      ├── smoke tests                                                 │    │
│  │      ├── metrics verification                                        │    │
│  │      └── dashboard screenshots                                       │    │
│  │                                                                      │    │
│  │  Generate: DEPLOYMENT_COMPLETE_*.md                                  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
                
Security Controls
Security rests on signed images, Kyverno policies, network isolation, Secrets Manager integration, dual signatures for on-chain settlements, and attested-hardware enforcement. IAM policy JSON files document least-privilege requirements; together with signed-image admission control, they mitigate supply-chain risk.
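The dual-signature requirement for settlements amounts to a 2-of-2 check over the settlement payload before any on-chain call is made. A stdlib sketch with HMAC standing in for wallet signatures — the production path presumably uses ECDSA wallet keys, and the role names here are illustrative:

```python
import hashlib
import hmac
import json

def sign(key: bytes, payload: dict) -> str:
    """Deterministically sign a payload (HMAC stands in for a wallet key)."""
    msg = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def authorize_settlement(payload: dict, sigs: dict, keys: dict) -> bool:
    """Require valid signatures from both roles before settlement executes."""
    for role in ("operator", "treasurer"):
        expected = sign(keys[role], payload)
        if not hmac.compare_digest(sigs.get(role, ""), expected):
            return False
    return True
```

Sorting the JSON keys before signing makes the signature independent of dict ordering, so both signers canonicalise the payload identically.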
SECTION 14

Validation

Testing and Observability

Comprehensive testing infrastructure, Prometheus metrics, structured logging, and production evidence artifacts.

SECTION 14

Validation and Testing

ENABLING INFRASTRUCTURE · Testing Infrastructure, Observability, Metrics

vracu-launcher/tests/ contains unit tests covering adapters, privacy, and worker logic. tests/privacy/test_split_key.py validates session rotation monotonicity and attestation hash binding. tests/adapters/test_training.py asserts default resource profiles and command normalisation. tests/worker/test_processor.py uses fixtures to simulate job processing.

validation_e2e/ and local_validation/ directories host scripts and transcripts demonstrating full-stack runs. The captured files record validation output, test summaries, and applied fixes. Shell scripts orchestrate integration runs, capturing logs and metrics snapshots, while SQLite database snapshots serve as audit artefacts.

Validation Feedback Loop

┌─────────────────────────────────────────────────────────────────────────────┐
│                        AUTOMATED TESTS                                       │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐                    │
│  │  Unit Tests   │  │ Integration   │  │   E2E Tests   │                    │
│  │  pytest       │  │    Tests      │  │  validation_  │                    │
│  │  tests/       │  │  integration_ │  │    e2e/       │                    │
│  │               │  │    test.py    │  │               │                    │
│  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘                    │
│          │                  │                  │                            │
│          └──────────────────┼──────────────────┘                            │
│                             │                                               │
│                             ▼                                               │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                     CI PIPELINES                                     │    │
│  │  Run on every commit, PR, and scheduled                              │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      OBSERVABILITY                                           │
│  ┌───────────────────────────────────────────────────────────────────┐      │
│  │  Prometheus Metrics                                                │      │
│  │      job_submit_latency_seconds                                    │      │
│  │      privacy_attestation_failures_total                            │      │
│  │      privacy_session_cancellations_total                           │      │
│  │      payments_webhooks_total                                       │      │
│  │      control_plane_reservation_duration_seconds                    │      │
│  └───────────────────────────────────────────────────────────────────┘      │
│  ┌───────────────────────────────────────────────────────────────────┐      │
│  │  Structured Logs (JSON)                                            │      │
│  │      {"event": "job_submit", "job_id": "...", "adapter": "..."}    │      │
│  └───────────────────────────────────────────────────────────────────┘      │
│  ┌───────────────────────────────────────────────────────────────────┐      │
│  │  Distributed Traces (OTLP)                                         │      │
│  │      launcher.request → privacy.authorize → control_plane.reserve  │      │
│  └───────────────────────────────────────────────────────────────────┘      │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     VALIDATION CHECKLISTS                                    │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  [ ] Health checks passing                                           │    │
│  │  [ ] Metrics endpoints responding                                    │    │
│  │  [ ] Kyverno policies enforced                                       │    │
│  │  [ ] Ledger reconciliation complete                                  │    │
│  │  [ ] Dashboard screenshots captured                                  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                      EVIDENCE ARCHIVES                                       │
│  ┌───────────────────────────────────────────────────────────────────┐      │
│  │  validation_output.txt                                             │      │
│  │  PRODUCTION_PROOF.md                                               │      │
│  │  PRODUCTION_EVIDENCE_FINAL.md                                      │      │
│  │  AWS_DEPLOYMENT_COMPLETE_FINAL_REPORT.md                           │      │
│  │  BLOCKCHAIN_VALIDATION_COMPLETE.md                                 │      │
│  │  payments_production.db (SQLite snapshot)                          │      │
│  │  control_plane_local.db (SQLite snapshot)                          │      │
│  └───────────────────────────────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────────────────────────┘
                
Validation Script Pattern
from datetime import datetime

def validate_production():
    # Health checks
    assert check_http("/health") == 200
    assert check_http("/metrics") == 200
    
    # Submit test job and verify completion
    job_id = submit_test_job()
    wait_for_completion(job_id, timeout=300)
    
    # Verify metrics within thresholds
    metrics = scrape_prometheus()
    assert metrics["job_submit_latency_seconds"] < 30.0
    assert metrics["privacy_attestation_failures_total"] == 0
    
    # Reconcile ledger with on-chain state
    ledger_total = query_ledger_minted()
    onchain_total = query_onchain_minted()
    assert ledger_total == onchain_total
    
    # Generate validation report
    write_report({
        "metrics": metrics,
        "ledger_total": ledger_total,
        "onchain_total": onchain_total,
        "timestamp": datetime.utcnow().isoformat()
    })
                
Metrics

Prometheus Metrics

Comprehensive metrics across all subsystems feed Grafana dashboards for operational visibility.

job_submit_latency_seconds (histogram)
privacy_attestation_failures_total (counter)
privacy_session_cancellations_total (counter)
payments_webhooks_total (counter)
payments_queue_backlog (gauge)
control_plane_reservation_duration_seconds (histogram)
Evidence

Audit Artifacts

Numerous PDFs and Markdown reports document validation efforts for auditors.

E2E_TEST_RESULTS.md
PRODUCTION_PROOF.md
AWS_DEPLOYMENT_COMPLETE.md
BLOCKCHAIN_VALIDATION.md
SECTION 15

Epilogue

Guarantees and Future Trajectories

System interlocks, security guarantees, governance mechanisms, and the path forward for confidential compute.

SECTION 15

Epilogue

Guarantees, Interlocks, and Future Trajectories

Throughout this document each guarantee traces to concrete modules: privacy enforcement in launcher/privacy, orchestration in launcher/worker, sidecar execution in sidecar/runtime, control plane economics in control_plane/*, payments in payments/*, on-chain settlement in contracts/, and provider tooling in alien-* directories. Signed-image policies, network isolation, and Terraform infrastructure anchor operational claims.

Confidential compute requires interlocks: attesters validate hardware, key brokers issue secrets, revocation pipelines cancel workloads, sidecars enforce key usage, drivers launch jobs, control plane allocates capacity, payments reconcile usage, and smart contracts finalise settlement. Each interlock uses typed data structures to prevent drift. Metrics and logs stitch the loops together.
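A typed reservation record is one such drift-preventing structure: each interlock exchanges a frozen dataclass rather than a loose dict, so a renamed or missing field fails at construction time instead of silently propagating. A minimal sketch — field names are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReservationSlice:
    """Typed payload passed between scheduler, metering, and settlement.
    Frozen so no interlock can mutate it in flight; validated on creation."""
    reservation_id: str
    provider_id: str
    scm_minutes: int

    def __post_init__(self):
        if self.scm_minutes <= 0:
            raise ValueError("scm_minutes must be positive")
```

Because the class is frozen, a downstream service that needs to adjust a value must construct a new record, leaving an explicit trail instead of an in-place mutation.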

System Weave

┌─────────────────────────────────────────────────────────────────────────────┐
│                              SDK / CLI                                       │
│                    phase4_sdk ←→ Typer CLI ←→ External Partners              │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          LAUNCHER API                                        │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐   │
│  │   Routes    │───▶│   Service   │───▶│   Privacy   │───▶│   Worker    │   │
│  │  /v1/jobs   │    │  LauncherSvc│    │  PrivacyGate│    │ JobProcessor│   │
│  └─────────────┘    └─────────────┘    └──────┬──────┘    └──────┬──────┘   │
└────────────────────────────────────────────────┼──────────────────┼─────────┘
                                                 │                  │
                    ┌────────────────────────────┘                  │
                    │                                               │
                    ▼                                               ▼
┌─────────────────────────────────┐             ┌─────────────────────────────┐
│     PRIVACY DESIGN PRINCIPLES   │             │      SIDECAR RUNTIME        │
│  ┌─────────────────────────┐   │             │  ┌─────────────────────┐    │
│  │ Attesters (NVIDIA/TDX/  │   │◀───────────▶│  │ Handshake / Rotate  │    │
│  │           SNP)          │   │   Mutual    │  │ TLS / Execute       │    │
│  ├─────────────────────────┤   │   Attestation│  └─────────────────────┘    │
│  │ Key Brokers (KMS/Vault/ │   │             │                              │
│  │         Split-Key)      │   │             │                              │
│  ├─────────────────────────┤   │             │                              │
│  │ Revocation Registry     │   │             │                              │
│  └─────────────────────────┘   │             │                              │
└────────────────────────────────┘             └──────────────────────────────┘
                    │
                    │
                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          CONTROL PLANE                                       │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐  │
│  │  Oracle   │  │ Scheduler │  │ Metering  │  │Settlement │  │Governance │  │
│  │  Pricing  │  │   DRF     │  │  Slices   │  │  Router   │  │ Timelock  │  │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  └───────────┘  │
└────────┼──────────────┼──────────────┼──────────────┼────────────────────────┘
         │              │              │              │
         └──────────────┼──────────────┼──────────────┘
                        │              │
                        ▼              ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     PAYMENTS / ON-CHAIN                                      │
│  ┌───────────────────────┐            ┌───────────────────────┐             │
│  │     Ledger DAO        │            │   ConversionRouter    │             │
│  │  ┌─────────────────┐  │◀──────────▶│  ┌─────────────────┐  │             │
│  │  │ PaymentCredits  │  │   Mint     │  │ burnAVLForACU() │  │             │
│  │  │ ProviderPayouts │  │   Burn     │  │ setSpendLimit() │  │             │
│  │  └─────────────────┘  │            │  └─────────────────┘  │             │
│  └───────────────────────┘            └───────────────────────┘             │
│                                                 │                            │
│                                                 ▼                            │
│                                        ┌───────────────────┐                 │
│                                        │  Arbitrum L2      │                 │
│                                        │  AVL ←→ ACU       │                 │
│                                        └───────────────────┘                 │
└─────────────────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                       PROVIDER ECOSYSTEM                                     │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐ │
│  │ Directory API │  │ Provider Node │  │ MIG Manager   │  │ Observability │ │
│  │ Join Tokens   │  │ Heartbeats    │  │ nvidia-smi    │  │ Prometheus    │ │
│  └───────────────┘  └───────────────┘  └───────────────┘  └───────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
                
"The repository is both blueprint and proof, inviting stakeholders to inspect, validate, and extend the platform with confidence rooted in verifiable engineering."
— System Architecture

Potential extensions include new attesters (e.g., Intel TDX updates), additional adapters for specialised workloads, deeper integration with external observability platforms, and on-chain governance. The modular architecture—loaders, registry-based factories, configuration-driven services—facilitates such evolution without rewriting core components.
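The registry-based factory pattern that makes such extension cheap can be sketched in a few lines. The attester kinds and registry API here are assumptions for illustration only; the real loaders live in the repository's module code:

```python
from typing import Callable, Dict

# Hypothetical attester registry: new attesters plug in via a decorator,
# so core dispatch code never changes when a kind is added.
ATTESTER_REGISTRY: Dict[str, Callable[[], object]] = {}

def register_attester(kind: str):
    """Decorator that records a constructor under a configuration key."""
    def wrap(cls):
        ATTESTER_REGISTRY[kind] = cls
        return cls
    return wrap

@register_attester("snp")
class SnpAttester:
    def verify(self, evidence: bytes) -> bool:
        return evidence.startswith(b"SNP")  # placeholder check

@register_attester("tdx")
class TdxAttester:
    def verify(self, evidence: bytes) -> bool:
        return evidence.startswith(b"TDX")  # placeholder check

def make_attester(kind: str):
    """Configuration-driven lookup; unknown kinds fail loudly."""
    try:
        return ATTESTER_REGISTRY[kind]()
    except KeyError:
        raise ValueError(f"no attester registered for {kind!r}")
```

Adding a new attester is then a single decorated class plus a configuration entry, with no edits to the factory itself.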

Security manifests in signed images, Kyverno policies, network isolation, Secrets Manager integration, dual signatures for on-chain settlements, and attested hardware enforcement. Governance modules enforce timelocks and multi-role approvals. Compliance artifacts connect governance actions to sign-offs, proving that cross-organisational approvals are embedded in both software and process.
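The dual-signature requirement on settlements can be illustrated with a toy stand-in. This sketch uses stdlib HMACs in place of the actual on-chain signature scheme, and the role names ("ops", "finance") are assumptions; what it shows is the rule that no single party's key suffices:

```python
import hashlib
import hmac

def sign(key: bytes, payload: bytes) -> str:
    """Toy signature: HMAC-SHA256 standing in for the real on-chain scheme."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def settle(payload: bytes, sig_ops: str, sig_fin: str,
           ops_key: bytes, fin_key: bytes) -> bool:
    """Accept a settlement only when BOTH independent parties have
    signed the same payload; a single valid signature is not enough."""
    ok_ops = hmac.compare_digest(sig_ops, sign(ops_key, payload))
    ok_fin = hmac.compare_digest(sig_fin, sign(fin_key, payload))
    return ok_ops and ok_fin
```

Requiring two keys held by different organisations means a compromised operator key alone cannot move funds, which is the property the compliance artifacts then attest to.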

The network's privacy guarantees emerge from three complementary layers. Hardware attestation prevents operators from inspecting workload data. Encrypted input/output pipelines protect data in transit and at rest. Cryptographic secure aggregation prevents peers from observing individual contributions during federated training. Together, these mechanisms enable confidential computation on untrusted infrastructure.
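The secure-aggregation layer can be illustrated with the classic pairwise-masking trick. This is a toy sketch, not the platform's actual protocol: each pair of peers shares a random mask that one adds and the other subtracts, so the masks cancel in the aggregate while every individual update stays hidden from the aggregator:

```python
import random
from itertools import combinations

def masked_updates(updates: dict[str, float], seed: int = 0) -> dict[str, float]:
    """For each peer pair (i, j) with i < j, draw a shared mask r_ij;
    i adds it and j subtracts it. The sum of all masked values equals
    the sum of the originals, but no single value is revealed."""
    rng = random.Random(seed)  # stands in for pairwise-agreed PRG seeds
    masked = dict(updates)
    for i, j in combinations(sorted(updates), 2):
        r = rng.uniform(-1e3, 1e3)
        masked[i] += r
        masked[j] -= r
    return masked
```

In a real deployment the shared seeds would come from pairwise key agreement rather than a global RNG, and dropout recovery would be layered on top; the cancellation property is the same.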

Readers can independently verify claims: run tests/, execute run_live_demo.sh, inspect payments_production.db, replay control-plane logs, verify Cosign signatures, query on-chain contracts via ABI, and compare metrics dashboards. The epilogue invites verification rather than asking for trust.

As new features land (PQ-safe keys, new adapters, region expansion), the same pattern will continue: code first, tests second, documentation third, evidence fourth. Future whitepaper revisions will follow this cadence, keeping technical truth aligned with narrative. The repository is both blueprint and proof, inviting stakeholders to inspect, validate, and extend the platform with confidence rooted in verifiable engineering.

Final Guarantee
The revolution will not be centralized. In the end, this is a story about choice. The choice to build rather than complain. The choice to collaborate rather than compete. The choice to open source the future rather than patent it. These choices, multiplied across thousands of contributors and millions of computations, constitute nothing less than a peaceful revolution in how we organize computational power.