
client: add onchain reconciler to daemon with enable/disable CLI #3034

Open
snormore wants to merge 4 commits into main from snor/client-onchain-reconciler

Conversation

Contributor

@snormore snormore commented Feb 18, 2026

Summary of Changes

  • Add an onchain reconciler to doublezerod that polls the DZ Ledger and automatically provisions/removes tunnels when users are activated or deactivated, replacing the CLI-driven /provision flow
  • Remove the JSON state file (doublezerod.json) crash recovery mechanism — the daemon now re-provisions from onchain state on restart
  • Add doublezero enable / doublezero disable CLI commands to control the reconciler, with connect implicitly enabling it
  • Move client IP discovery from the CLI to the daemon (auto-discovered via default route or --client-ip flag), deprecating --client-ip on the CLI
  • Simplify the CLI by dropping ~400 lines of provisioning logic (device fetching, tunnel parameter computation, multicast resolution) — the CLI now just creates the onchain user and waits for the daemon's reconciler to provision the tunnel
  • Add drift detection: reconciler detects when onchain state changes (e.g., multicast group subscriptions) and re-provisions running services to match
  • Add new e2e test (TestE2E_IBRL_EnableDisable) covering the enable/disable lifecycle

RFC: rfcs/rfc17-client-onchain-reconciler.md
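
The provision/remove/drift behavior summarized above can be pictured as a pure planning step that compares onchain state with what the daemon is currently running. The Go sketch below is illustrative only: `Desired`, `Plan`, and `planReconcile` are hypothetical names, not the types in `client/doublezerod/internal/reconciler`, and the real matching rules may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// Desired is a hypothetical, simplified view of one activated onchain user.
type Desired struct {
	UserType string   // for example "ibrl" or "multicast"
	Groups   []string // multicast group subscriptions, if any
}

// Plan lists the actions a single reconcile pass would take, keyed by user pubkey.
type Plan struct {
	Provision   []string // activated onchain, not running locally
	Reprovision []string // running, but onchain state has drifted
	Remove      []string // running locally, no longer activated onchain
}

// sameGroups reports whether two group lists hold the same members, ignoring order.
func sameGroups(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// planReconcile diffs desired (onchain) state against running services.
func planReconcile(desired, running map[string]Desired) Plan {
	var p Plan
	for key, want := range desired {
		have, ok := running[key]
		switch {
		case !ok:
			p.Provision = append(p.Provision, key)
		case have.UserType != want.UserType || !sameGroups(have.Groups, want.Groups):
			p.Reprovision = append(p.Reprovision, key)
		}
	}
	for key := range running {
		if _, ok := desired[key]; !ok {
			p.Remove = append(p.Remove, key)
		}
	}
	// Map iteration order is random, so sort for deterministic output.
	sort.Strings(p.Provision)
	sort.Strings(p.Reprovision)
	sort.Strings(p.Remove)
	return p
}

func main() {
	desired := map[string]Desired{
		"alice": {UserType: "ibrl"},
		"bob":   {UserType: "multicast", Groups: []string{"g1", "g2"}},
	}
	running := map[string]Desired{
		"bob":   {UserType: "multicast", Groups: []string{"g1"}},
		"carol": {UserType: "ibrl"},
	}
	p := planReconcile(desired, running)
	fmt.Println("provision:", p.Provision)
	fmt.Println("reprovision:", p.Reprovision)
	fmt.Println("remove:", p.Remove)
}
```

Keeping the diff separate from the side effects (creating or tearing down tunnels) is one way to make a loop like this easy to unit test; whether the PR structures it this way is not stated here.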

Diff Breakdown

| Category | Files | Lines (+/-) | Net |
| --- | --- | --- | --- |
| Core logic | 23 | +2,659 / -1,548 | +1,111 |
| Scaffolding | 6 | +70 / -17 | +53 |
| Tests | 16 | +2,436 / -930 | +1,506 |
| Fixtures | 23 | +39 / -208 | -169 |
| Docs | 7 | +221 / -18 | +203 |
| Total | 75 | +5,425 / -2,721 | +2,704 |

~41% of the changed lines (additions plus deletions) are tests; core logic is roughly balanced between new reconciler code and removed CLI/state-file code.

Note: This PR exceeds the recommended ~500 lines of new non-test code. The reconciler, state management, client IP discovery, CLI refactor, and service interface changes are tightly coupled and difficult to split further without shipping a half-working intermediate state. The RFC covers the full design.


Testing Verification

  • 1,427-line reconciler test suite covering: provision on user activation, removal on deactivation, drift detection and re-provisioning, multiple user type handling, enable/disable lifecycle, error recovery, tunnel source resolution
  • State persistence tests (fresh install, migration from old state file, read/write cycles)
  • Client IP discovery tests (default route resolution, validation, martian filtering)
  • Onchain fetcher tests (caching behavior, TTL expiry, error propagation)
  • New e2e test TestE2E_IBRL_EnableDisable validating the full enable → connect → disable → re-enable flow against a live devnet
  • Existing e2e tests updated to work with the reconciler-driven flow
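
To make the "martian filtering" item concrete, here is a minimal Go sketch of classifying a candidate client IP as publicly routable. It is an assumption-laden illustration, not the code in `clientip.go`: the function name `isPublicIPv4` and the exact list of excluded ranges are hypothetical, and the daemon's own list may be longer or shorter.

```go
package main

import (
	"fmt"
	"net/netip"
)

// extraNonPublic holds ranges not covered by the netip.Addr helper methods
// used below. The selection is an assumption made for this sketch.
var extraNonPublic = []netip.Prefix{
	netip.MustParsePrefix("0.0.0.0/8"),       // "this network"
	netip.MustParsePrefix("100.64.0.0/10"),   // carrier-grade NAT (RFC 6598)
	netip.MustParsePrefix("192.0.2.0/24"),    // TEST-NET-1
	netip.MustParsePrefix("198.18.0.0/15"),   // benchmarking
	netip.MustParsePrefix("198.51.100.0/24"), // TEST-NET-2
	netip.MustParsePrefix("203.0.113.0/24"),  // TEST-NET-3
	netip.MustParsePrefix("240.0.0.0/4"),     // reserved, includes broadcast
}

// isPublicIPv4 reports whether s parses as an IPv4 address outside the
// loopback, private, link-local, multicast, and other non-routable ranges.
func isPublicIPv4(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil || !addr.Is4() {
		return false
	}
	if addr.IsUnspecified() || addr.IsLoopback() || addr.IsPrivate() ||
		addr.IsLinkLocalUnicast() || addr.IsMulticast() {
		return false
	}
	for _, p := range extraNonPublic {
		if p.Contains(addr) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"8.8.8.8", "10.1.2.3", "100.64.0.1", "not-an-ip"} {
		fmt.Printf("%-12s public=%v\n", s, isPublicIPv4(s))
	}
}
```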

@snormore snormore force-pushed the snor/client-onchain-reconciler branch 3 times, most recently from 590d528 to 988a52e on February 18, 2026 22:41
@snormore snormore requested a review from Copilot February 18, 2026 23:04
Contributor

Copilot AI left a comment


Pull request overview

This PR moves tunnel provisioning from the doublezero CLI into an onchain-driven reconciler loop inside the doublezerod daemon, adds runtime enable/disable controls, and updates status/reporting + tests/fixtures accordingly.

Changes:

  • Add an onchain reconciler (polling + provisioning/removal) to doublezerod, with persisted enable/disable state and new /enable, /disable, /v2/status endpoints.
  • Update CLI flows (connect, status, plus new enable/disable) to interact with the reconciler rather than directly provisioning.
  • Add caching onchain fetcher + adjust tests/e2e fixtures to account for reconciler state and updated output.

Reviewed changes

Copilot reviewed 66 out of 66 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| rfcs/rfc17-client-onchain-reconciler.md | RFC documenting reconciler design, state, endpoints, and rollout plan |
| CHANGELOG.md | Changelog entry describing reconciler + CLI enable/disable |
| e2e/user_ban_test.go | Treat missing tunnel interface as successful route withdrawal |
| e2e/multicast_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/internal/fixtures/diff.go | Make CLI-table diff parsing robust to non-table preamble lines |
| e2e/ibrl_with_allocated_ip_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/ibrl_test.go | Update status fixture usage to templated reconciler-aware output |
| e2e/ibrl_enable_disable_test.go | New E2E test for connect → disable → enable → restart persistence lifecycle |
| e2e/fixtures/multicast/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/multicast/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/multicast/doublezero_status_connected_subscriber.tmpl | Add “Reconciler” column to connected multicast subscriber fixture |
| e2e/fixtures/multicast/doublezero_status_connected_publisher.tmpl | Add “Reconciler” column to connected multicast publisher fixture |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/ibrl_with_allocated_addr/doublezero_status_connected.tmpl | Add “Reconciler” column to connected allocated-IP fixture |
| e2e/fixtures/ibrl/doublezero_status_disconnected.txt | Remove old disconnected status fixture |
| e2e/fixtures/ibrl/doublezero_status_disconnected.tmpl | New reconciler-aware disconnected status fixture template |
| e2e/fixtures/ibrl/doublezero_status_connected.tmpl | Add “Reconciler” column to connected IBRL fixture |
| client/doublezerod/cmd/doublezerod/main.go | Add flags for client IP, reconciler poll interval, and state dir; pass into runtime |
| client/doublezerod/internal/runtime/run.go | Wire reconciler + state migration + caching fetcher; update routes handler wiring; latency now uses fetcher |
| client/doublezerod/internal/runtime/run_test.go | Update runtime tests to new Run() signature; remove statefile recovery assertions |
| client/doublezerod/internal/runtime/clientip.go | New client IP auto-discovery (explicit → interfaces → ifconfig.me) |
| client/doublezerod/internal/runtime/clientip_test.go | Unit tests for public IP classification and explicit IP behavior |
| client/doublezerod/internal/reconciler/reconciler.go | New reconciler loop + enable/disable endpoints + /v2/status |
| client/doublezerod/internal/reconciler/state.go | Persisted reconciler enable/disable state + migration from old doublezerod.json |
| client/doublezerod/internal/reconciler/state_test.go | Unit tests for state load/write + migration behavior |
| client/doublezerod/internal/reconciler/metrics.go | Prometheus metrics for reconciler polls/provisions/removals/matched users |
| client/doublezerod/internal/onchain/fetcher.go | New TTL-based caching fetcher shared by reconciler and latency subsystem |
| client/doublezerod/internal/onchain/fetcher_test.go | Unit tests for caching behavior (TTL, stale-on-error, concurrency) |
| client/doublezerod/internal/manager/manager.go | Remove DB/statefile dependency; add ResolveTunnelSrc + GetProvisionedServices |
| client/doublezerod/internal/manager/http_test.go | Update manager HTTP tests for removed DB + new GetProvisionedServices |
| client/doublezerod/internal/manager/db.go | Remove old on-disk state DB implementation |
| client/doublezerod/internal/manager/db_test.go | Remove tests for deleted DB/statefile system |
| client/doublezerod/internal/manager/fixtures/doublezerod.*.json | Remove fixtures used exclusively for deleted DB/statefile behavior |
| client/doublezerod/internal/services/base.go | Remove DBReaderWriter interface (services now keep ProvisionRequest in memory) |
| client/doublezerod/internal/services/services_test.go | Update service creation tests after DB removal |
| client/doublezerod/internal/services/ibrl.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/services/edgefiltering.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/services/multicast.go | Store ProvisionRequest in memory; drop DB usage; expose ProvisionRequest() |
| client/doublezerod/internal/latency/smartcontract.go | Remove old direct smartcontract fetcher module (replaced by fetcher integration) |
| client/doublezerod/internal/latency/manager.go | Support injected Fetcher; adjust SmartContractFunc signature and fetch path |
| client/doublezerod/internal/latency/manager_test.go | Update tests for new smartcontract func signature and removed program ID options |
| client/doublezerod/internal/api/routes.go | Replace DBReader with ServiceStateReader (provisioned services now in manager) |
| client/doublezerod/internal/api/routes_test.go | Update routes tests to use ServiceStateReader mock |
| client/doublezero/src/servicecontroller.rs | Add v2 status + enable/disable calls; remove CLI provisioning/remove/resolve-route APIs |
| client/doublezero/src/routes.rs | Remove resolve_route command surface; keep routes retrieval |
| client/doublezero/src/main.rs | Allow enable/disable commands without version warning gate |
| client/doublezero/src/cli/command.rs | Add doublezero enable / doublezero disable commands |
| client/doublezero/src/command/mod.rs | Register enable/disable subcommands |
| client/doublezero/src/command/enable.rs | New enable command implementation + unit tests |
| client/doublezero/src/command/disable.rs | New disable command implementation + unit tests |
| client/doublezero/src/command/status.rs | Use /v2/status; surface reconciler state in output/table; synthesize disconnected row when empty |
| client/doublezero/src/command/connect.rs | Switch connect flow to best-effort enable reconciler + poll daemon status for provisioning |
| client/doublezero/src/command/disconnect.rs | Stop calling daemon /remove; poll daemon for deprovision completion after onchain user deletion |


@snormore snormore force-pushed the snor/client-onchain-reconciler branch 19 times, most recently from 5706744 to 0d5d872 on February 24, 2026 21:49
@snormore snormore marked this pull request as ready for review February 24, 2026 21:54
Add a reconciliation loop to doublezerod that polls the DZ Ledger and
automatically provisions/removes tunnels when users are activated or
deactivated onchain. This replaces the CLI-driven /provision flow and
the JSON state file crash recovery mechanism.

Key changes:
- Reconciler loop on NetlinkManager polls onchain state every 10s
- Persistent enabled/disabled state with migration from old state file
- Client IP auto-discovery via kernel default route with external fallback
- Caching onchain fetcher wrapping the serviceability SDK
- Drift detection re-provisions services when onchain state changes
- Enable/disable HTTP endpoints and v2 status endpoint
- Remove db.go state file, Recover() path, and DBReaderWriter interface

Adapt the doublezero CLI to work with the daemon's onchain reconciler
instead of directly driving tunnel provisioning.

- Add enable/disable commands to control the reconciler
- Simplify connect: create onchain user, enable reconciler, poll daemon
  status until tunnel appears (drops ~400 lines of provisioning logic)
- Simplify disconnect: delete onchain user and let reconciler tear down
- Rewrite status to use daemon v2 endpoint with reconciler state
- Deprecate --client-ip on CLI in favor of daemon flag
- Read client IP from daemon v2 status for onchain user creation
- Add TestE2E_IBRL_EnableDisable covering enable/disable lifecycle
- Update e2e fixtures for reconciler-driven status output
- Update existing e2e tests for reconciler flow (pass client-ip to daemon)
- Update CHANGELOG, DEVELOPMENT docs
@snormore snormore force-pushed the snor/client-onchain-reconciler branch from 0d5d872 to 5847e99 on February 25, 2026 18:52