agent: reduce full config check frequency from 5s to 1m and compare config hashes instead #3028

Draft
nikw9944 wants to merge 3 commits into main from nikw/config-agent-cpu


@nikw9944 nikw9944 commented Feb 17, 2026

Summary

  • Instead of applying the config to the device every 5s, apply it only when the config hash changes or when 60s have passed since it was last applied.
  • Implement hash-based config polling, which reduces network overhead by 99%+ when the config is unchanged.
  • See the comment below for the call sequence before and after this change (the new sequence has been added to the controller's README.md).
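
As a rough check on the bandwidth figure, here is a small Go sketch of the arithmetic. It assumes the sizes quoted in this PR (a 64-byte hash response and a full config of about 50KB, taken here as 50,000 bytes) and one poll every 5s; the `reduction` helper is hypothetical and only for illustration. Under these assumptions the 99%+ figure describes a single poll where the config is unchanged, and the saving over a full minute is lower because the forced full fetch still happens once per minute.

```go
package main

import "fmt"

// reduction returns the fractional saving of `after` relative to `before`.
// Hypothetical helper, used only for this worked example.
func reduction(before, after float64) float64 {
	return 1 - after/before
}

func main() {
	const (
		hashBytes   = 64.0     // hash response size quoted in this PR
		configBytes = 50_000.0 // "~50KB" full config; the exact size is an assumption
		pollsPerMin = 12.0     // one poll every 5s
	)

	// A single poll where the hash is unchanged: 64 bytes instead of the full config.
	perPoll := reduction(configBytes, hashBytes)

	// One minute of polling: 12 hash checks plus one forced full config fetch,
	// compared with 12 full config fetches before this change.
	perMinute := reduction(pollsPerMin*configBytes, pollsPerMin*hashBytes+configBytes)

	fmt.Printf("single unchanged poll: %.2f%% less data\n", perPoll*100)
	fmt.Printf("one minute incl. forced full fetch: %.2f%% less data\n", perMinute*100)
}
```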

Changes

Controller

  • Refactor GetConfig by extracting config generation into reusable helper functions
  • Add GetConfigHash gRPC endpoint that returns only the SHA256 hash (64 bytes as a hex string) instead of the full config (~50KB)
  • Add controller_grpc_getconfighash_requests_total metric
  • Add architecture documentation with sequence diagram
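
For illustration, a minimal Go sketch of how a rendered config could be reduced to a 64-character hash string. The `configHash` name and the hex encoding are assumptions; the actual GetConfigHash handler in this PR may encode or structure this differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHash returns the hex-encoded SHA-256 digest of a rendered config.
// Hypothetical helper: the raw digest is 32 bytes, so the hex string is
// 64 characters long regardless of the config size.
func configHash(config string) string {
	sum := sha256.Sum256([]byte(config))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder config text, not real device configuration.
	config := "hostname example-device\n"
	h := configHash(config)
	fmt.Println(len(h), h)
}
```

Because the digest depends only on the rendered text, two requests that render the same config return the same hash, which is what lets the agent skip the full fetch.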

Agent

  • Refactor main loop
  • Implement a simple caching scheme for device config and config hash
  • Replace 5-second full config polling with 5-second hash polling; fetch the full config only when the hash changes
  • Always apply the full config once the cache timeout (default 60s) expires

Testing Verification

  • Unit test updates
  • No functionality has changed, so e2e tests should run as-is

@nikw9944 nikw9944 linked an issue Feb 17, 2026 that may be closed by this pull request
@nikw9944 nikw9944 changed the title from "agent: reduce network and CPU usage by reducing full config check frequency from 5s to 1m and comparing config hashes instead" to "agent: reduce full config check frequency from 5s to 1m and compare config hashes instead" Feb 17, 2026
@nikw9944 nikw9944 self-assigned this Feb 20, 2026
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch from 7260b33 to e81a21e Compare February 20, 2026 19:41

nikw9944 commented Feb 21, 2026

BEFORE: Simple polling every 5 seconds

┌─────────┐                 ┌────────────┐                 ┌────────────┐                  ┌─────────┐
│  Agent  │                 │ Controller │                 │ Controller │                  │   EOS   │
│  main() │                 │ GetConfig()│                 │  Config    │                  │ Device  │
│         │                 │   (gRPC)   │                 │  Generator │                  │         │
└────┬────┘                 └─────┬──────┘                 └─────┬──────┘                  └────┬────┘
     │                            │                              │                              │
     │ Every 5s:                  │                              │                              │
     │                            │                              │                              │
     │ GetBgpNeighbors()          │                              │                              │
     ├─────────────────────────────────────────────────────────────────────────────────────────►│
     │◄─────────────────────────────────────────────────────────────────────────────────────────┤
     │ [peer IPs]                 │                              │                              │
     │                            │                              │                              │
     │ GetConfigFromServer()      │                              │                              │
     ├───────────────────────────►│                              │                              │
     │                            │ processConfigRequest()       │                              │
     │                            ├─────────────────────────────►│                              │
     │                            │                              │ generateConfig()             │
     │                            │                              │  • deduplicateTunnels()      │
     │                            │                              │  • renderConfig()            │
     │                            │                              │    (~50KB config text)       │
     │                            │◄─────────────────────────────┤                              │
     │                            │ [config string]              │                              │
     │◄───────────────────────────┤                              │                              │
     │ ConfigResponse             │                              │                              │
     │ {config: "..."}            │                              │                              │
     │                            │                              │                              │
     │ AddConfigToDevice(config)  │                              │                              │
     ├─────────────────────────────────────────────────────────────────────────────────────────►│
     │◄─────────────────────────────────────────────────────────────────────────────────────────┤
     │ [config applied]           │                              │                              │
     │                            │                              │                              │
     │ sleep(5s)                  │                              │                              │
     │ goto top                   │                              │                              │
     │                            │                              │                              │

AFTER: Hash-based polling (5s hash check, 1m full config fetch)

┌─────────┐                 ┌────────────┐                 ┌────────────┐                  ┌─────────┐
│  Agent  │                 │ Controller │                 │ Controller │                  │   EOS   │
│  main() │                 │GetConfigHash                 │  Config    │                  │ Device  │
│         │                 │ GetConfig()│                 │  Generator │                  │         │
└────┬────┘                 └─────┬──────┘                 └─────┬──────┘                  └────┬────┘
     │                            │                              │                              │
     │ Every 5s:                  │                              │                              │
     │                            │                              │                              │
     │ GetBgpNeighbors()          │                              │                              │
     ├─────────────────────────────────────────────────────────────────────────────────────────►│
     │◄─────────────────────────────────────────────────────────────────────────────────────────┤
     │ [peer IPs]                 │                              │                              │
     │                            │                              │                              │
     │ Decision: should fetch?    │                              │                              │
     │  • First run (no hash)?    │                              │                              │
     │  • 1m since last apply?    │                              │                              │
     │  • Hash changed?           │                              │                              │
     │                            │                              │                              │
     │ GetConfigHashFromServer()  │                              │                              │
     ├───────────────────────────►│                              │                              │
     │                            │ processConfigRequest()       │                              │
     │                            ├─────────────────────────────►│                              │
     │                            │                              │ generateConfig()             │
     │                            │                              │  • deduplicateTunnels()      │
     │                            │                              │  • renderConfig()            │
     │                            │                              │ SHA256(config)               │
     │                            │◄─────────────────────────────┤                              │
     │                            │ [hash only]                  │                              │
     │◄───────────────────────────┤                              │                              │
     │ ConfigHashResponse         │                              │                              │
     │ {hash: "abc123..."}        │                              │                              │
     │ (64 bytes)                 │                              │                              │
     │                            │                              │                              │
     │ Compare: hash != lastHash? │                              │                              │
     │                            │                              │                              │
     ├─── if YES (or first run or 1m timeout):                                                  │
     │                            │                              │                              │
     │    fetchConfigFromController()                            │                              │
     │    ├─► GetConfigFromServer()                              │                              │
     │    │   ──────────────────► │                              │                              │
     │    │                       │ processConfigRequest()       │                              │
     │    │                       ├─────────────────────────────►│                              │
     │    │                       │                              │ generateConfig()             │
     │    │                       │                              │  • deduplicateTunnels()      │
     │    │                       │                              │  • renderConfig()            │
     │    │                       │                              │    (entire config text)      │
     │    │                       │◄─────────────────────────────┤                              │
     │    │   ◄──────────────────│ [config string]               │                              │
     │    │   ConfigResponse      │                              │                              │
     │    │   {config: "..."}     │                              │                              │
     │    │                       │                              │                              │
     │    ├─► computeChecksum(config)                            │                              │
     │    │   [local SHA256]      │                              │                              │
     │    │                       │                              │                              │
     │    └─► return config+hash  │                              │                              │
     │                            │                              │                              │
     │    applyConfig()           │                              │                              │
     │    └─► AddConfigToDevice(config)                          │                              │
     │        ─────────────────────────────────────────────────────────────────────────────────►│
     │        ◄─────────────────────────────────────────────────────────────────────────────────┤
     │        [config applied]    │                              │                              │
     │                            │                              │                              │
     │    lastChecksum = hash     │                              │                              │
     │    lastApplyTime = now     │                              │                              │
     │                            │                              │                              │
     ├─── else: skip this cycle (hash unchanged, no work needed) │                              │
     │                            │                              │                              │
     │ sleep(5s)                  │                              │                              │
     │ goto top                   │                              │                              │
     │                            │                              │                              │

@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch 3 times, most recently from eb5c5c5 to e9af3ae Compare February 24, 2026 17:26
…ture docs

Extract config generation logic into reusable functions:
- generateConfig() - renders device config with deduplication
- processConfigRequest() - validates request and finds unknown BGP peers

This refactoring prepares for adding GetConfigHash endpoint that will
share the same config generation logic.

Also add architecture documentation with sequence diagram showing
agent-controller communication flow.
…ange detection

Add new GetConfigHash RPC that returns only the SHA256 hash of the
device configuration (64 bytes) instead of the full config (~50KB).

This enables agents to efficiently check for config changes without
transferring the full configuration on every poll.

Changes:
- Add GetConfigHash RPC to controller.proto
- Implement GetConfigHash() handler that reuses processConfigRequest()
- Add controller_grpc_getconfighash_requests_total metric
- Regenerate protobuf code
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch from e9af3ae to 6fa1ee4 Compare February 24, 2026 21:02
…meout

Replace aggressive 5-second full config polling with hash-based change
detection. The agent now:
- Checks config hash every 5 seconds (64 bytes)
- Only fetches and applies full config when hash changes
- Forces full config check after timeout (default 60s) as safety net

This dramatically reduces:
- Network bandwidth (99%+ when config unchanged)
- EOS device load (no config application when unchanged)
- Agent CPU (hash computed only when fetching new config)

Add --config-cache-timeout-in-seconds flag to control the forced full
config check interval.

Refactor main loop:
- Split pollControllerAndConfigureDevice into focused functions
- Add computeChecksum() helper for SHA256 hashing
- Add fetchConfigFromController() to get config and compute hash
- Add applyConfig() to apply config to EOS device
- Rename variables: cachedConfigHash, configCacheTime, configCacheTimeout

Add GetConfigHashFromServer() client function to call new gRPC endpoint.
@nikw9944 nikw9944 force-pushed the nikw/config-agent-cpu branch from 6fa1ee4 to 2ab7f72 Compare February 24, 2026 21:03

Development

Successfully merging this pull request may close these issues.

Reduce config agent resource consumption

1 participant