From c01076288b49e721d0216b0372406bbbf0d210b0 Mon Sep 17 00:00:00 2001 From: Marco Walz Date: Mon, 2 Mar 2026 17:40:30 +0100 Subject: [PATCH 1/4] feat: add canister-security skill Covers IC-specific security patterns: access control, anonymous principal rejection, TOCTOU/reentrancy, pre_upgrade traps, cycles monitoring, controller backup, fetchRootKey risks, admin guards, secret storage, storage exhaustion, callback trap inconsistency, and unbounded wait calls. Both Motoko and Rust implementations. Reviewed against IC interface specification, DFINITY security best practices, and recent audit findings. Adds @dfinity/security as CODEOWNER. Based on the draft from #51, with corrections: - inspect_message: specify boundary node bypass, not generic node - freezing_threshold: default is already 30 days, canister is uninstalled not deleted - fetchRootKey: not deprecated, required for local dev - Added pitfalls: pre_upgrade trap, storage exhaustion, callback trap after state mutation, unbounded wait calls - Links to official security best practices documentation --- .github/CODEOWNERS | 1 + skills/canister-security/SKILL.md | 363 ++++++++++++++++++++++++++++++ 2 files changed, 364 insertions(+) create mode 100644 skills/canister-security/SKILL.md diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 260fadd..c378684 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -24,6 +24,7 @@ package.json @JoshDFN @marc0olo @raymondk /skills/multi-canister/ @derlerd-dfinity /skills/stable-memory/ @derlerd-dfinity /skills/vetkd/ @andreacerulli +/skills/canister-security/ @dfinity/security /skills/oisy-wallet-signer/ @AntonioVentilii /skills/ic-dashboard/ @jp-dfinity diff --git a/skills/canister-security/SKILL.md b/skills/canister-security/SKILL.md new file mode 100644 index 0000000..df22ebe --- /dev/null +++ b/skills/canister-security/SKILL.md @@ -0,0 +1,363 @@ +--- +name: canister-security +description: "Secures IC canisters against IC-specific attack patterns. Covers access control, caller validation, cycles management, async state vulnerabilities (TOCTOU/reentrancy), controller safety, upgrade traps, and storage exhaustion. Use when implementing any canister that handles user data, tokens, or access control. Do NOT use for general canister programming patterns or deployment workflows." +license: Apache-2.0 +metadata: + title: Canister Security + category: Security +--- + +# Canister Security + +## What This Is + +IC canisters face security challenges that don't exist in traditional web development. The async messaging model creates TOCTOU (time-of-check-time-of-use) vulnerabilities, `canister_inspect_message` is NOT a reliable security boundary, and anyone on the internet can burn your cycles by sending update calls. This skill covers the IC-specific patterns that prevent exploitable canisters. + +## Prerequisites + +- For Motoko: `mops` package manager, `core = "2.0.0"` in mops.toml +- For Rust: `ic-cdk = "0.19"`, `candid = "0.10"` + +## Security Pitfalls + +1. **Relying on `canister_inspect_message` for access control.** This hook runs on a single replica without full consensus. A malicious boundary node can bypass it by forwarding the message anyway. It is also never called for inter-canister calls, query calls, or management canister calls. Always duplicate access checks inside every update method. Use `inspect_message` only as a cycle-saving optimization, never as a security boundary. See [security best practices: ingress message inspection](https://docs.internetcomputer.org/building-apps/security/iam#do-not-rely-on-ingress-message-inspection). + +2. **Forgetting to reject the anonymous principal.** Every endpoint that requires authentication must check that the caller is not the anonymous principal (`2vxsx-fae`). In Motoko use `Principal.isAnonymous(caller)`, in Rust compare `caller() != Principal::anonymous()`. Without this, unauthenticated callers can invoke protected methods — and if the canister uses the caller principal as an identity key (e.g., for balances), the anonymous principal becomes a shared identity anyone can use. + +3. **Reading state before an async call and assuming it's unchanged after (TOCTOU).** When your canister `await`s an inter-canister call, other messages can interleave and mutate state. This is one of the most critical sources of DeFi exploits on IC. Always deduct/lock state before the `await`, then compensate on failure (saga pattern). Consider per-caller or per-resource locks to prevent duplicate concurrent operations. See [security best practices: inter-canister calls](https://docs.internetcomputer.org/building-apps/security/inter-canister-calls). + +4. **Trapping in `pre_upgrade`.** If `pre_upgrade` traps (e.g., serializing too much data exceeds the instruction limit), the canister becomes permanently non-upgradeable. This is arguably the most devastating IC-specific vulnerability. Avoid storing large data structures in the heap that must be serialized during upgrade. In Rust, use `ic-stable-structures` for direct stable memory access. In Motoko, use the `stable` keyword for persistent variables which are handled automatically. + +5. **Not monitoring cycles balance.** Every canister has a default `freezing_threshold` of 2,592,000 seconds (~30 days). When cycles drop below the threshold reserve, the canister freezes (rejects all update calls). When cycles reach zero, the canister is uninstalled — its code and memory are removed, though the canister ID and controllers survive. The real pitfall is not actively monitoring and topping up cycles. For production canisters holding valuable state, increase the freezing threshold and set up automated monitoring. + ```bash + # Check current settings (mainnet) + icp canister settings show backend -e ic + # Increase freezing threshold for high-value canisters + icp canister settings update backend --freezing-threshold 7776000 -e ic # 90 days + ``` + +6. **Single controller with no backup.** If you lose the controller identity's private key, the canister becomes unupgradeable forever. There is no recovery mechanism. Always add a backup controller or governance canister: + ```bash + icp canister settings update backend --add-controller -e ic + ``` + For high-value canisters, consider making an SNS or the canister itself a controller so governance can manage upgrades. + +7. **Calling `fetchRootKey()` in production.** `fetchRootKey()` fetches the root public key from the replica and trusts whatever it returns. On mainnet, the root key is hardcoded into the agent — calling `fetchRootKey()` there allows a man-in-the-middle to substitute a different key, breaking all verification. Only call `fetchRootKey()` in local development, guarded by an environment check. For frontends served by asset canisters, the root key is provided automatically. + +8. **Exposing admin methods without guards.** Every update method is callable by anyone on the internet. Admin methods (migration, config, minting) must explicitly check the caller against an allowlist. There is no built-in role system — you must implement it yourself. + +9. **Storing secrets in canister state.** Canister memory on standard application subnets is readable by node operators. Never store private keys, API secrets, or passwords in canister state. For on-chain secret management, use vetKD (threshold key derivation). + +10. **Allowing unbounded user-controlled storage.** If users can store data without limits, an attacker can fill the 4 GiB Wasm heap or stable memory, bricking the canister. Always enforce per-user storage quotas and validate input sizes. + +11. **Trapping in a callback after state mutation.** If your canister mutates state, then makes an inter-canister call, and the callback traps, the state mutations from before the call persist but the callback's mutations are rolled back. A malicious callee can exploit this to skip security-critical actions like debiting an account. Structure code so that critical state mutations happen either entirely before or entirely after the async boundary. Consider using `call_on_cleanup` for rollback logic and journaling for crash-safe state transitions. See [security best practices: callback traps](https://docs.internetcomputer.org/building-apps/security/inter-canister-calls#securely-handle-traps-in-callbacks). + +12. **Unbounded wait calls preventing upgrades.** If your canister makes a call to an untrustworthy or buggy callee that never responds, the canister cannot be stopped (and therefore cannot be upgraded) while awaiting outstanding responses. Use bounded wait calls (timeouts) to ensure calls complete in bounded time regardless of callee behavior. See [security best practices: untrustworthy canisters](https://docs.internetcomputer.org/building-apps/security/inter-canister-calls#be-aware-of-the-risks-involved-in-calling-untrustworthy-canisters). + +## How It Works + +### IC Security Model + +1. **Update calls** go through consensus — all nodes on a subnet execute the code and must agree on the result. Standard application subnets have 13 nodes; system and fiduciary subnets have more (28+). This makes update calls tamper-proof but slower (~2s). +2. **Query calls** run on a single replica — fast (~200ms) but the replica can return incorrect results. Replica-signed queries provide partial mitigation (the responding replica signs the response), but for full trust, use certified data or update calls for security-critical reads. +3. **Inter-canister calls** are async messages. Between sending a request and receiving the response, your canister can process other messages. State may change under you (see TOCTOU pitfall above). +4. **`canister_inspect_message`** runs before Candid decoding on a single node. It reduces cycle waste from spam but is not a security check — a malicious boundary node can bypass it. + +## Implementation + +### Motoko + +#### Access control pattern + +```motoko +import Principal "mo:core/Principal"; +import Array "mo:core/Array"; +import Runtime "mo:core/Runtime"; + +persistent actor { + + // --- Authorization state --- + stable var owner : Principal = Principal.fromText("aaaaa-aa"); // replaced during init + stable var admins : [Principal] = []; + + // --- Guards --- + + func requireAuthenticated(caller : Principal) { + if (Principal.isAnonymous(caller)) { + Runtime.trap("anonymous caller not allowed"); + }; + }; + + func requireOwner(caller : Principal) { + requireAuthenticated(caller); + if (caller != owner) { + Runtime.trap("caller is not the owner"); + }; + }; + + func requireAdmin(caller : Principal) { + requireAuthenticated(caller); + let isAdmin = Array.find( + admins, + func(a : Principal) : Bool { a == caller }, + ); + if (isAdmin == null and caller != owner) { + Runtime.trap("caller is not an admin"); + }; + }; + + // --- Public endpoints --- + + /// Anyone authenticated can call this + public shared ({ caller }) func publicAction() : async Text { + requireAuthenticated(caller); + "ok"; + }; + + /// Only admins / owner + public shared ({ caller }) func adminAction() : async () { + requireAdmin(caller); + // ... protected logic + }; + + /// Only the owner + public shared ({ caller }) func addAdmin(newAdmin : Principal) : async () { + requireOwner(caller); + admins := Array.append(admins, [newAdmin]); + }; +}; +``` + +#### Async safety (saga pattern) + +```motoko +public shared ({ caller }) func transfer(to : Principal, amount : Nat) : async () { + // CAPTURE caller and validate BEFORE any await + let sender = caller; + requireAuthenticated(sender); + + // Validate and DEDUCT FIRST, then make the external call + let balance = getBalance(sender); + if (balance < amount) { Runtime.trap("insufficient balance") }; + deductBalance(sender, amount); + + try { + await otherCanister.deposit(to, amount); + } catch (e) { + // Compensate: re-credit on failure + creditBalance(sender, amount); + Runtime.trap("transfer failed: " # Error.message(e)); + }; +}; +``` + +#### inspect_message (cycle drain reduction only) + +```motoko +// Inside persistent actor { ... } + +system func inspect( + { + caller : Principal; + msg : { + #adminAction : () -> (); + #addAdmin : () -> Principal; + #publicAction : () -> (); + } + } +) : Bool { + switch (msg) { + // Admin methods: reject anonymous to save cycles on Candid decoding + case (#adminAction _) { not Principal.isAnonymous(caller) }; + case (#addAdmin _) { not Principal.isAnonymous(caller) }; + // Public methods: accept all + case (_) { true }; + }; +}; +``` + +### Rust + +#### Access control pattern + +```rust +use ic_cdk::{caller, init, update}; +use candid::Principal; +use std::cell::RefCell; + +thread_local! { + static OWNER: RefCell = RefCell::new(Principal::anonymous()); + static ADMINS: RefCell> = RefCell::new(vec![]); +} + +// --- Guards --- + +fn require_authenticated() -> Principal { + let caller = caller(); + assert_ne!(caller, Principal::anonymous(), "anonymous caller not allowed"); + caller +} + +fn require_owner() -> Principal { + let caller = require_authenticated(); + OWNER.with(|o| { + assert_eq!(caller, *o.borrow(), "caller is not the owner"); + }); + caller +} + +fn require_admin() -> Principal { + let caller = require_authenticated(); + let is_authorized = OWNER.with(|o| caller == *o.borrow()) + || ADMINS.with(|a| a.borrow().contains(&caller)); + assert!(is_authorized, "caller is not an admin"); + caller +} + +// --- Init --- + +#[init] +fn init(owner: Principal) { + OWNER.with(|o| *o.borrow_mut() = owner); +} + +// --- Endpoints --- + +#[update] +fn public_action() -> String { + require_authenticated(); + "ok".to_string() +} + +#[update] +fn admin_action() { + require_admin(); + // ... protected logic +} + +#[update] +fn add_admin(new_admin: Principal) { + require_owner(); + ADMINS.with(|a| a.borrow_mut().push(new_admin)); +} + +ic_cdk::export_candid!(); +``` + +#### inspect_message (cycle drain reduction only) + +```rust +use ic_cdk::api::call::{accept_message, method_name}; + +/// Pre-filter to reduce cycle waste from spam. +/// Runs on ONE node. Can be bypassed. NOT a security check. +/// Always duplicate real access control inside each method. +#[ic_cdk::inspect_message] +fn inspect_message() { + let method = method_name(); + match method.as_str() { + // Admin methods: only accept from non-anonymous callers + "admin_action" | "add_admin" => { + if caller() != Principal::anonymous() { + accept_message(); + } + // Silently reject anonymous — saves cycles on Candid decoding + } + // Public methods: accept all + _ => accept_message(), + } +} +``` + +#### Async safety (saga pattern) + +```rust +#[update] +async fn transfer(to: Principal, amount: u64) { + // CAPTURE caller and validate BEFORE any .await + let sender = require_authenticated(); + + // Validate and DEDUCT FIRST, then make the async call + let balance = get_balance(&sender); + assert!(balance >= amount, "insufficient balance"); + deduct_balance(&sender, amount); + + // Make inter-canister call + match other_canister_deposit(to, amount).await { + Ok(()) => { /* success */ } + Err(err) => { + // Compensate: re-credit on failure + credit_balance(&sender, amount); + ic_cdk::trap(&format!("transfer failed: {:?}", err)); + } + } +} +``` + +## Deploy & Test + +### Local Development + +```bash +icp network start -d +icp deploy backend +``` + +### Test Access Control + +```bash +# 1. Authenticated call should succeed (default identity is owner) +icp canister call backend public_action '()' +# Expected: ("ok") + +# 2. Anonymous call should fail +icp canister call backend public_action '()' --identity anonymous +# Expected: Error containing "anonymous caller not allowed" + +# 3. Admin call from owner should succeed +icp canister call backend admin_action '()' +# Expected: () + +# 4. Admin call from non-admin should fail +icp identity new attacker +icp canister call backend admin_action '()' --identity attacker +# Expected: Error containing "caller is not an admin" +``` + +### Mainnet Security Checklist + +```bash +# 1. Verify controllers include a backup +icp canister settings show backend -e ic +# Expected: at least 2 controllers (your identity + backup) + +# 2. Verify freezing threshold is adequate +# Default: 2,592,000 seconds (30 days). Increase for high-value canisters. + +# 3. Verify cycles balance is healthy — well above freezing threshold reserve +``` + +## Verify It Works + +```bash +# 1. Anonymous rejection works +icp canister call backend public_action '()' --identity anonymous +# Expected: Error — "anonymous caller not allowed" + +# 2. Authenticated call succeeds +icp canister call backend public_action '()' +# Expected: ("ok") + +# 3. Admin-only method rejects non-admin +icp identity new test-user +icp canister call backend admin_action '()' --identity test-user +# Expected: Error — "caller is not an admin" + +# 4. Owner can add admin +icp canister call backend add_admin '(principal "aaaaa-aa")' +# Expected: () + +# 5. Freezing threshold is configured (mainnet) +icp canister settings show backend -e ic +# Expected: Freezing threshold >= 2,592,000 +``` From ab8b4246e4703cb81bbae927fa26b1d499ae468d Mon Sep 17 00:00:00 2001 From: Marco Walz Date: Mon, 2 Mar 2026 18:14:21 +0100 Subject: [PATCH 2/4] fix: correct ic-cdk 0.19 and core 2.0 APIs, remove deploy boilerplate - Rust: caller() -> msg_caller(), api::call::{accept_message, method_name} -> api::{accept_message, msg_method_name} (removed in ic-cdk 0.19) - Motoko: Array.append -> Array.concat (renamed in base->core migration) - Add missing Error import note to saga pattern - Remove deploy boilerplate, consolidate into Verify It Works - Add cross-referencing guidance to CONTRIBUTING.md --- CONTRIBUTING.md | 1 + skills/canister-security/SKILL.md | 56 ++++++++----------------------- 2 files changed, 15 insertions(+), 42 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 5de7caf..dbc1bda 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -154,6 +154,7 @@ The website auto-generates from SKILL.md frontmatter — no need to edit any sou - **Annotate all code blocks** with language identifiers (` ```motoko `, ` ```rust `, ` ```bash `, etc.). - **Include canister IDs and URLs** for both local and mainnet environments. - **Keep it flat.** One file per skill. No nested directories, no images, no external dependencies. +- **Don't duplicate other skills.** If another skill covers a pattern in depth (e.g., `canister-security` for access control and async safety), reference it by name in your pitfalls instead of inlining the pattern. This keeps maintenance centralized and ensures agents get the authoritative version. The `description` field is the primary mechanism agents use to discover related skills — cross-references in pitfalls serve as secondary hints. ## Categories diff --git a/skills/canister-security/SKILL.md b/skills/canister-security/SKILL.md index df22ebe..b13304d 100644 --- a/skills/canister-security/SKILL.md +++ b/skills/canister-security/SKILL.md @@ -22,7 +22,7 @@ IC canisters face security challenges that don't exist in traditional web develo 1. **Relying on `canister_inspect_message` for access control.** This hook runs on a single replica without full consensus. A malicious boundary node can bypass it by forwarding the message anyway. It is also never called for inter-canister calls, query calls, or management canister calls. Always duplicate access checks inside every update method. Use `inspect_message` only as a cycle-saving optimization, never as a security boundary. See [security best practices: ingress message inspection](https://docs.internetcomputer.org/building-apps/security/iam#do-not-rely-on-ingress-message-inspection). -2. **Forgetting to reject the anonymous principal.** Every endpoint that requires authentication must check that the caller is not the anonymous principal (`2vxsx-fae`). In Motoko use `Principal.isAnonymous(caller)`, in Rust compare `caller() != Principal::anonymous()`. Without this, unauthenticated callers can invoke protected methods — and if the canister uses the caller principal as an identity key (e.g., for balances), the anonymous principal becomes a shared identity anyone can use. +2. **Forgetting to reject the anonymous principal.** Every endpoint that requires authentication must check that the caller is not the anonymous principal (`2vxsx-fae`). In Motoko use `Principal.isAnonymous(caller)`, in Rust compare `msg_caller() != Principal::anonymous()`. Without this, unauthenticated callers can invoke protected methods — and if the canister uses the caller principal as an identity key (e.g., for balances), the anonymous principal becomes a shared identity anyone can use. 3. **Reading state before an async call and assuming it's unchanged after (TOCTOU).** When your canister `await`s an inter-canister call, other messages can interleave and mutate state. This is one of the most critical sources of DeFi exploits on IC. Always deduct/lock state before the `await`, then compensate on failure (saga pattern). Consider per-caller or per-resource locks to prevent duplicate concurrent operations. See [security best practices: inter-canister calls](https://docs.internetcomputer.org/building-apps/security/inter-canister-calls). @@ -123,7 +123,7 @@ persistent actor { /// Only the owner public shared ({ caller }) func addAdmin(newAdmin : Principal) : async () { requireOwner(caller); - admins := Array.append(admins, [newAdmin]); + admins := Array.concat(admins, [newAdmin]); }; }; ``` @@ -131,6 +131,8 @@ persistent actor { #### Async safety (saga pattern) ```motoko +// Requires additional import: import Error "mo:core/Error" + public shared ({ caller }) func transfer(to : Principal, amount : Nat) : async () { // CAPTURE caller and validate BEFORE any await let sender = caller; @@ -181,7 +183,8 @@ system func inspect( #### Access control pattern ```rust -use ic_cdk::{caller, init, update}; +use ic_cdk::{init, update}; +use ic_cdk::api::msg_caller; use candid::Principal; use std::cell::RefCell; @@ -193,7 +196,7 @@ thread_local! { // --- Guards --- fn require_authenticated() -> Principal { - let caller = caller(); + let caller = msg_caller(); assert_ne!(caller, Principal::anonymous(), "anonymous caller not allowed"); caller } @@ -247,18 +250,19 @@ ic_cdk::export_candid!(); #### inspect_message (cycle drain reduction only) ```rust -use ic_cdk::api::call::{accept_message, method_name}; +use ic_cdk::api::{accept_message, msg_caller, msg_method_name}; +use candid::Principal; /// Pre-filter to reduce cycle waste from spam. /// Runs on ONE node. Can be bypassed. NOT a security check. /// Always duplicate real access control inside each method. #[ic_cdk::inspect_message] fn inspect_message() { - let method = method_name(); + let method = msg_method_name(); match method.as_str() { // Admin methods: only accept from non-anonymous callers "admin_action" | "add_admin" => { - if caller() != Principal::anonymous() { + if msg_caller() != Principal::anonymous() { accept_message(); } // Silently reject anonymous — saves cycles on Candid decoding @@ -294,19 +298,12 @@ async fn transfer(to: Principal, amount: u64) { } ``` -## Deploy & Test - -### Local Development - -```bash -icp network start -d -icp deploy backend -``` +## Verify It Works -### Test Access Control +### Access Control ```bash -# 1. Authenticated call should succeed (default identity is owner) +# 1. Authenticated call should succeed icp canister call backend public_action '()' # Expected: ("ok") @@ -336,28 +333,3 @@ icp canister settings show backend -e ic # 3. Verify cycles balance is healthy — well above freezing threshold reserve ``` - -## Verify It Works - -```bash -# 1. Anonymous rejection works -icp canister call backend public_action '()' --identity anonymous -# Expected: Error — "anonymous caller not allowed" - -# 2. Authenticated call succeeds -icp canister call backend public_action '()' -# Expected: ("ok") - -# 3. Admin-only method rejects non-admin -icp identity new test-user -icp canister call backend admin_action '()' --identity test-user -# Expected: Error — "caller is not an admin" - -# 4. Owner can add admin -icp canister call backend add_admin '(principal "aaaaa-aa")' -# Expected: () - -# 5. Freezing threshold is configured (mainnet) -icp canister settings show backend -e ic -# Expected: Freezing threshold >= 2,592,000 -``` From 1c050e9541d141cf4b66bf01df8adcbc25882675 Mon Sep 17 00:00:00 2001 From: Marco Walz Date: Tue, 3 Mar 2026 16:18:39 +0100 Subject: [PATCH 3/4] fix: address review feedback for canister-security skill Address all 9 review comments from venkkatesh-sekar: - Use persistent actor instead of stable keyword (Motoko) - Add init() method with race-window warning - Add removeAdmin for admin revocation - Rewrite Rust access control to use CDK guard pattern - Move compensation into finally/Drop cleanup context - Add call_on_cleanup mention for callback traps - Use replica-signed queries terminology Additional improvements: - Rewrite description and intro for agent consumption - Remove external links (agents cannot follow URLs) - Replace How It Works point 4 with state-rollback-on-trap - Add CallerGuard (per-caller locking) for both languages - Add CompensationGuard with Drop for trap-safe rollback - Use Call::bounded_wait with correct ic-cdk 0.19 API - Remove Verify It Works section (not actionable by agents) - Add placeholder comments for application-specific functions --- skills/canister-security/SKILL.md | 377 ++++++++++++++++++++---------- 1 file changed, 248 insertions(+), 129 deletions(-) diff --git a/skills/canister-security/SKILL.md b/skills/canister-security/SKILL.md index b13304d..2214af8 100644 --- a/skills/canister-security/SKILL.md +++ b/skills/canister-security/SKILL.md @@ -1,6 +1,6 @@ --- name: canister-security -description: "Secures IC canisters against IC-specific attack patterns. Covers access control, caller validation, cycles management, async state vulnerabilities (TOCTOU/reentrancy), controller safety, upgrade traps, and storage exhaustion. Use when implementing any canister that handles user data, tokens, or access control. Do NOT use for general canister programming patterns or deployment workflows." +description: "IC-specific security patterns for canister development in Motoko and Rust. Covers access control, anonymous principal rejection, reentrancy prevention (CallerGuard pattern), async safety (saga pattern), callback trap handling, cycle drain protection, and safe upgrade patterns. Use when writing or modifying any canister that modifies state, handles tokens, makes inter-canister calls, or implements access control." license: Apache-2.0 metadata: title: Canister Security @@ -11,7 +11,7 @@ metadata: ## What This Is -IC canisters face security challenges that don't exist in traditional web development. The async messaging model creates TOCTOU (time-of-check-time-of-use) vulnerabilities, `canister_inspect_message` is NOT a reliable security boundary, and anyone on the internet can burn your cycles by sending update calls. This skill covers the IC-specific patterns that prevent exploitable canisters. +Security patterns for IC canisters in Motoko and Rust. The async messaging model creates TOCTOU (time-of-check-time-of-use) vulnerabilities where state changes between `await` calls. `canister_inspect_message` is NOT a reliable security boundary. Anyone on the internet can burn your cycles by sending update calls. This skill provides copy-paste correct patterns for access control, reentrancy prevention, async safety, and callback trap handling. ## Prerequisites @@ -20,13 +20,13 @@ IC canisters face security challenges that don't exist in traditional web develo ## Security Pitfalls -1. **Relying on `canister_inspect_message` for access control.** This hook runs on a single replica without full consensus. A malicious boundary node can bypass it by forwarding the message anyway. It is also never called for inter-canister calls, query calls, or management canister calls. Always duplicate access checks inside every update method. Use `inspect_message` only as a cycle-saving optimization, never as a security boundary. See [security best practices: ingress message inspection](https://docs.internetcomputer.org/building-apps/security/iam#do-not-rely-on-ingress-message-inspection). +1. **Relying on `canister_inspect_message` for access control.** This hook runs on a single replica without full consensus. A malicious boundary node can bypass it by forwarding the message anyway. It is also never called for inter-canister calls, query calls, or management canister calls. Always duplicate access checks inside every update method. Use `inspect_message` only as a cycle-saving optimization, never as a security boundary. 2. **Forgetting to reject the anonymous principal.** Every endpoint that requires authentication must check that the caller is not the anonymous principal (`2vxsx-fae`). In Motoko use `Principal.isAnonymous(caller)`, in Rust compare `msg_caller() != Principal::anonymous()`. Without this, unauthenticated callers can invoke protected methods — and if the canister uses the caller principal as an identity key (e.g., for balances), the anonymous principal becomes a shared identity anyone can use. -3. **Reading state before an async call and assuming it's unchanged after (TOCTOU).** When your canister `await`s an inter-canister call, other messages can interleave and mutate state. This is one of the most critical sources of DeFi exploits on IC. Always deduct/lock state before the `await`, then compensate on failure (saga pattern). Consider per-caller or per-resource locks to prevent duplicate concurrent operations. See [security best practices: inter-canister calls](https://docs.internetcomputer.org/building-apps/security/inter-canister-calls). +3. **Reading state before an async call and assuming it's unchanged after (TOCTOU).** When your canister `await`s an inter-canister call, other messages can interleave and mutate state. This is one of the most critical sources of DeFi exploits on IC. Use per-caller locking (CallerGuard pattern) to prevent concurrent operations, and deduct/lock state before the `await` (saga pattern). See the implementation section below for the combined pattern. -4. **Trapping in `pre_upgrade`.** If `pre_upgrade` traps (e.g., serializing too much data exceeds the instruction limit), the canister becomes permanently non-upgradeable. This is arguably the most devastating IC-specific vulnerability. Avoid storing large data structures in the heap that must be serialized during upgrade. In Rust, use `ic-stable-structures` for direct stable memory access. In Motoko, use the `stable` keyword for persistent variables which are handled automatically. +4. **Trapping in `pre_upgrade`.** If `pre_upgrade` traps (e.g., serializing too much data exceeds the instruction limit), the canister becomes permanently non-upgradeable. Avoid storing large data structures in the heap that must be serialized during upgrade. In Rust, use `ic-stable-structures` for direct stable memory access. In Motoko, the `persistent actor` declaration stores all `let` and `var` variables automatically in stable memory — no manual serialization needed. 5. **Not monitoring cycles balance.** Every canister has a default `freezing_threshold` of 2,592,000 seconds (~30 days). When cycles drop below the threshold reserve, the canister freezes (rejects all update calls). When cycles reach zero, the canister is uninstalled — its code and memory are removed, though the canister ID and controllers survive. The real pitfall is not actively monitoring and topping up cycles. For production canisters holding valuable state, increase the freezing threshold and set up automated monitoring. ```bash @@ -44,30 +44,30 @@ IC canisters face security challenges that don't exist in traditional web develo 7. **Calling `fetchRootKey()` in production.** `fetchRootKey()` fetches the root public key from the replica and trusts whatever it returns. On mainnet, the root key is hardcoded into the agent — calling `fetchRootKey()` there allows a man-in-the-middle to substitute a different key, breaking all verification. Only call `fetchRootKey()` in local development, guarded by an environment check. For frontends served by asset canisters, the root key is provided automatically. -8. **Exposing admin methods without guards.** Every update method is callable by anyone on the internet. Admin methods (migration, config, minting) must explicitly check the caller against an allowlist. There is no built-in role system — you must implement it yourself. +8. **Exposing admin methods without guards.** Every update method is callable by anyone on the internet. Admin methods (migration, config, minting) must explicitly check the caller against an allowlist. There is no built-in role system — you must implement it yourself. Always include admin revocation — missing revocation is a common source of bugs. 9. **Storing secrets in canister state.** Canister memory on standard application subnets is readable by node operators. Never store private keys, API secrets, or passwords in canister state. For on-chain secret management, use vetKD (threshold key derivation). 10. **Allowing unbounded user-controlled storage.** If users can store data without limits, an attacker can fill the 4 GiB Wasm heap or stable memory, bricking the canister. Always enforce per-user storage quotas and validate input sizes. -11. **Trapping in a callback after state mutation.** If your canister mutates state, then makes an inter-canister call, and the callback traps, the state mutations from before the call persist but the callback's mutations are rolled back. A malicious callee can exploit this to skip security-critical actions like debiting an account. Structure code so that critical state mutations happen either entirely before or entirely after the async boundary. Consider using `call_on_cleanup` for rollback logic and journaling for crash-safe state transitions. See [security best practices: callback traps](https://docs.internetcomputer.org/building-apps/security/inter-canister-calls#securely-handle-traps-in-callbacks). +11. **Trapping in a callback after state mutation.** If your canister mutates state before an inter-canister call and the callback traps, the pre-call mutations persist but the callback's mutations are rolled back. A malicious callee can exploit this to skip security-critical actions like debiting an account. Structure code so that critical state mutations happen before the async boundary and are correctly rolled back if a failure or trap occurs. Use `try/finally` (Motoko) or `Drop` guards (Rust) to ensure cleanup always runs. Keep cleanup code minimal — trapping in cleanup recreates the problem. Consider using `call_on_cleanup` for rollback logic and journaling for crash-safe state transitions. -12. **Unbounded wait calls preventing upgrades.** If your canister makes a call to an untrustworthy or buggy callee that never responds, the canister cannot be stopped (and therefore cannot be upgraded) while awaiting outstanding responses. Use bounded wait calls (timeouts) to ensure calls complete in bounded time regardless of callee behavior. See [security best practices: untrustworthy canisters](https://docs.internetcomputer.org/building-apps/security/inter-canister-calls#be-aware-of-the-risks-involved-in-calling-untrustworthy-canisters). +12. **Unbounded wait calls preventing upgrades.** If your canister makes a call to an untrustworthy or buggy callee that never responds, the canister cannot be stopped (and therefore cannot be upgraded) while awaiting outstanding responses. Use bounded wait calls (timeouts) to ensure calls complete in bounded time regardless of callee behavior. ## How It Works ### IC Security Model 1. **Update calls** go through consensus — all nodes on a subnet execute the code and must agree on the result. Standard application subnets have 13 nodes; system and fiduciary subnets have more (28+). This makes update calls tamper-proof but slower (~2s). -2. **Query calls** run on a single replica — fast (~200ms) but the replica can return incorrect results. Replica-signed queries provide partial mitigation (the responding replica signs the response), but for full trust, use certified data or update calls for security-critical reads. +2. **Query calls** run on a single replica — fast (~200ms) but the replica can return incorrect or malicious results. Replica-signed queries provide partial mitigation (the responding replica signs the response), but for full trust, use certified data or update calls for security-critical reads. 3. **Inter-canister calls** are async messages. Between sending a request and receiving the response, your canister can process other messages. State may change under you (see TOCTOU pitfall above). -4. **`canister_inspect_message`** runs before Candid decoding on a single node. It reduces cycle waste from spam but is not a security check — a malicious boundary node can bypass it. +4. **State rollback on trap.** If a message execution traps, all its state changes are rolled back. For inter-canister calls, the first execution (before `await`) and the callback (after `await`) are separate messages — a trap in the callback rolls back only the callback's changes, while the first execution's changes persist. This is why compensation logic must go in cleanup context (`finally`/`Drop`), not regular callback code. ## Implementation ### Motoko -#### Access control pattern +#### Access control ```motoko import Principal "mo:core/Principal"; @@ -77,8 +77,22 @@ import Runtime "mo:core/Runtime"; persistent actor { // --- Authorization state --- - stable var owner : Principal = Principal.fromText("aaaaa-aa"); // replaced during init - stable var admins : [Principal] = []; + // persistent actor auto-persists all var/let — no `stable` keyword needed + var owner : Principal = Principal.fromText("aaaaa-aa"); // overwritten by init + var admins : [Principal] = []; + var initialized : Bool = false; + + // --- Init (call once immediately after deploy) --- + // WARNING: Between deploy and init(), any authenticated caller can front-run this and become owner. + // Always deploy and call init() in the same script/transaction. Unlike Rust's #[init] which runs + // atomically during install_code, Motoko requires a separate post-deploy call. + + public shared ({ caller }) func init() : async () { + if (initialized) { Runtime.trap("already initialized") }; + if (Principal.isAnonymous(caller)) { Runtime.trap("anonymous caller not allowed") }; + owner := caller; + initialized := true; + }; // --- Guards --- @@ -106,65 +120,112 @@ persistent actor { }; }; - // --- Public endpoints --- + // --- Admin management --- + + public shared ({ caller }) func addAdmin(newAdmin : Principal) : async () { + requireOwner(caller); + admins := Array.append(admins, [newAdmin]); + }; + + public shared ({ caller }) func removeAdmin(admin : Principal) : async () { + requireOwner(caller); + admins := Array.filter(admins, func(a : Principal) : Bool { a != admin }); + }; + + // --- Endpoints --- - /// Anyone authenticated can call this public shared ({ caller }) func publicAction() : async Text { requireAuthenticated(caller); "ok"; }; - /// Only admins / owner public shared ({ caller }) func adminAction() : async () { requireAdmin(caller); // ... protected logic }; - - /// Only the owner - public shared ({ caller }) func addAdmin(newAdmin : Principal) : async () { - requireOwner(caller); - admins := Array.concat(admins, [newAdmin]); - }; }; ``` -#### Async safety (saga pattern) +#### Reentrancy prevention + async safety (CallerGuard + saga pattern) + +Per-caller locking prevents a second call from the same caller while the first is awaiting a response. The saga pattern (deduct before `await`, compensate on failure) prevents TOCTOU exploits. Both lock release and balance compensation must go in the `finally` block — putting compensation only in `catch` is not enough, because callback state changes are rolled back on trap while `finally` runs in the cleanup context where state changes persist. ```motoko -// Requires additional import: import Error "mo:core/Error" - -public shared ({ caller }) func transfer(to : Principal, amount : Nat) : async () { - // CAPTURE caller and validate BEFORE any await - let sender = caller; - requireAuthenticated(sender); - - // Validate and DEDUCT FIRST, then make the external call - let balance = getBalance(sender); - if (balance < amount) { Runtime.trap("insufficient balance") }; - deductBalance(sender, amount); - - try { - await otherCanister.deposit(to, amount); - } catch (e) { - // Compensate: re-credit on failure - creditBalance(sender, amount); - Runtime.trap("transfer failed: " # Error.message(e)); +import Map "mo:core/Map"; +import Principal "mo:core/Principal"; +import Error "mo:core/Error"; +import Result "mo:core/Result"; +import Runtime "mo:core/Runtime"; + +// Inside persistent actor { ... } +// getBalance, deductBalance, creditBalance, otherCanister are application-specific — replace with your state management. + + let pendingRequests = Map.empty(); + + func acquireGuard(principal : Principal) : Result.Result<(), Text> { + if (Map.get(pendingRequests, Principal.compare, principal) != null) { + return #err("already processing a request for this caller"); + }; + Map.add(pendingRequests, Principal.compare, principal, true); + #ok; + }; + + func releaseGuard(principal : Principal) { + ignore Map.delete(pendingRequests, Principal.compare, principal); + }; + + public shared ({ caller }) func transfer(to : Principal, amount : Nat) : async Result.Result<(), Text> { + requireAuthenticated(caller); + + // 1. Acquire per-caller lock — rejects concurrent calls from same principal + switch (acquireGuard(caller)) { + case (#err(msg)) { return #err(msg) }; + case (#ok) {}; + }; + + // 2. Validate and DEDUCT BEFORE the await (saga: debit first) + let balance = getBalance(caller); + if (balance < amount) { + releaseGuard(caller); + return #err("insufficient balance"); + }; + deductBalance(caller, amount); + + // 3. Make inter-canister call — track success for compensation + var succeeded = false; + try { + await otherCanister.deposit(to, amount); + succeeded := true; + #ok + } catch (e) { + #err("transfer failed: " # Error.message(e)) + } finally { + // Runs in cleanup context even if the callback traps — changes here persist. + // Without this, a trap rolls back catch-block compensation, losing user funds. + if (not succeeded) { + creditBalance(caller, amount); + }; + releaseGuard(caller); + }; }; -}; ``` -#### inspect_message (cycle drain reduction only) +#### inspect_message (cycle optimization only) ```motoko // Inside persistent actor { ... } +// Method variants must match your public methods system func inspect( { caller : Principal; msg : { + #init : () -> (); #adminAction : () -> (); #addAdmin : () -> Principal; + #removeAdmin : () -> Principal; #publicAction : () -> (); + #transfer : () -> (Principal, Nat); } } ) : Bool { @@ -172,6 +233,8 @@ system func inspect( // Admin methods: reject anonymous to save cycles on Candid decoding case (#adminAction _) { not Principal.isAnonymous(caller) }; case (#addAdmin _) { not Principal.isAnonymous(caller) }; + case (#removeAdmin _) { not Principal.isAnonymous(caller) }; + case (#transfer _) { not Principal.isAnonymous(caller) }; // Public methods: accept all case (_) { true }; }; @@ -180,7 +243,9 @@ system func inspect( ### Rust -#### Access control pattern +#### Access control (using CDK guard pattern) + +The `guard` attribute runs a check before the method body. If the guard returns `Err`, the call is rejected before any method code executes. This is more robust than calling guard functions inside the method — you cannot forget to add it. ```rust use ic_cdk::{init, update}; @@ -193,28 +258,35 @@ thread_local! { static ADMINS: RefCell> = RefCell::new(vec![]); } -// --- Guards --- +// --- Guards (for #[update(guard = "...")] attribute) --- +// Must return Result<(), String>. Err rejects the call. -fn require_authenticated() -> Principal { - let caller = msg_caller(); - assert_ne!(caller, Principal::anonymous(), "anonymous caller not allowed"); - caller +fn require_authenticated() -> Result<(), String> { + if msg_caller() == Principal::anonymous() { + return Err("anonymous caller not allowed".to_string()); + } + Ok(()) } -fn require_owner() -> Principal { - let caller = require_authenticated(); +fn require_owner() -> Result<(), String> { + require_authenticated()?; OWNER.with(|o| { - assert_eq!(caller, *o.borrow(), "caller is not the owner"); - }); - caller + if msg_caller() != *o.borrow() { + return Err("caller is not the owner".to_string()); + } + Ok(()) + }) } -fn require_admin() -> Principal { - let caller = require_authenticated(); +fn require_admin() -> Result<(), String> { + require_authenticated()?; + let caller = msg_caller(); let is_authorized = OWNER.with(|o| caller == *o.borrow()) || ADMINS.with(|a| a.borrow().contains(&caller)); - assert!(is_authorized, "caller is not an admin"); - caller + if !is_authorized { + return Err("caller is not an admin".to_string()); + } + Ok(()) } // --- Init --- @@ -226,28 +298,136 @@ fn init(owner: Principal) { // --- Endpoints --- -#[update] +#[update(guard = "require_authenticated")] fn public_action() -> String { - require_authenticated(); "ok".to_string() } -#[update] +#[update(guard = "require_admin")] fn admin_action() { - require_admin(); - // ... protected logic + // ... protected logic — guard already validated caller } -#[update] +#[update(guard = "require_owner")] fn add_admin(new_admin: Principal) { - require_owner(); ADMINS.with(|a| a.borrow_mut().push(new_admin)); } +#[update(guard = "require_owner")] +fn remove_admin(admin: Principal) { + ADMINS.with(|a| a.borrow_mut().retain(|p| p != &admin)); +} + ic_cdk::export_candid!(); ``` -#### inspect_message (cycle drain reduction only) +#### Reentrancy prevention + async safety (CallerGuard + saga pattern) + +`CallerGuard` uses the `Drop` trait to release the lock when the guard goes out of scope — including when the callback traps (since ic-cdk 0.5.1, local variables go out of scope during cleanup). `CompensationGuard` applies the same `Drop` pattern to balance rollback — putting compensation only in the `Err` match arm is not enough, because callback state changes are rolled back on trap. Never use `let _ = CallerGuard::new(caller)?` — this drops the guard immediately, making locking ineffective. + +```rust +use std::cell::RefCell; +use std::collections::BTreeSet; +use candid::Principal; +use ic_cdk::update; +use ic_cdk::api::msg_caller; +use ic_cdk::call::Call; + +// get_balance, deduct_balance, credit_balance, ledger_id are application-specific — replace with your state management. + +thread_local! { + static PENDING: RefCell> = RefCell::new(BTreeSet::new()); +} + +struct CallerGuard { + principal: Principal, +} + +impl CallerGuard { + fn new(principal: Principal) -> Result { + PENDING.with(|p| { + if !p.borrow_mut().insert(principal) { + return Err("already processing a request for this caller".to_string()); + } + Ok(Self { principal }) + }) + } +} + +impl Drop for CallerGuard { + fn drop(&mut self) { + PENDING.with(|p| { + p.borrow_mut().remove(&self.principal); + }); + } +} + +/// Reverses a balance deduction on drop unless explicitly completed. +/// Drop runs during cleanup even if the callback traps, so compensation persists. +struct CompensationGuard { + principal: Principal, + amount: u64, + completed: bool, +} + +impl CompensationGuard { + fn new(principal: Principal, amount: u64) -> Self { + Self { principal, amount, completed: false } + } + /// Call after successful transfer to prevent compensation on drop. + fn complete(&mut self) { + self.completed = true; + } +} + +impl Drop for CompensationGuard { + fn drop(&mut self) { + if !self.completed { + credit_balance(&self.principal, self.amount); + } + } +} + +#[update] +async fn transfer(to: Principal, amount: u64) -> Result<(), String> { + let caller = msg_caller(); + if caller == Principal::anonymous() { + return Err("anonymous caller not allowed".to_string()); + } + + // 1. Acquire per-caller lock — rejects concurrent calls from same principal + // Drop releases lock even if callback traps + let _guard = CallerGuard::new(caller)?; + + // 2. Validate and DEDUCT BEFORE the await (saga: debit first) + let balance = get_balance(&caller); + if balance < amount { + return Err("insufficient balance".to_string()); + } + deduct_balance(&caller, amount); + + // 3. Compensation guard — credits back on drop unless completed. + // Drop runs during cleanup even if callback traps, so compensation persists. + // Putting credit_balance only in the Err arm would be rolled back on trap. + let mut compensation = CompensationGuard::new(caller, amount); + + // 4. Make inter-canister call + match Call::bounded_wait(ledger_id(), "transfer") + .with_args(&(to, amount)) + .await + { + Ok(_response) => { + compensation.complete(); // transfer succeeded — don't compensate + Ok(()) + } + Err(err) => Err(format!("call failed: {:?}", err)), + // compensation dropped here → credits back the amount + } + // _guard dropped here → lock released +} +``` + +#### inspect_message (cycle optimization only) ```rust use ic_cdk::api::{accept_message, msg_caller, msg_method_name}; @@ -255,13 +435,13 @@ use candid::Principal; /// Pre-filter to reduce cycle waste from spam. /// Runs on ONE node. Can be bypassed. NOT a security check. -/// Always duplicate real access control inside each method. +/// Always duplicate real access control inside each method or via guard attribute. #[ic_cdk::inspect_message] fn inspect_message() { let method = msg_method_name(); match method.as_str() { // Admin methods: only accept from non-anonymous callers - "admin_action" | "add_admin" => { + "admin_action" | "add_admin" | "remove_admin" | "transfer" => { if msg_caller() != Principal::anonymous() { accept_message(); } @@ -272,64 +452,3 @@ fn inspect_message() { } } ``` - -#### Async safety (saga pattern) - -```rust -#[update] -async fn transfer(to: Principal, amount: u64) { - // CAPTURE caller and validate BEFORE any .await - let sender = require_authenticated(); - - // Validate and DEDUCT FIRST, then make the async call - let balance = get_balance(&sender); - assert!(balance >= amount, "insufficient balance"); - deduct_balance(&sender, amount); - - // Make inter-canister call - match other_canister_deposit(to, amount).await { - Ok(()) => { /* success */ } - Err(err) => { - // Compensate: re-credit on failure - credit_balance(&sender, amount); - ic_cdk::trap(&format!("transfer failed: {:?}", err)); - } - } -} -``` - -## Verify It Works - -### Access Control - -```bash -# 1. Authenticated call should succeed -icp canister call backend public_action '()' -# Expected: ("ok") - -# 2. Anonymous call should fail -icp canister call backend public_action '()' --identity anonymous -# Expected: Error containing "anonymous caller not allowed" - -# 3. Admin call from owner should succeed -icp canister call backend admin_action '()' -# Expected: () - -# 4. Admin call from non-admin should fail -icp identity new attacker -icp canister call backend admin_action '()' --identity attacker -# Expected: Error containing "caller is not an admin" -``` - -### Mainnet Security Checklist - -```bash -# 1. Verify controllers include a backup -icp canister settings show backend -e ic -# Expected: at least 2 controllers (your identity + backup) - -# 2. Verify freezing threshold is adequate -# Default: 2,592,000 seconds (30 days). Increase for high-value canisters. - -# 3. Verify cycles balance is healthy — well above freezing threshold reserve -``` From eee879f1c87be1df1c951ae33369a5b4a4da0642 Mon Sep 17 00:00:00 2001 From: Marco Walz Date: Tue, 3 Mar 2026 16:46:31 +0100 Subject: [PATCH 4/4] fix: replace SNS recommendation with agent-actionable backup controller advice --- skills/canister-security/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/canister-security/SKILL.md b/skills/canister-security/SKILL.md index 2214af8..8a66221 100644 --- a/skills/canister-security/SKILL.md +++ b/skills/canister-security/SKILL.md @@ -40,7 +40,7 @@ Security patterns for IC canisters in Motoko and Rust. The async messaging model ```bash icp canister settings update backend --add-controller -e ic ``` - For high-value canisters, consider making an SNS or the canister itself a controller so governance can manage upgrades. + When deploying, ask the developer if they have a backup controller principal to add. 7. **Calling `fetchRootKey()` in production.** `fetchRootKey()` fetches the root public key from the replica and trusts whatever it returns. On mainnet, the root key is hardcoded into the agent — calling `fetchRootKey()` there allows a man-in-the-middle to substitute a different key, breaking all verification. Only call `fetchRootKey()` in local development, guarded by an environment check. For frontends served by asset canisters, the root key is provided automatically.