feat: Introduce Merkle Tree for Transaction Verification (SPV Support)#28

Open
Muneerali199 wants to merge 5 commits into StabilityNexus:main from Muneerali199:feature/merkle-tree-spv-support

Conversation


@Muneerali199 Muneerali199 commented Feb 22, 2026

Closes #27

🔎 Summary

This PR introduces Merkle Tree–based transaction verification to enable Simplified Payment Verification (SPV) support.

Changes Included

  • Implemented a MerkleTree utility class with:
    • Merkle root computation
    • Proof generation
    • Proof verification
  • Updated the Block class to:
    • Use MerkleTree
    • Store merkle_root instead of hashing raw transaction lists
  • Added a FastAPI server exposing SPV verification endpoints
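The class described above might look roughly like the following sketch — a hedged reconstruction from the PR description, not the actual implementation; the method names match the summary, but internals such as JSON leaf serialization and duplicate-last-node padding are assumptions:

```python
import hashlib
import json


def _sha256(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()


class MerkleTree:
    """Illustrative Merkle tree: leaves are SHA-256 hashes of JSON-serialized txs."""

    def __init__(self, transactions):
        self.tx_hashes = [_sha256(json.dumps(tx, sort_keys=True)) for tx in transactions]
        self.levels = self._build_tree()

    def _build_tree(self):
        # Bottom level is the leaf hashes; an empty block gets a sentinel root.
        levels = [list(self.tx_hashes)] if self.tx_hashes else [[_sha256("")]]
        while len(levels[-1]) > 1:
            level = list(levels[-1])          # pad a copy, never the stored level
            if len(level) % 2 == 1:
                level.append(level[-1])       # duplicate the last node on odd levels
            levels.append([_sha256(level[i] + level[i + 1])
                           for i in range(0, len(level), 2)])
        return levels

    def get_merkle_root(self):
        return self.levels[-1][0]

    def get_proof(self, index):
        """Sibling hashes from leaf to root; [] for a one-leaf tree, None if out of range."""
        if not 0 <= index < len(self.tx_hashes):
            return None
        proof = []
        for level in self.levels[:-1]:
            padded = list(level) + ([level[-1]] if len(level) % 2 else [])
            sibling = index ^ 1               # partner within the current pair
            proof.append({"hash": padded[sibling],
                          "position": "right" if sibling > index else "left"})
            index //= 2
        return proof

    @staticmethod
    def verify_proof(tx_hash, proof, root):
        current = tx_hash
        for step in proof:
            pair = step["hash"] + current if step["position"] == "left" else current + step["hash"]
            current = _sha256(pair)
        return current == root
```

The SPV point is that a light client only needs tx_hash, the proof, and the block header's merkle_root to check inclusion — it never downloads the full transaction list.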

🌐 API Endpoints Added

  • GET /verify_transaction?tx_hash=<>&block_index=<>
    Returns transaction inclusion proof, Merkle path, and verification status.

  • GET /block/<index>
    Returns block data including Merkle verification support.

  • GET /chain
    Returns the full blockchain.


🧪 Testing

  • ✅ All 19 tests passing
    • 8 new Merkle-related tests
    • 11 existing tests remain unaffected
  • Verified:
    • Root consistency
    • Proof correctness
    • Cross-verification integrity

This enhancement aligns the project closer to real-world blockchain architecture and enables light-client style verification.

Summary by CodeRabbit

  • New Features
    • REST API for chain info, block retrieval (per-transaction Merkle proofs), transaction verification, and triggering mining.
    • Merkle-tree support for blocks with per-tx hashes and proof generation/verification.
    • Persistent chain storage (load/save) and state serialization for shutdown persistence.
    • Mining workflow extracted to a dedicated module with nonce syncing and miner reward handling.
  • Tests
    • Added comprehensive Merkle tree unit tests.
  • Chores
    • Added FastAPI, Uvicorn, and Pydantic; introduced internal SHA-256 helper.

- Implement MerkleTree utility class in core/merkle.py
- Add proof generation and verification methods
- Update Block class to use MerkleTree for merkle_root
- Add FastAPI server with /verify_transaction endpoint
- Add /block/<index> endpoint with Merkle proofs for each transaction
- Add tests for Merkle tree functionality

coderabbitai bot commented Feb 22, 2026

Warning

Rate limit exceeded

@Muneerali199 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 18 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds a Merkle tree utility and integrates SPV support into blocks, persists chain and state to disk, moves mining logic into a core mining module, and exposes a FastAPI app with endpoints for chain info, per-transaction Merkle proofs, transaction verification, and mining.

Changes

  • Merkle utility & tests (core/merkle.py, tests/test_merkle.py): New MerkleTree class and calculate_merkle_root(); proof generation via get_proof, proof verification via verify_proof, plus comprehensive unit tests covering roots, proofs, verification, and edge cases.
  • Block merkle integration (core/block.py): Block now constructs an internal _merkle_tree, sets merkle_root, and exposes get_merkle_proof(tx_index) and get_tx_hash(tx_index); prior manual merkle/hash code removed.
  • Chain persistence & state (core/chain.py, core/state.py): Blockchain loads/saves the chain to DEFAULT_CHAIN_FILE, reconstructs blocks/transactions on load, calls save_to_file() after add_block(), and exposes last_block; State adds to_dict()/from_dict() for serialization.
  • Mining workflow extraction (core/mining.py, main.py): New mining helpers mine_and_process_block and sync_nonce in core/mining.py; main.py drops its in-file mining logic and now imports these functions.
  • API & runtime deps (api/main.py, requirements.txt): New FastAPI app (app) with endpoints: root, /chain, /block/{block_index} (per-tx Merkle proofs), /verify_transaction, and /mine; adds fastapi, uvicorn, and pydantic to requirements.
  • Utility helpers (core/utils.py): Added internal _sha256(data: str) helper used by Merkle tree and block hashing.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant API as FastAPI (api/main.py)
    participant Chain as Blockchain (core/chain.py)
    participant Block as Block (core/block.py)
    participant Merkle as MerkleTree (core/merkle.py)

    Client->>API: GET /verify_transaction?tx_hash=<h>&block_index=<i>
    API->>Chain: fetch block by index <i>
    Chain-->>API: Block (includes merkle_root, txs)
    API->>Block: find tx_index for tx_hash
    API->>Block: get_merkle_proof(tx_index)
    Block->>Merkle: get_proof(tx_index)
    Merkle-->>Block: proof list
    API->>Merkle: verify_proof(tx_hash, proof, merkle_root)
    Merkle-->>API: verification result
    API-->>Client: VerifyTransactionResponse (valid, proof, merkle_root)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Core contract base setup #7 — Related prior work introducing core block/chain foundations that this PR extends (merkle/block changes integrate with that core structure).

Suggested labels

Python Lang

Suggested reviewers

  • Zahnentferner

Poem

🐰 I hopped through leaves of hash and root,

I stitched the branches, firm and cute,
With proofs in paw and packets bright,
Light clients wink at cryptic light,
A joyful hop — SPV delight!

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately summarizes the main change: introducing Merkle Tree functionality for transaction verification with SPV support.
  • Linked Issues Check — ✅ Passed: All core requirements from issue #27 are implemented: MerkleTree class, block header merkle_root storage, SPV API endpoints (/verify_transaction, /block, /chain), and comprehensive unit tests.
  • Out of Scope Changes Check — ✅ Passed: All changes align with the #27 objectives. Additional enhancements (chain persistence, mining refactoring, State serialization) support the core SPV feature without introducing unrelated functionality.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/main.py`:
- Line 12: The module-level blockchain singleton (variable blockchain created
from class Blockchain) stores state only in memory so the chain is lost on
restart; add persistence by implementing serialization/deserialization on the
Blockchain class (e.g., to_dict()/from_dict() or
save_to_file()/load_from_file()) to write the chain and pending transactions to
disk (JSON or SQLite) and restore them at startup, then modify the module to
call Blockchain.load_from_file(...) when constructing the blockchain instance
and ensure you save state at shutdown or after mutating operations (use atomic
file writes and error handling in methods like Blockchain.save_to_file and
Blockchain.load_from_file to avoid corruption).
- Around line 26-35: Update the BlockResponse Pydantic model so its hash field
allows None to match Block.hash: change BlockResponse.hash from str to
Optional[str] (and keep default None if present) to prevent validation errors
when a Block instance's hash is unset; ensure the type import for Optional is
used and adjust any code that assumes a non-null hash accordingly.
- Around line 150-152: The uvicorn startup in the if __name__ == "__main__"
block binds to all interfaces; update the uvicorn.run invocation (the
uvicorn.run(app, host="0.0.0.0", port=8000) call) to use host "127.0.0.1"
instead of "0.0.0.0" for local/dev runs so the server only listens on localhost;
modify that single argument in the uvicorn.run call accordingly.
- Around line 77-82: The current loop drops valid empty proofs because it uses a
truthiness check; change the condition so it only excludes missing proofs (None)
but keeps empty lists returned by block.get_merkle_proof. In the loop over
block.transactions (where merkle_proofs is built using compute_tx_hash and
block.get_merkle_proof), replace the `if proof:` check with an explicit `if
proof is not None` (or always assign merkle_proofs[tx_hash] = proof) so that
empty proof lists for single-transaction blocks are preserved.
- Around line 52-53: compute_tx_hash duplicates the canonical SHA-256-over-JSON
transaction hashing and must be removed; replace uses of compute_tx_hash with
the Block-provided implementation by calling block.get_tx_hash(i) (which wraps
core/merkle._hash_transactions) when building/serializing blocks (e.g., in
get_block) and delete the compute_tx_hash function to avoid divergent copies.
- Around line 119-122: The call to MerkleTree.verify_proof can receive a None
proof from block.get_merkle_proof which will raise a TypeError; before calling
MerkleTree.verify_proof, check the result of block.get_merkle_proof (variable
proof) and handle the None case (e.g., set verification_status to False or raise
a clear error) using the existing tx_hash and merkle_root values; update the
code paths that assign verification_status so they only call
MerkleTree.verify_proof when proof is not None and otherwise handle/report the
missing proof appropriately.
- Around line 134-147: The /mine endpoint currently creates a new Mempool()
inside mine_block_endpoint which results in empty blocks; change it to use a
shared module-level mempool instance (e.g., create mempool = Mempool() at module
scope) and pass that shared instance into mine_and_process_block instead of
instantiating one per request; update references to pending_nonce_map and
blockchain as needed so they use the same shared state passed into
mine_and_process_block/get_transactions_for_block; also move the deferred
imports for mine_and_process_block and Mempool to the top of the module (or add
ImportError handling) so ImportError is handled at startup rather than at
request time.
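The empty-proof pitfall flagged for get_block comes down to Python truthiness: a single-transaction block legitimately yields an empty proof list, which is falsy. A toy illustration (function and variable names are made up for the demo):

```python
def collect_proofs_buggy(proofs_by_tx):
    """Truthiness check: silently drops valid empty proofs (single-tx blocks)."""
    return {tx: p for tx, p in proofs_by_tx.items() if p}


def collect_proofs_fixed(proofs_by_tx):
    """Explicit None check: keeps empty lists, drops only missing proofs."""
    return {tx: p for tx, p in proofs_by_tx.items() if p is not None}


proofs = {
    "tx_a": [],                                        # valid: only tx in its block
    "tx_b": [{"hash": "ab" * 32, "position": "left"}], # normal proof
    "tx_c": None,                                      # tx not found => no proof
}
```

With the buggy version, tx_a vanishes from the response even though its (empty) proof verifies correctly against the root.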

In `@core/block.py`:
- Around line 89-93: Replace the on-the-fly recomputation in get_tx_hash with
the precomputed value from the MerkleTree: check the tx_index bounds as
currently done, then return self._merkle_tree.tx_hashes[tx_index] (or None if
out of range) instead of rebuilding the transaction dict and calling _sha256;
keep the same return type Optional[str] and rely on the MerkleTree's tx_hashes
to encapsulate the hashing formula.

In `@core/merkle.py`:
- Around line 26-46: The leaf vs internal hashing lacks domain separation
causing second-preimage risks; modify _hash_transactions to prefix each
serialized transaction with a leaf tag (e.g., "leaf:") before calling _sha256,
update _build_tree to prefix concatenated child hashes with an internal/node tag
(e.g., "node:") before _sha256 (ensure when duplicating the last leaf you still
treat it as a leaf before promotion), and update verify_proof to use the same
"node:" prefix when recomputing parent hashes so verification uses the identical
domain-separated hashing scheme.
- Line 3: Remove the unused Tuple import and modernize type hints in
core/merkle.py by deleting Tuple from the from typing import line, replacing
List and Optional usages with built-in list and union syntax (e.g., list[int] or
MyType | None) throughout the file (update any function signatures, return
annotations, and variable annotations that reference List[...] or Optional[...]
accordingly); ensure no remaining imports from typing are unused and run linters
to confirm Ruff UP035 compliance.

In `@requirements.txt`:
- Line 3: Update the pinned FastAPI dependency in requirements.txt to at least
0.109.1 (preferably the latest stable e.g., 0.129.2) to remediate
CVE-2024-24762; edit the fastapi entry (fastapi==0.109.0) to the new version,
then regenerate your dependency lock (pip freeze / pip-tools) and rebuild/test
the app to ensure compatibility. While updating, also consider bumping uvicorn
to 0.41.0 and pydantic to 2.12.5, run the test suite and smoke tests, and update
any Docker images or CI caches that install these packages.

In `@tests/test_merkle.py`:
- Around line 70-82: Add a new unit test named
test_proof_verification_fails_wrong_tx_hash that mirrors
test_proof_verification_fails_wrong_root but supplies an invalid transaction
hash instead of an invalid root: construct a MerkleTree with the same sample
txs, obtain the valid root via MerkleTree.get_merkle_root() and a valid proof
via MerkleTree.get_proof(0), then call MerkleTree.verify_proof with a tampered
tx hash (e.g. a string of repeated chars) and assert the result is False;
reference the existing test_proof_verification_fails_wrong_root,
MerkleTree.get_proof, MerkleTree.get_merkle_root, and MerkleTree.verify_proof to
locate where to add the new test.
- Around line 10-15: Extend the test_single_transaction in tests/test_merkle.py
to assert the single-leaf proof round-trip: call MerkleTree.get_proof(0) and
assert it returns an empty list, then call MerkleTree.verify_proof(proof,
leaf_hash, root) (using the tx hash as leaf_hash and tree.get_merkle_root() as
root) and assert it returns True; ensure you compute the leaf hash the same way
MerkleTree uses for its leaves so the verification compares the correct values.
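The single-leaf round-trip the last comment asks for can be sketched framework-free; the leaf formula and verification semantics here are assumptions, consistent with an empty proof meaning the leaf hash is the root:

```python
import hashlib
import json


def _sha256(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()


def verify_proof(tx_hash: str, proof: list, root: str) -> bool:
    """Fold the proof steps back up to the root; an empty proof means leaf == root."""
    current = tx_hash
    for step in proof:
        pair = step["hash"] + current if step["position"] == "left" else current + step["hash"]
        current = _sha256(pair)
    return current == root


def test_single_transaction_proof_round_trip():
    tx = {"sender": "alice", "receiver": "bob", "amount": 5}
    leaf = _sha256(json.dumps(tx, sort_keys=True))  # leaf formula assumed
    root = leaf       # a one-leaf tree has no parent levels...
    proof = []        # ...and therefore no siblings
    assert verify_proof(leaf, proof, root) is True


test_single_transaction_proof_round_trip()
```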

api/main.py Outdated

app = FastAPI(title="MiniChain API", description="SPV-enabled blockchain API")

blockchain = Blockchain()

🧹 Nitpick | 🔵 Trivial

Module-level blockchain singleton is ephemeral — chain state is lost on every server restart.

All mined blocks and transactions are held in memory only. For anything beyond a demo, consider persisting chain state to disk (JSON, SQLite, etc.) and restoring it at startup.


- Add blockchain persistence with save_to_file/load_from_file
- Fix BlockResponse.hash to Optional[str]
- Remove compute_tx_hash, use block.get_tx_hash()
- Fix empty proof check in get_block (use is not None)
- Handle None proof in verify_transaction
- Use shared mempool in /mine endpoint
- Change uvicorn host to 127.0.0.1
- Use precomputed tx_hashes from MerkleTree in get_tx_hash
- Modernize type hints in merkle.py (remove unused Tuple import)
- Add domain separation for leaf vs internal hashing
- Add single transaction proof round-trip test
- Add test_proof_verification_fails_wrong_tx_hash
- Update FastAPI to 0.129.2, uvicorn to 0.41.0, pydantic to 2.12.5
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/main.py`:
- Around line 70-89: get_block (and likewise get_chain) reads blockchain.chain
without acquiring blockchain._lock, risking concurrent mutation during
iteration; fix by acquiring the lock before accessing the chain (use `with
blockchain._lock:`), make a shallow copy like `chain_copy =
list(blockchain.chain)` inside the locked section, then release the lock and
operate on `chain_copy` when calling block.to_dict(), block.get_tx_hash(), and
block.get_merkle_proof() so you never iterate the live list while /mine may
append.
- Around line 15-23: TransactionResponse currently declares receiver: str which
rejects None for contract-deploy transactions; update the TransactionResponse
model (and any related BlockResponse serialization) to accept receiver:
Optional[str] so contract deployments can use None, and also relax data to
accept either dict or str (e.g., Optional[Union[dict, str]]) since
Transaction.__init__ may set data to a string (contract code); adjust
imports/types accordingly to keep Pydantic validation passing.
- Around line 146-158: The endpoint performs a deferred import of
mine_and_process_block inside mine_block_endpoint which delays failure to
request time; move the import out of the function and import
mine_and_process_block at module level (or refactor the mining logic into a new
core/mining.py and import from there) and update mine_block_endpoint to call the
module-level function directly; if you refactor into core/mining.py, ensure the
function signature accepts dependencies (pass in blockchain, mempool,
pending_nonce_map, and required globals like BURN_ADDRESS and a logger instead
of relying on main.py module scope) and update any calls to
mine_and_process_block accordingly.
- Around line 52-54: Replace the deprecated `@app.on_event`("shutdown") handler
with a FastAPI lifespan context manager: create an asynccontextmanager function
(e.g., lifespan) that yields for startup and calls blockchain.save_to_file() in
a finally block on exit, pass that lifespan to FastAPI(...) when constructing
the app, and remove the shutdown_event function; ensure you import
asynccontextmanager from contextlib and keep the call to
blockchain.save_to_file() inside the teardown section of the lifespan manager.
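The lifespan replacement for the deprecated @app.on_event("shutdown") hook has this shape. FakeBlockchain stands in for the real class so the teardown can be demonstrated without FastAPI installed; only the contextlib pattern is the point:

```python
import asyncio
from contextlib import asynccontextmanager

saved = []


class FakeBlockchain:
    def save_to_file(self):
        saved.append("chain.json")   # stand-in for the real persistence call


blockchain = FakeBlockchain()


@asynccontextmanager
async def lifespan(app):
    # startup work goes here, before the yield
    try:
        yield
    finally:
        blockchain.save_to_file()    # runs on shutdown, even after errors


# FastAPI would be constructed as: app = FastAPI(lifespan=lifespan)
# Here we drive the context manager directly to show the teardown fires:
async def demo():
    async with lifespan(app=None):
        pass


asyncio.run(demo())
```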

In `@core/block.py`:
- Around line 9-10: The helper function _sha256 is duplicated in core/block.py
and core/merkle.py; extract this single function into a new shared module (e.g.,
create core/utils.py with def _sha256(data: str) -> str) and remove the
duplicate definitions from both core/block.py and core/merkle.py, then update
both files to import _sha256 from core.utils (ensure the import name matches
existing uses and run tests/linting to confirm no unresolved references).

In `@core/chain.py`:
- Around line 28-58: After deserializing in _load_from_file, validate the
reconstructed chain before accepting it: for each Block in self.chain verify (1)
its previous_hash equals the hash of the prior block, (2) its stored hash
matches a recomputed hash (use the Block method that computes/serializes the
block hash — e.g. Block.compute_hash or equivalent), and (3) the merkle root
(recomputed from Transaction objects) matches whatever the Block stores (use
Block.merkle_root or recompute from Transaction list); on any mismatch log a
clear warning including self._chain_file and reject the loaded file (call
self._create_genesis_block() or raise) instead of silently accepting
corrupted/tampered data.
- Around line 86-89: The except block references temp_file which may not be
assigned if an exception occurs before its creation; fix by declaring temp_file
= None before the try (or assign temp_file earlier) and change the cleanup to
check temp_file is not None and os.path.exists(temp_file) before os.remove,
referencing the same temp_file and keeping the existing
logger.error(self._chain_file) call to avoid a secondary NameError in the
exception handler.
- Around line 38-49: When loading blocks in the chain loader loop, avoid
re-converting transaction timestamps by instantiating Transaction without
letting its constructor re-scale the timestamp and then directly assign the
already-millisecond value from the serialized dict; update the loop that builds
transactions (refer to Transaction and Block in the loader) so you create
Transaction with timestamp bypassed and then set t.timestamp = tx["timestamp"]
(or equivalent) before appending, ensuring merkle/hash consistency used by
add_block and to_dict; also fix save_to_file by declaring temp_file (e.g.,
temp_file = None) before the try block (or initialize it in the except) so the
except path can reference temp_file safely and not mask the original exception.
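The temp_file guard plus atomic-replace pattern recommended for save_to_file might look like this; the function name and JSON format are assumptions:

```python
import json
import os
import tempfile


def save_chain(chain, path):
    """Write to a temp file, then atomically replace the target.

    temp_file is declared before the try so the except path can reference it
    without risking a NameError that masks the original exception.
    """
    temp_file = None
    try:
        fd, temp_file = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(chain, f)
        os.replace(temp_file, path)   # atomic on POSIX and Windows
        temp_file = None              # replaced successfully; nothing to clean up
    except Exception:
        if temp_file is not None and os.path.exists(temp_file):
            os.remove(temp_file)      # remove the partial write
        raise
```

Because os.replace is atomic, a crash mid-write leaves either the old file or the new one on disk, never a truncated chain.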

In `@core/merkle.py`:
- Around line 34-52: _build_tree currently pads the level in-place by mutating
current_level (which is a reference to tree[-1]), causing tree levels (e.g.,
tree[0]) to grow; change the code so you operate on a copy of the last level
before padding (e.g., current_level = tree[-1][:] or list(tree[-1])), then pad
that copy and use it to compute new_level with _sha256(self.NODE_PREFIX +
combined) so the stored tree lists remain unchanged and consistent with
self.tx_hashes (get_proof can still bound-check against self.tx_hashes).
- Around line 10-15: The MerkleProof dataclass is declared but unused; either
remove the MerkleProof class definition or make it the canonical return type for
your proof APIs—e.g., change functions that currently return raw dicts/lists
(such as any get_proof / build_proof / verify_proof utilities) to return a
MerkleProof instance with fields tx_hash, merkle_root, proof, and
verification_status; if you keep it, update the signatures and call sites to
construct and return MerkleProof objects, otherwise delete the MerkleProof
dataclass to eliminate dead code.
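The aliasing bug behind the _build_tree finding is easy to reproduce in isolation:

```python
# Buggy pattern: current_level aliases tree[-1], so in-place padding
# mutates the stored tree level as a side effect.
buggy_tree = [["h0", "h1", "h2"]]
current_level = buggy_tree[-1]
current_level.append(current_level[-1])      # buggy_tree[0] now has 4 entries

# Fixed pattern: pad a shallow copy, leaving the stored tree untouched.
fixed_tree = [["h0", "h1", "h2"]]
current_level = list(fixed_tree[-1])         # or fixed_tree[-1][:]
current_level.append(current_level[-1])
```

With the buggy form, tree[0] no longer matches self.tx_hashes, so later bound checks and proofs derived from the leaf level silently diverge.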

In `@tests/test_merkle.py`:
- Around line 6-8: Add a positive test that verifies calculate_merkle_root
returns the same value as constructing a MerkleTree and calling its
get_merkle_root for a non-empty list: create a new test method (e.g.,
test_calculate_merkle_root_matches_tree) that builds a sample transactions list,
computes root1 = calculate_merkle_root(transactions) and root2 =
MerkleTree(transactions).get_merkle_root(), and assert they are equal (use
self.assertEqual); ensure you import or reference MerkleTree and
calculate_merkle_root consistently with the existing tests.

core/block.py Outdated
Comment on lines 9 to 10
def _sha256(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

🧹 Nitpick | 🔵 Trivial

_sha256 is duplicated across core/block.py and core/merkle.py.

Both files define an identical _sha256 helper. Consider extracting it into a shared utility module (e.g., core/utils.py) to maintain a single source of truth.


- Extract _sha256 to shared core/utils.py
- Update block.py and merkle.py to import from utils
- Fix _build_tree to use copy of level (avoid mutation)
- Remove unused MerkleProof dataclass
- Fix temp_file declaration in chain.py save_to_file
- Add chain validation in _load_from_file (previous_hash, hash, merkle_root)
- Fix transaction timestamp handling in chain loader
- Move mining import to module level in api/main.py
- Add thread safety in get_block and get_chain (acquire lock)
- Replace @app.on_event with lifespan context manager
- Fix TransactionResponse for contract deploy (Optional receiver, Union data)
- Add test_calculate_merkle_root_matches_tree
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/main.py`:
- Around line 12-21: The module-level instantiation of Blockchain() and
Mempool() causes file I/O on import; move their creation into the lifespan
startup so initialization is explicit and lazy. In the asynccontextmanager
lifespan function, before yield create the instances (Blockchain() and
Mempool()), attach them to the FastAPI app (e.g., app.state.blockchain and
app.state.mempool) so handlers can access them, and keep the existing
blockchain.save_to_file() after yield to persist on shutdown. Update any code
that referenced the module-level symbols (blockchain, mempool) to read from
app.state instead.
- Line 84: Remove the unused local variable by deleting the assignment
"chain_length = len(blockchain.chain)" (or if the length is needed for
side-effects, replace usage by directly referencing len(blockchain.chain) where
required); locate the statement in the code block that references
blockchain.chain and remove that single unused assignment to satisfy Ruff F841.
- Around line 88-94: Rename the unused loop variable `tx` to `_` in the
for-loops that enumerate block.transactions (the loops using `for i, tx in
enumerate(block.transactions):` which call `block.get_tx_hash(i)` and
`block.get_merkle_proof(i)`) to satisfy the Ruff B007 rule; update both
occurrences (the one building `merkle_proofs` and the later similar loop) so the
pattern becomes `for i, _ in enumerate(block.transactions):`.
- Line 9: Extract the mine_and_process_block function out of main.py into a new
core.mining module and update callers to import it from there; move the function
body (including any helper functions it relies on) into core.mining, ensure its
public signature remains unchanged, and remove the import of
mine_and_process_block from main.py in the API module so api's main imports
core.mining.mine_and_process_block instead; verify and update any relative
imports inside the moved code to avoid circular imports and run a quick import
check to ensure both the application entry point and the API import the function
from core.mining.
- Around line 157-167: The pending_nonce_map should be a module-level variable
so its state persists across /mine requests; move the pending_nonce_map
declaration out of mine_block_endpoint to module scope (e.g., define
pending_nonce_map = {} near the top of api/main.py) and update
mine_block_endpoint to use that shared map instead of recreating it; also change
the call result handling to destructure the return from mine_and_process_block
as block, *_ = mine_and_process_block(blockchain, mempool, pending_nonce_map)
and then check block (instead of result[0]) before saving and returning the
block.

In `@core/chain.py`:
- Around line 38-62: The genesis block is not validated because the existing
validation loop starts at range(1, len(self.chain)); update the validation in
the class that builds/validates self.chain to explicitly check the genesis block
first: verify self.chain[0].hash equals a 64-char zero string and
self.chain[0].previous_hash equals "0" and ensure its transactions are as
expected (e.g., empty or conforming to your genesis rules) before proceeding to
the existing loop that validates subsequent blocks (refer to self.chain and the
Block/Transaction construction logic in the shown diff).
- Line 194: add_block currently holds self._lock while calling
self.save_to_file(), causing file I/O under the lock; change the flow so that
inside the critical section (while holding self._lock) you perform validation,
state commit, append the block and then create/serialize the minimal data
snapshot (e.g., a copy of the data dict or the exact payload save_to_file needs)
and release the lock, and then perform the actual file write outside the lock
(either by calling a new helper like _write_serialized_to_file(serialized_data)
or refactoring save_to_file to accept the pre-serialized data); keep references
to self._lock, add_block and save_to_file to locate the change.
- Line 33: Remove the unused local assignment in the _load_from_file method:
delete the line that sets temp_file = None in core.chain._load_from_file since
temp_file is never referenced; ensure no other logic depends on temp_file and
run linters/tests to confirm the unused variable warning (Ruff F841) is
resolved.
- Around line 78-90: The current merkle validation compares
curr_block.merkle_root to a newly constructed Block's merkle_root
(tautological); instead, during reconstruction make the raw block_data available
(e.g., iterate over data.get("chain", []) or keep block_data alongside
curr_block), reconstruct the Block into a variable (Block(...)) and compare
block_data.get("merkle_root") (the persisted/stored value) to the reconstructed
block.merkle_root, and on mismatch call logger.warning and reset self.chain = []
and break; update the loop that builds self.chain so it validates the stored
merkle_root before appending.

In `@core/merkle.py`:
- Around line 49-67: get_proof currently skips siblings when sibling_idx >=
len(level) causing incomplete proofs for the last leaf in odd-length levels;
update MerkleTree.get_proof to emit a self-sibling in that case (use the node's
own hash from level[index] as the sibling hash and set the "position"
appropriately) so the proof matches how _build_tree duplicates last nodes,
ensuring MerkleTree.verify_proof can reconstruct the correct parent; locate
get_proof and change the branch that checks sibling_idx < len(level) to append a
sibling even when out-of-range by using the duplicated node hash.
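The self-sibling rule for the last leaf of an odd level can be checked by hand on a three-leaf tree. The hash formulas below assume plain concatenation and duplicate-last-node construction, mirroring the sketch above; the real code may differ:

```python
import hashlib


def _sha256(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()


l0, l1, l2 = (_sha256(t) for t in ("tx0", "tx1", "tx2"))  # odd number of leaves

# Build: an odd level duplicates its last node before pairing.
p01 = _sha256(l0 + l1)
p22 = _sha256(l2 + l2)        # last node paired with itself
root = _sha256(p01 + p22)

# A complete proof for the last leaf must emit the self-sibling, not skip it:
proof_for_l2 = [
    {"hash": l2,  "position": "right"},   # self-sibling at the leaf level
    {"hash": p01, "position": "left"},
]


def verify(tx_hash, proof, expected_root):
    current = tx_hash
    for step in proof:
        current = (_sha256(step["hash"] + current) if step["position"] == "left"
                   else _sha256(current + step["hash"]))
    return current == expected_root
```

Dropping the self-sibling step (as the pre-fix get_proof did) leaves the verifier one level short of the root, so valid last-leaf transactions fail verification.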

In `@tests/test_merkle.py`:
- Around line 24-36: The test_two_transactions currently obtains MerkleTree,
root via get_merkle_root, and proofs via get_proof(0) and get_proof(1) but never
verifies them; update the test to call MerkleTree.verify_proof (or verify_proof)
for each leaf using the corresponding proof and the root, asserting that
verification returns True (and also assert that verification fails for a
tampered leaf or proof if desired) so the 2-tx round-trip is fully exercised.

api/main.py Outdated
Comment on lines 12 to 21
@asynccontextmanager
async def lifespan(app: FastAPI):
    yield
    blockchain.save_to_file()


app = FastAPI(title="MiniChain API", description="SPV-enabled blockchain API", lifespan=lifespan)

blockchain = Blockchain()
mempool = Mempool()

🧹 Nitpick | 🔵 Trivial

Module-level blockchain = Blockchain() triggers file I/O on import — side effect risk.

Importing api.main immediately loads the chain from disk and creates a genesis block if the file doesn't exist. This makes testing harder and can cause surprising behavior when the module is imported for introspection. Consider lazy initialization or explicit startup logic inside the lifespan context manager.

🧰 Tools
🪛 Ruff (0.15.1)

[warning] 13-13: Unused function argument: app

(ARG001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/main.py` around lines 12 - 21, The module-level instantiation of
Blockchain() and Mempool() causes file I/O on import; move their creation into
the lifespan startup so initialization is explicit and lazy. In the
asynccontextmanager lifespan function, before yield create the instances
(Blockchain() and Mempool()), attach them to the FastAPI app (e.g.,
app.state.blockchain and app.state.mempool) so handlers can access them, and
keep the existing blockchain.save_to_file() after yield to persist on shutdown.
Update any code that referenced the module-level symbols (blockchain, mempool)
to read from app.state instead.

- Extract mine_and_process_block to core/mining.py
- Move blockchain/mempool creation to lifespan (lazy init)
- Remove unused chain_length variable
- Rename unused tx to _ in loops
- Move pending_nonce_map to module level
- Remove unused temp_file in _load_from_file
- Add genesis block validation
- Fix merkle validation to compare against block_data
- Fix add_block to release lock before saving
- Fix get_proof to emit self-sibling for odd levels
- Update test_two_transactions to verify proofs
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
main.py (3)

88-90: ⚠️ Potential issue | 🟠 Major

chain.state references will still work after removing the explicit state argument.

After fixing Blockchain() instantiation (removing the state arg), chain.state is used at lines 89, 90, 124, 125 etc. Since Blockchain.__init__ creates its own State(), chain.state is valid. However, the local state variable at line 24 becomes stale and disconnected from the chain's actual state. All references to the standalone state (lines 31, 89, etc.) should be replaced with chain.state.

🐛 Proposed fix — use chain.state consistently
-    state = State()
-    chain = Blockchain(state)
+    chain = Blockchain()

     ...

     def claim_nonce(address):
-        account = state.get_account(address)
+        account = chain.state.get_account(address)
         ...


The chain.state references at lines 89, 90, 124, 125, 127, 128 are already correct. Only line 31 uses the stale state local — update it to chain.state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 88 - 90, The local standalone variable state is stale
after instantiating Blockchain() which now creates its own State; replace any
uses of the standalone state (the variable created earlier) with chain.state so
the program always manipulates the chain's authoritative state (e.g., update the
initialization that uses state and any subsequent calls such as the one that
sets pending_nonce_map entries and any references passed to sync_nonce or
chain.state.credit_mining_reward). Ensure all references to the old state
variable are removed and only chain.state is referenced (check functions/methods
like sync_nonce, chain.state.credit_mining_reward, and any places building
pending_nonce_map).

74-79: ⚠️ Potential issue | 🔴 Critical

Critical: _run_node call passes 6 arguments but the function accepts only 5 — TypeError at runtime.

Line 74 passes (network, state, chain, mempool, pending_nonce_map, claim_nonce) but line 79 defines _run_node(network, chain, mempool, pending_nonce_map, get_next_nonce). The extra state argument shifts all positional parameters, causing mempool.get_transactions_for_block() to be called on a Blockchain object, etc.

🐛 Proposed fix
-        await _run_node(network, state, chain, mempool, pending_nonce_map, claim_nonce)
+        await _run_node(network, chain, mempool, pending_nonce_map, claim_nonce)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 74 - 79, The call to _run_node passes six positional
args (network, state, chain, mempool, pending_nonce_map, claim_nonce) but the
definition _run_node(network, chain, mempool, pending_nonce_map, get_next_nonce)
only accepts five, shifting parameters and causing runtime TypeErrors; fix by
adding the missing state parameter to the _run_node definition (change signature
to _run_node(network, state, chain, mempool, pending_nonce_map, get_next_nonce))
so parameter order matches the call and all uses of state, chain, mempool,
pending_nonce_map, and get_next_nonce inside the function remain correct.

24-25: ⚠️ Potential issue | 🔴 Critical

Critical: Blockchain(state) passes wrong argument type, and _run_node() call has mismatched argument count.

Line 25: Blockchain.__init__ (core/chain.py) expects chain_file: Optional[str] and creates its own State internally. Passing state sets self._chain_file = state, causing os.path.exists() in _load_from_file() to raise TypeError.

Line 74: Call to _run_node() passes 6 arguments (network, state, chain, mempool, pending_nonce_map, claim_nonce) but the function at line 79 accepts only 5 parameters (network, chain, mempool, pending_nonce_map, get_next_nonce), causing TypeError: _run_node() takes 5 positional arguments but 6 were given.

🐛 Proposed fixes
-    state = State()
-    chain = Blockchain(state)
+    chain = Blockchain()
-        await _run_node(network, state, chain, mempool, pending_nonce_map, claim_nonce)
+        await _run_node(network, chain, mempool, pending_nonce_map, claim_nonce)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.py` around lines 24 - 25, The code instantiates Blockchain with the
wrong argument and calls _run_node with the wrong parameters: replace
Blockchain(state) with Blockchain() so Blockchain.__init__ can create its own
State (fixes passing a State into chain_file and the os.path.exists TypeError),
and update the _run_node call to match its signature by removing the extra
argument (pass network, chain, mempool, pending_nonce_map, get_next_nonce) — or
if claim_nonce is intended as the nonce provider, rename/pass it as
get_next_nonce so the five-argument signature (_run_node(network, chain,
mempool, pending_nonce_map, get_next_nonce)) is satisfied.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/main.py`:
- Around line 71-79: The get_chain endpoint currently accesses the private
synchronization primitive blockchain._lock directly (in get_chain), breaking
encapsulation; add a public, thread-safe accessor on the Blockchain class (for
example a method like get_chain_copy() or a context-manager style method e.g.,
locked_chain() that returns a safe copy or yields the chain under the internal
lock) and update get_chain to call that new public API (e.g., chain_copy =
blockchain.get_chain_copy() or using with blockchain.locked_chain() as chain:
...) instead of reaching into blockchain._lock; apply the same pattern for other
endpoints that reference blockchain._lock (e.g., get_block and
verify_transaction) so callers never touch the private _lock.
- Around line 12-14: Change the module-level annotations so they allow None
until initialization: import typing and annotate blockchain as
Optional[Blockchain] and mempool as Optional[Mempool] (e.g., blockchain:
Optional[Blockchain] = None, mempool: Optional[Mempool] = None), and give
pending_nonce_map an explicit mapping type (e.g., pending_nonce_map: Dict[str,
int] = {}) to make intent explicit to type-checkers; update the declarations for
the symbols blockchain, mempool, and pending_nonce_map accordingly.
- Around line 161-171: The endpoint function mine_block_endpoint calls
blockchain.save_to_file() after mine_and_process_block returns, but
Blockchain.add_block() already persists the chain; remove the redundant
blockchain.save_to_file() call in mine_block_endpoint to avoid duplicate I/O.
Keep the existing logic that calls mine_and_process_block(blockchain, mempool,
pending_nonce_map) and the success/failure response handling; simply delete the
explicit save_to_file() invocation so persistence is handled only by
Blockchain.add_block().

In `@core/chain.py`:
- Around line 95-102: The control flow in the loader is fragile because the
early return only happens when data.get("state") is truthy; fix load logic so
that when a chain is successfully loaded you always log and return regardless of
whether "state" exists: if data contains the chain set self.chain (and set
self.state = State.from_dict(...) only when present), then call
logger.info(f"Loaded chain with {len(self.chain)} blocks from
{self._chain_file}") and return; only fall through to
self._create_genesis_block() when no chain was loaded. Update the block around
the State.from_dict, logger.info, and return to ensure logger.info and the
return execute for a loaded chain even if state is falsy.
- Around line 108-138: The save_to_file method reads self.chain and self.state
without holding self._lock, which can race with add_block (used by
mine_block_endpoint) and produce a corrupted serialization; wrap the
serialization and any uses of self.chain/self.state (building the data dict and
computing len(self.chain) for the log) in the same lock (e.g., with self._lock:)
so the snapshot is taken atomically, then perform file I/O outside or still
under the lock as appropriate, and keep the existing temp-file replace/cleanup
logic unchanged.
- Around line 190-216: The serialization of self.chain is duplicated; extract
the dict-building logic into a new helper method (e.g.,
_serialize_chain_data(self)) that returns the data dict, then centralize file
writing into an unlocked helper (e.g., _save_to_file_unlocked(self, data) which
performs the temp file write/replace and logging without acquiring locks).
Change save_to_file to acquire the lock, call _serialize_chain_data(), then call
_save_to_file_unlocked(data), and change add_block to reuse the same helpers by
calling _serialize_chain_data() and _save_to_file_unlocked(data) while it
already holds the lock (or call save_to_file only if you switch to letting
save_to_file handle locking); ensure method names referenced here (add_block,
save_to_file, _serialize_chain_data, _save_to_file_unlocked) are used so the
duplicated block is removed.
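The helper split suggested for core/chain.py might look like this; `_serialize_chain_data` and `_save_to_file_unlocked` are the review's proposed names, and the JSON layout is a placeholder for the project's real serialization:

```python
import json
import os
import tempfile
import threading

class Blockchain:
    def __init__(self, chain_file="chain.json"):
        self._lock = threading.Lock()
        self._chain_file = chain_file
        self.chain = []

    def _serialize_chain_data(self):
        # Caller must hold self._lock: snapshot the chain atomically.
        return {"chain": list(self.chain)}

    def _save_to_file_unlocked(self, data):
        # Write to a temp file in the same directory, then atomically
        # replace the target; no locks taken here.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self._chain_file) or ".")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(data, f)
            os.replace(tmp, self._chain_file)
        except BaseException:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise

    def save_to_file(self):
        with self._lock:
            data = self._serialize_chain_data()
        self._save_to_file_unlocked(data)

    def add_block(self, block):
        with self._lock:
            self.chain.append(block)
            data = self._serialize_chain_data()
        # I/O happens after the lock is released, so readers are not
        # blocked on disk writes.
        self._save_to_file_unlocked(data)
```

Both `save_to_file` and `add_block` now take their snapshot under the lock and share a single file-writing path, removing the duplicated serialization.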

In `@core/mining.py`:
- Around line 20-24: The code reads chain.last_block twice causing a TOCTOU
race; capture chain.last_block once into a local variable (e.g., last_block =
chain.last_block) and then use last_block.index and last_block.hash when
constructing the Block (Block(... index=last_block.index + 1,
previous_hash=last_block.hash, transactions=pending_txs)). Ensure you replace
the two separate accesses to chain.last_block with this single local reference
so index and previous_hash come from the same block.
- Around line 18-26: The code drains transactions up front by calling
mempool.get_transactions_for_block() before mining, so if mine_block(block)
raises MiningExceededError or chain.add_block(mined_block) rejects those
transactions are lost; modify the flow to either (A) fetch transactions without
removing them and only remove them from the mempool after chain.add_block
succeeds, or (B) if get_transactions_for_block() must remove them, catch
exceptions from mine_block() and chain.add_block() and re-add the transactions
back into the mempool in the except handlers; reference functions/classes:
mempool.get_transactions_for_block(), mine_block(), chain.add_block, and the
MiningExceededError exception to implement conditional removal or re-add on
failure.
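Option (B) from the mining finding can be sketched as follows; the real `mine_and_process_block` signature differs, and the chain/mempool objects here are stand-ins:

```python
class MiningExceededError(Exception):
    """Raised when proof-of-work exceeds its attempt budget."""

def mine_and_process_block(chain, mempool, mine_block):
    # get_transactions_for_block() removes txs from the mempool up front.
    pending_txs = mempool.get_transactions_for_block()
    if not pending_txs:
        return None
    try:
        mined_block = mine_block(pending_txs)
        if not chain.add_block(mined_block):
            raise RuntimeError("block rejected by chain")
        return mined_block
    except (MiningExceededError, RuntimeError):
        # Mining failed or the block was rejected: return the drained
        # transactions to the mempool so they can be retried.
        for tx in pending_txs:
            mempool.add_transaction(tx)
        return None
```

On the failure path the transactions land back in the mempool instead of being silently dropped.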

In `@tests/test_merkle.py`:
- Around line 43-51: Update the test_odd_transaction_count in
tests/test_merkle.py to perform proof round-trip checks: after creating
MerkleTree(txs) and obtaining root via get_merkle_root(), iterate over each leaf
index, call MerkleTree.get_proof(index) (or the project's proof-generation
method) and then call MerkleTree.verify_proof(leaf, proof, root) (or the
project's verification method) asserting verification is True for every leaf;
this will exercise the self-sibling logic in MerkleTree and prevent regressions.


Comment on lines 71 to 79
@app.get("/chain", response_model=ChainInfo)
def get_chain():
with blockchain._lock:
chain_copy = list(blockchain.chain)

return {
"length": len(chain_copy),
"blocks": [block.to_dict() for block in chain_copy]
}
🧹 Nitpick | 🔵 Trivial

Accessing blockchain._lock directly breaks encapsulation.

All three read endpoints (get_chain, get_block, verify_transaction) reach into the private _lock. Consider exposing a public method on Blockchain (e.g., get_block(index) or a context manager) so the API doesn't couple to internal synchronization details.

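A sketch of the proposed public accessor (`get_chain_copy` is the review's suggested name, not an existing method):

```python
import threading

class Blockchain:
    def __init__(self):
        self._lock = threading.Lock()
        self.chain = []

    def get_chain_copy(self):
        # Shallow copy taken under the lock: callers get a stable
        # snapshot and never touch the private _lock themselves.
        with self._lock:
            return list(self.chain)
```

The endpoint then reduces to `chain_copy = blockchain.get_chain_copy()` with no reference to `_lock`.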

Comment on lines +20 to +24
block = Block(
index=chain.last_block.index + 1,
previous_hash=chain.last_block.hash,
transactions=pending_txs,
)
🧹 Nitpick | 🔵 Trivial

TOCTOU: chain.last_block is accessed twice, each acquiring/releasing the lock independently.

Between the two calls (.index + 1 and .hash), another block could theoretically be added, causing index and previous_hash to reference different blocks. Capture last_block once:

♻️ Proposed fix
-    block = Block(
-        index=chain.last_block.index + 1,
-        previous_hash=chain.last_block.hash,
-        transactions=pending_txs,
-    )
+    last = chain.last_block
+    block = Block(
+        index=last.index + 1,
+        previous_hash=last.hash,
+        transactions=pending_txs,
+    )

Comment on lines +45 to +50
for tx in mined_block.transactions:
sync_nonce(chain.state, pending_nonce_map, tx.sender)

result = chain.state.get_account(tx.receiver) if tx.receiver else None
if isinstance(result, dict):
deployed_contracts.append(tx.receiver)
⚠️ Potential issue | 🟡 Minor

deployed_contracts logic is inverted — it captures all receivers, not actual deployments.

get_account() always returns a dict (auto-creates if missing), so isinstance(result, dict) is always True for any non-None receiver. Meanwhile, actual contract deployments have tx.receiver set to None, so this code skips them.

Currently unused by callers (block, *_ = ...), but the return value is misleading if anyone relies on it.

- Add Optional/Dict typing annotations for blockchain, mempool, pending_nonce_map
- Add public get_chain_copy() method to Blockchain class
- Remove redundant blockchain.save_to_file() in mine_block_endpoint
- Fix _load_from_file control flow for loaded chain logging
- Add _serialize_chain_data() and _save_to_file_unlocked() helpers
- Fix save_to_file to use lock properly
- Fix mining.py to return transactions on mining failure
- Add proof verification for odd_transaction_count test
Successfully merging this pull request may close these issues:

[FEATURE]: Introduce Merkle Tree for Transaction Verification (SPV Support)