
Add NNUE neural network evaluation#58

Open
luccabb wants to merge 2 commits into master from improve/nnue-evaluation


luccabb commented Feb 17, 2026

Summary

  • Implement dual-perspective NNUE evaluator with 768 sparse binary features, shared feature transformer (768→128), SCReLU activation, and incremental accumulator updates during search
  • Add NNEngine with accumulator state management (save/restore on push/pop) supporting both fixed-depth and timed iterative deepening
  • Add training pipeline: Stockfish-labeled data generation (scripts/generate_training_data.py) and PyTorch training with sigmoid-scaled MSE + WDL loss (scripts/train_nnue.py)
  • Ship pre-trained model (moonfish/models/nnue_v1.npz, 386 KB, ~99K parameters) trained on 96K positions at depth 8
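The summary names a "sigmoid-scaled MSE + WDL loss" but gives no constants. Below is a minimal numpy sketch of one common reading of that phrase. Both the network output and the Stockfish score are passed through a sigmoid, and the target blends the squashed score with a win/draw/loss label.

The following are assumptions, not values from scripts/train_nnue.py (which uses PyTorch, not numpy):
- `SCALE` and `LAMBDA`
- the blending scheme itself
- the name `nnue_loss`

```python
import numpy as np

# Assumed hyperparameters; the PR does not state them.
SCALE = 400.0   # centipawns per unit of sigmoid input
LAMBDA = 0.75   # weight of the Stockfish-score target vs. the WDL target


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def nnue_loss(pred_cp: np.ndarray, sf_cp: np.ndarray, wdl: np.ndarray) -> float:
    """Mean squared error in sigmoid space against a blended target.

    pred_cp: network outputs in centipawns, side-to-move perspective
    sf_cp:   Stockfish evaluations in centipawns, same perspective
    wdl:     outcome labels, 1.0 win / 0.5 draw / 0.0 loss
    """
    target = LAMBDA * sigmoid(sf_cp / SCALE) + (1.0 - LAMBDA) * wdl
    return float(np.mean((sigmoid(pred_cp / SCALE) - target) ** 2))


# A balanced, drawn position predicted at 0 cp has zero loss.
print(nnue_loss(np.array([0.0]), np.array([0.0]), np.array([0.5])))  # 0.0
```

Squashing both sides through the same sigmoid stops large centipawn scores in won positions from dominating the squared error.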

Architecture

Input: 768 sparse binary features (6 piece types × 2 colors × 64 squares)
Feature transformer: Linear(768, 128) — shared weights for both perspectives
Dual perspective: concat(SCReLU(stm_acc), SCReLU(nstm_acc)) → 256
Output: Linear(256, 1)
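The four lines above fully determine the forward pass. Below is a numpy sketch of it in float arithmetic with no quantization. SCReLU is taken as clamp to [0, 1] followed by squaring.

The exact feature layout is an assumed convention and may differ from moonfish/evaluation/nn.py:
- own pieces are indexed before enemy pieces;
- Black's view is flipped vertically with `square ^ 56`.

The names `feature_index`, `evaluate`, and `HIDDEN` are illustrative.

```python
import numpy as np

HIDDEN = 128  # feature-transformer width from the architecture above


def feature_index(perspective: int, color: int, piece_type: int, square: int) -> int:
    """Index into the 768 inputs for one piece, seen from `perspective`.

    Assumed layout: [own, enemy] x [6 piece types] x [64 squares]; Black's view
    flips the board vertically (square ^ 56) so one weight matrix serves both.
    Colors: 0 = White, 1 = Black. Squares: 0 = a1 ... 63 = h8.
    """
    if perspective == 1:
        square ^= 56
    relative = 0 if color == perspective else 1
    return relative * 384 + piece_type * 64 + square


def screlu(x: np.ndarray) -> np.ndarray:
    """Squared clipped ReLU: clamp to [0, 1], then square."""
    return np.clip(x, 0.0, 1.0) ** 2


def evaluate(pieces, side_to_move, w_ft, b_ft, w_out, b_out) -> float:
    """Full (non-incremental) forward pass.

    pieces: iterable of (color, piece_type, square)
    w_ft: (768, HIDDEN)  b_ft: (HIDDEN,)  w_out: (2 * HIDDEN,)  b_out: float
    """
    acc = [b_ft.copy(), b_ft.copy()]  # one accumulator per perspective
    for color, piece_type, square in pieces:
        for persp in (0, 1):
            acc[persp] += w_ft[feature_index(persp, color, piece_type, square)]
    # Side to move first, then the opponent: 128 + 128 = 256 hidden values.
    hidden = np.concatenate([screlu(acc[side_to_move]), screlu(acc[1 - side_to_move])])
    return float(hidden @ w_out + b_out)
```

Both perspectives index the same `w_ft`. So swapping colors, flipping the board, and flipping the side to move should leave the output unchanged, which makes a cheap sanity check for any feature layout.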

Verification Results

Inference Speed

| Operation | Latency |
|---|---|
| Evaluate only | 3.4 μs |
| Update + evaluate + restore | 8.3 μs |
| Full reset + evaluate | 35.0 μs |

Accumulator Correctness

All special move types verified (incremental update matches full recomputation):

  • ✅ Quiet moves
  • ✅ Captures
  • ✅ En passant
  • ✅ Kingside castling
  • ✅ Queenside castling
  • ✅ Pawn promotion
  • ✅ Promotion with capture
  • ✅ 20-move game replay (20/20 positions match)
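Every item in this checklist reduces to one rule. A move adds and removes a few features, and each accumulator is patched by adding or subtracting the matching rows of the feature-transformer weights. The special cases differ only in which features change:
- en passant removes a pawn that is not on the destination square;
- castling also moves a rook;
- promotion swaps the piece type.

Below is a sketch of that rule, checked the same way the PR describes (incremental result versus full recomputation). The board encoding (a dict from square index to `(color, piece_type)`), the helper names, and the feature layout are hypothetical.

```python
import numpy as np

PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(6)
WHITE, BLACK = 0, 1


def feature_index(persp, color, piece_type, square):
    # Assumed layout: [own, enemy] x piece type x square, Black's view flipped.
    if persp == BLACK:
        square ^= 56
    return (0 if color == persp else 1) * 384 + piece_type * 64 + square


def full_accumulators(board, w, b):
    """Recompute both accumulators from scratch (the reference result)."""
    acc = [b.copy(), b.copy()]
    for square, (color, piece_type) in board.items():
        for persp in (WHITE, BLACK):
            acc[persp] += w[feature_index(persp, color, piece_type, square)]
    return acc


def move_delta(board, frm, to, promotion=None):
    """Pieces (color, type, square) that a move adds and removes."""
    color, piece_type = board[frm]
    removed = [(color, piece_type, frm)]
    added = [(color, piece_type if promotion is None else promotion, to)]
    if to in board:                                      # capture, incl. promotion capture
        removed.append((*board[to], to))
    elif piece_type == PAWN and (to - frm) % 8 != 0:     # diagonal onto an empty square
        victim = to - 8 if color == WHITE else to + 8    # en passant: pawn behind `to`
        removed.append((1 - color, PAWN, victim))
    if piece_type == KING and abs(to - frm) == 2:        # castling also moves a rook
        rook_from, rook_to = (frm + 3, frm + 1) if to > frm else (frm - 4, frm - 1)
        removed.append((color, ROOK, rook_from))
        added.append((color, ROOK, rook_to))
    return added, removed


def update_accumulators(acc, added, removed, w):
    """Incremental update: add rows for new features, subtract rows for old ones."""
    for persp in (WHITE, BLACK):
        for color, piece_type, square in added:
            acc[persp] += w[feature_index(persp, color, piece_type, square)]
        for color, piece_type, square in removed:
            acc[persp] -= w[feature_index(persp, color, piece_type, square)]


def apply_delta(board, added, removed):
    """Board after the move; used only to build the reference result."""
    new_board = dict(board)
    for _, _, square in removed:
        del new_board[square]
    for color, piece_type, square in added:
        new_board[square] = (color, piece_type)
    return new_board
```

A correctness check compares `update_accumulators` on the pre-move accumulators with `full_accumulators` on the post-move board, once for each move type in the list.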

NPS Benchmark (depth 3, 48 Stockfish positions)

| Engine | NPS | Ratio |
|---|---|---|
| Classical (PeSTO) | 25,666 | 100% |
| NNUE | 23,482 | 91.5% |

UCI Test

$ printf 'uci\nisready\nposition startpos\ngo movetime 5000\nquit\n' | moonfish --mode uci --algorithm nn
id name Moonfish
id author luccabb
uciok
readyok
bestmove g1f3

Existing Tests

  • 16/16 alpha_beta mate-puzzle tests: ✅ PASS
  • Bratko-Kopec test suite: ✅ PASS

Usage

# Use with bundled model (auto-discovered)
moonfish --mode uci --algorithm nn

# Use with custom model
moonfish --mode uci --algorithm nn --nn-model-path path/to/model.npz

# Generate training data (requires Stockfish)
uv run python scripts/generate_training_data.py --num-positions 1000000 --depth 8 --output data/training_data.npz

# Train model (requires torch)
uv run --extra nn python scripts/train_nnue.py --data data/training_data.npz --output moonfish/models/nnue_v1.npz

Files Changed

| File | Change |
|---|---|
| `moonfish/evaluation/nn.py` | New: NNUEEvaluator with dual-perspective accumulators, incremental updates, SCReLU |
| `moonfish/evaluation/base.py` | New: Evaluator protocol |
| `moonfish/evaluation/classical.py` | New: ClassicalEvaluator wrapping PeSTO |
| `moonfish/evaluation/__init__.py` | New: package exports |
| `moonfish/engines/nn_engine.py` | New: NNEngine with accumulator save/restore during search |
| `moonfish/models/nnue_v1.npz` | New: pre-trained weights (386 KB) |
| `scripts/generate_training_data.py` | New: Stockfish position generator |
| `scripts/train_nnue.py` | New: PyTorch training script |
| `moonfish/config.py` | Added nn_model_path field |
| `moonfish/helper.py` | Added nn algorithm, factory function, default model discovery |
| `moonfish/main.py` | Added --nn-model-path CLI option |
| `pyproject.toml` | Added numpy dependency, [nn] optional extra (torch), evaluation/engine packages |
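The table lists an Evaluator protocol in moonfish/evaluation/base.py without showing it. One plausible shape uses `typing.Protocol`, so that ClassicalEvaluator and NNUEEvaluator can be swapped without sharing a base class.

The following are assumptions for illustration only:
- the method name `evaluate`;
- the untyped board argument;
- the toy `MaterialEvaluator`;
- the helper `score_position`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Evaluator(Protocol):
    """Assumed interface: anything with evaluate(board) -> centipawns."""

    def evaluate(self, board) -> int:
        ...


class MaterialEvaluator:
    """Toy evaluator over a FEN piece-placement string, White's point of view.

    Stands in for ClassicalEvaluator / NNUEEvaluator; it does not subclass
    Evaluator, it only matches its shape.
    """

    VALUES = {"p": 100, "n": 320, "b": 330, "r": 500, "q": 900, "k": 0}

    def evaluate(self, board: str) -> int:
        score = 0
        for ch in board:
            value = self.VALUES.get(ch.lower())
            if value is not None:  # skips digits and '/' separators
                score += value if ch.isupper() else -value
        return score


def score_position(evaluator: Evaluator, board) -> int:
    # Search code depends only on the protocol, so evaluators are pluggable.
    return evaluator.evaluate(board)


print(isinstance(MaterialEvaluator(), Evaluator))                    # True
print(score_position(MaterialEvaluator(), "4k3/8/8/8/8/8/8/4K2Q"))  # 900
```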

Test plan

  • NNUE evaluator: incremental updates match full recomputation for all move types
  • NNEngine: search_move returns valid moves at depth 2-3
  • NNEngine: search_move_timed works with time limits
  • UCI integration: go movetime, go depth work with --algorithm nn
  • Factory: get_engine() correctly creates NNEngine with bundled model
  • Existing tests: all 16 alpha_beta tests pass, Bratko-Kopec passes
  • NPS benchmark: NNUE runs at 92% of classical speed
  • Inference speed: <10μs per position with incremental updates

The engine previously searched to a fixed depth (default 3) regardless
of available time, wasting most of the allocated thinking time in
timed games. With 60s per move, the engine would finish in <1s.

Add search_move_timed() to AlphaBeta that uses iterative deepening
under a time constraint: searches depth 1, 2, 3, ... until time runs
out, keeping the best move from the last completed depth. Time is
checked every 512 nodes to minimize overshoot.

Update the UCI handler to parse go parameters (wtime, btime, winc,
binc, movetime, movestogo, depth) and calculate appropriate time
allocation per move with 1-second safety margins for Python overhead.
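Below is a sketch of the `go` parsing and per-move allocation described above. The PR does not spell out its formula, so several parts are assumptions:
- the split `remaining / movestogo + increment`;
- the default of 30 moves to go;
- the 50 ms floor;
- the function names.

Only the 1-second safety margin comes from the commit message.

```python
from __future__ import annotations

SAFETY_MARGIN_S = 1.0      # the 1-second margin from the commit message
DEFAULT_MOVES_TO_GO = 30   # assumption when the GUI sends no movestogo
MIN_THINK_S = 0.05         # assumed floor so the engine always searches a bit

GO_PARAMS = {"wtime", "btime", "winc", "binc", "movetime", "movestogo", "depth"}


def parse_go(line: str) -> dict[str, int]:
    """Collect integer parameters from a UCI line such as
    'go wtime 60000 btime 60000 winc 1000 binc 1000'."""
    tokens = line.split()
    return {key: int(value) for key, value in zip(tokens, tokens[1:]) if key in GO_PARAMS}


def allocate_time(params: dict[str, int], white_to_move: bool) -> float | None:
    """Seconds to think for this move, or None for a depth-only/infinite search."""
    if "movetime" in params:
        return max(params["movetime"] / 1000 - SAFETY_MARGIN_S, MIN_THINK_S)
    remaining_ms = params.get("wtime" if white_to_move else "btime")
    if remaining_ms is None:
        return None
    inc_ms = params.get("winc" if white_to_move else "binc", 0)
    moves_to_go = max(params.get("movestogo", DEFAULT_MOVES_TO_GO), 1)
    budget_s = (remaining_ms / moves_to_go + inc_ms) / 1000
    # Never plan to spend more than the clock minus the safety margin.
    budget_s = min(budget_s, remaining_ms / 1000 - SAFETY_MARGIN_S)
    return max(budget_s, MIN_THINK_S)


print(allocate_time(parse_go("go wtime 60000 btime 30000 winc 1000 binc 500"), True))  # 3.0
```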

Implement a dual-perspective NNUE (Efficiently Updatable Neural Network)
evaluator following the Stockfish architecture pattern. The network uses
768 sparse binary features, shared feature-transformer weights, SCReLU
activation, and incremental accumulator updates during search, which
keeps inference fast using numpy alone.

Architecture: 768 → Linear(768,128) → SCReLU → concat(stm,nstm) → Linear(256,1)
Parameters: ~99K | Model size: 386 KB | Inference: ~8μs per position
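The parameter count and model size on the line above follow directly from the layer shapes. Below is a quick check that assumes uncompressed float32 storage; the on-disk format is not stated here.

```python
INPUTS, HIDDEN = 768, 128

ft_params = INPUTS * HIDDEN + HIDDEN   # shared feature transformer: weights + biases
out_params = 2 * HIDDEN + 1            # Linear(256, 1) on concat(stm, nstm)
total = ft_params + out_params
size_kib = total * 4 / 1024            # assuming float32 (4 bytes per parameter)

print(total, round(size_kib, 1))       # 98689 385.5
```

The transformer is counted once because both perspectives share its weights. The result of 98,689 parameters and about 385.5 KiB of raw weights is consistent with the ~99K and 386 KB figures.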

Key components:
- moonfish/evaluation/nn.py: NNUEEvaluator with dual-perspective accumulators
  and incremental updates for all move types (quiet, capture, castling,
  en passant, promotion)
- moonfish/engines/nn_engine.py: NNEngine with accumulator save/restore on
  the search stack, supporting both fixed-depth and timed iterative deepening
- moonfish/evaluation/base.py: Evaluator protocol for pluggable evaluation
- moonfish/evaluation/classical.py: ClassicalEvaluator wrapping PeSTO tables
- scripts/generate_training_data.py: Stockfish-labeled position generator
- scripts/train_nnue.py: PyTorch training with sigmoid-scaled MSE + WDL loss
- moonfish/models/nnue_v1.npz: Pre-trained weights (96K positions, depth 8)
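Saving and restoring accumulators on the search stack, as described for NNEngine, can be sketched with a list of snapshots. `AccumulatorStack` and its methods are hypothetical names. The point is that `pop()` restores the exact pre-move arrays, so unmaking a move costs a list pop rather than a reverse incremental update.

```python
import numpy as np


class AccumulatorStack:
    """Snapshot/restore of the two NNUE accumulators around make/unmake.

    Hypothetical helper, not the PR's NNEngine. `current` holds the
    [white_view, black_view] accumulators that incremental updates mutate.
    """

    def __init__(self, white_acc: np.ndarray, black_acc: np.ndarray):
        self.current = [white_acc, black_acc]
        self._saved = []

    def push(self) -> None:
        # Copy: incremental updates modify the arrays in place.
        self._saved.append([acc.copy() for acc in self.current])

    def pop(self) -> None:
        self.current = self._saved.pop()

    def depth(self) -> int:
        return len(self._saved)


# In a search: push() before making a move and updating, pop() after unmaking.
stack = AccumulatorStack(np.zeros(128), np.zeros(128))
stack.push()
stack.current[0] += 1.0   # stand-in for an incremental update
stack.pop()
print(stack.current[0].sum(), stack.depth())   # 0.0 0
```

A common alternative avoids per-node allocation: a preallocated array of shape `(max_ply, 2, 128)` indexed by ply. Which variant nn_engine.py uses is not shown here.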

Integration:
- New --algorithm nn flag and --nn-model-path CLI option
- Bundled model auto-discovered from package directory
- numpy added as core dependency; torch as optional [nn] extra
- Full UCI compatibility including go movetime/depth/wtime+btime
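The model-selection rule might look like this with `pathlib`: an explicit --nn-model-path wins, and otherwise the bundled file is used. The function names and the `models/nnue_v1.npz` lookup relative to a package directory are assumptions based on the paths listed in this PR.

```python
from pathlib import Path
from typing import Optional

BUNDLED_MODEL = Path("models") / "nnue_v1.npz"  # relative to the package directory


def default_model_path(package_dir: Path) -> Optional[Path]:
    """Return the bundled model if it ships with the package, else None.

    In real package code, package_dir would typically be Path(__file__).parent.
    """
    candidate = package_dir / BUNDLED_MODEL
    return candidate if candidate.is_file() else None


def resolve_model_path(cli_path: Optional[str], package_dir: Path) -> Path:
    """An explicit --nn-model-path wins; otherwise fall back to the bundled file."""
    if cli_path:
        return Path(cli_path)
    bundled = default_model_path(package_dir)
    if bundled is None:
        raise FileNotFoundError(f"no bundled model under {package_dir}; pass --nn-model-path")
    return bundled
```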
@github-actions

Benchmarks

The following benchmarks are available for this PR:

| Command | Description |
|---|---|
| /run-nps-benchmark | NPS speed benchmark (depth 5, 48 positions) |
| /run-stockfish-benchmark | Stockfish strength benchmark (300 games) |

Post a comment with the command to trigger a benchmark run.
