⚡ Profiler: Fused Diffusion-Reaction Loop#451
Conversation
- Add `apply_step` to `SpatialDiffusion` trait for fused operations - Implement optimized `apply_step` in `FiniteDifference1D` using sliding window - Update `TuringSystem::step` to use the fused kernel - 2x speedup (537ms -> 257ms) in `bench_morphogenesis` Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Pull request overview
This PR optimizes the reaction–diffusion “TuringSystem” stepping routine by fusing diffusion, reaction, and Euler integration into a single pass to reduce memory bandwidth pressure and improve runtime throughput.
Changes:
- Added a new
apply_stepmethod to theSpatialDiffusiontrait with a default (non-fused) implementation. - Updated
TuringSystem::stepto useSpatialDiffusion::apply_stepinstead of separate diffusion + reaction/integration passes. - Implemented a fused
apply_stephot path forFiniteDifference1Dusing a sliding-window stencil andunsafeindexing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
math_explorer/src/biology/morphogenesis.rs |
Switches the main Turing step to the new fused diffusion+reaction+integration entry point. |
math_explorer/src/biology/diffusion.rs |
Extends the diffusion trait with apply_step and provides an optimized fused implementation for FiniteDifference1D. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| assert!(v.len() >= n, "v buffer too small"); | ||
| assert!(out_u.len() >= n, "out_u buffer too small"); | ||
| assert!(out_v.len() >= n, "out_v buffer too small"); |
There was a problem hiding this comment.
These assert! length checks execute in release builds and add overhead on the step hot path. If the goal is maximum throughput, consider using debug_assert! (or feature-gating the checks) so they’re compiled out in optimized builds while still catching issues during development/testing.
| assert!(v.len() >= n, "v buffer too small"); | |
| assert!(out_u.len() >= n, "out_u buffer too small"); | |
| assert!(out_v.len() >= n, "out_v buffer too small"); | |
| debug_assert!(v.len() >= n, "v buffer too small"); | |
| debug_assert!(out_u.len() >= n, "out_u buffer too small"); | |
| debug_assert!(out_v.len() >= n, "out_v buffer too small"); |
| .apply_step(u, v, next_u, next_v, self.d_u, self.d_v, dt, |u, v| { | ||
| self.kinetics.reaction(u, v) |
There was a problem hiding this comment.
The closure parameters |u, v| shadow the outer u/v slice bindings in this function, which makes the fused-step call harder to read. Renaming the closure parameters (e.g., u_curr, v_curr) or using a small helper function would avoid the shadowing.
| .apply_step(u, v, next_u, next_v, self.d_u, self.d_v, dt, |u, v| { | |
| self.kinetics.reaction(u, v) | |
| .apply_step(u, v, next_u, next_v, self.d_u, self.d_v, dt, |u_curr, v_curr| { | |
| self.kinetics.reaction(u_curr, v_curr) |
| fn apply_step<F>( | ||
| &self, | ||
| u: &[f64], | ||
| v: &[f64], | ||
| out_u: &mut [f64], | ||
| out_v: &mut [f64], | ||
| d_u: f64, | ||
| d_v: f64, | ||
| dt: f64, | ||
| reaction: F, | ||
| ) where | ||
| F: Fn(f64, f64) -> (f64, f64), |
There was a problem hiding this comment.
SpatialDiffusion is a public trait, and adding a generic method (apply_step<F>) makes the trait non-object-safe. Any downstream code that used dyn SpatialDiffusion (e.g., Box<dyn SpatialDiffusion>) will stop compiling. If object safety/backwards compatibility matters, consider moving apply_step to a separate extension trait, or changing the signature to take reaction: &mut dyn FnMut(f64, f64) -> (f64, f64) (non-generic) to preserve object safety.
| reaction: F, | ||
| ) where | ||
| F: Fn(f64, f64) -> (f64, f64), |
There was a problem hiding this comment.
The reaction callback is constrained to Fn, which prevents callers from passing closures that need internal mutable state (those require FnMut). Since Fn closures also implement FnMut, changing the bound to FnMut(f64, f64) -> (f64, f64) would be strictly more flexible without impacting current call sites.
| reaction: F, | |
| ) where | |
| F: Fn(f64, f64) -> (f64, f64), | |
| mut reaction: F, | |
| ) where | |
| F: FnMut(f64, f64) -> (f64, f64), |
📉 The Bottleneck: The
TuringSystem::stepmethod was performing two passes over the state vectors: one for diffusion (reading u, v, writing next_u, next_v) and one for reaction/integration (reading all four, writing next_u, next_v). This double memory traversal was limiting throughput.🚀 The Boost: Before: ~537ms -> After: ~257ms (52% improvement / 2.08x speedup).
💻 Technical Detail: Added
apply_steptoSpatialDiffusiontrait and implemented it forFiniteDifference1Dusing a fused loop that computes the Laplacian and immediately applies the reaction and Euler step, keeping data in registers/L1 cache.🧪 Verification: Logic preservation verified with
test_turing_system_logic_preservation. Performance verified withbench_morphogenesis.PR created automatically by Jules for task 2088545017155205550 started by @fderuiter