Skip to content

⚡ Profiler: Fused Diffusion-Reaction Loop#451

Merged
fderuiter merged 1 commit intomainfrom
profiler/fused-diffusion-reaction-2088545017155205550
Feb 9, 2026
Merged

⚡ Profiler: Fused Diffusion-Reaction Loop#451
fderuiter merged 1 commit intomainfrom
profiler/fused-diffusion-reaction-2088545017155205550

Conversation

@fderuiter
Copy link
Owner

📉 The Bottleneck: The TuringSystem::step method was performing two passes over the state vectors: one for diffusion (reading u, v, writing next_u, next_v) and one for reaction/integration (reading all four, writing next_u, next_v). This double memory traversal was limiting throughput.
🚀 The Boost: Before: ~537ms -> After: ~257ms (52% improvement / 2.08x speedup).
💻 Technical Detail: Added apply_step to SpatialDiffusion trait and implemented it for FiniteDifference1D using a fused loop that computes the Laplacian and immediately applies the reaction and Euler step, keeping data in registers/L1 cache.
🧪 Verification: Logic preservation verified with test_turing_system_logic_preservation. Performance verified with bench_morphogenesis.


PR created automatically by Jules for task 2088545017155205550 started by @fderuiter

- Add `apply_step` to `SpatialDiffusion` trait for fused operations
- Implement optimized `apply_step` in `FiniteDifference1D` using sliding window
- Update `TuringSystem::step` to use the fused kernel
- 2x speedup (537ms -> 257ms) in `bench_morphogenesis`

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules
Copy link
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@fderuiter fderuiter marked this pull request as ready for review February 9, 2026 15:49
Copilot AI review requested due to automatic review settings February 9, 2026 15:49
@fderuiter fderuiter merged commit 710aa81 into main Feb 9, 2026
1 check passed
@fderuiter fderuiter deleted the profiler/fused-diffusion-reaction-2088545017155205550 branch February 9, 2026 15:49
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the reaction–diffusion “TuringSystem” stepping routine by fusing diffusion, reaction, and Euler integration into a single pass to reduce memory bandwidth pressure and improve runtime throughput.

Changes:

  • Added a new apply_step method to the SpatialDiffusion trait with a default (non-fused) implementation.
  • Updated TuringSystem::step to use SpatialDiffusion::apply_step instead of separate diffusion + reaction/integration passes.
  • Implemented a fused apply_step hot path for FiniteDifference1D using a sliding-window stencil and unsafe indexing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
math_explorer/src/biology/morphogenesis.rs Switches the main Turing step to the new fused diffusion+reaction+integration entry point.
math_explorer/src/biology/diffusion.rs Extends the diffusion trait with apply_step and provides an optimized fused implementation for FiniteDifference1D.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +197 to +199
assert!(v.len() >= n, "v buffer too small");
assert!(out_u.len() >= n, "out_u buffer too small");
assert!(out_v.len() >= n, "out_v buffer too small");
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assert! length checks execute in release builds and add overhead on the step hot path. If the goal is maximum throughput, consider using debug_assert! (or feature-gating the checks) so they’re compiled out in optimized builds while still catching issues during development/testing.

Suggested change
assert!(v.len() >= n, "v buffer too small");
assert!(out_u.len() >= n, "out_u buffer too small");
assert!(out_v.len() >= n, "out_v buffer too small");
debug_assert!(v.len() >= n, "v buffer too small");
debug_assert!(out_u.len() >= n, "out_u buffer too small");
debug_assert!(out_v.len() >= n, "out_v buffer too small");

Copilot uses AI. Check for mistakes.
Comment on lines +269 to +270
.apply_step(u, v, next_u, next_v, self.d_u, self.d_v, dt, |u, v| {
self.kinetics.reaction(u, v)
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closure parameters |u, v| shadow the outer u/v slice bindings in this function, which makes the fused-step call harder to read. Renaming the closure parameters (e.g., u_curr, v_curr) or using a small helper function would avoid the shadowing.

Suggested change
.apply_step(u, v, next_u, next_v, self.d_u, self.d_v, dt, |u, v| {
self.kinetics.reaction(u, v)
.apply_step(u, v, next_u, next_v, self.d_u, self.d_v, dt, |u_curr, v_curr| {
self.kinetics.reaction(u_curr, v_curr)

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +47
fn apply_step<F>(
&self,
u: &[f64],
v: &[f64],
out_u: &mut [f64],
out_v: &mut [f64],
d_u: f64,
d_v: f64,
dt: f64,
reaction: F,
) where
F: Fn(f64, f64) -> (f64, f64),
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpatialDiffusion is a public trait, and adding a generic method (apply_step<F>) makes the trait non-object-safe. Any downstream code that used dyn SpatialDiffusion (e.g., Box<dyn SpatialDiffusion>) will stop compiling. If object safety/backwards compatibility matters, consider moving apply_step to a separate extension trait, or changing the signature to take reaction: &mut dyn FnMut(f64, f64) -> (f64, f64) (non-generic) to preserve object safety.

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +47
reaction: F,
) where
F: Fn(f64, f64) -> (f64, f64),
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reaction callback is constrained to Fn, which prevents callers from passing closures that need internal mutable state (those require FnMut). Since Fn closures also implement FnMut, changing the bound to FnMut(f64, f64) -> (f64, f64) would be strictly more flexible without impacting current call sites.

Suggested change
reaction: F,
) where
F: Fn(f64, f64) -> (f64, f64),
mut reaction: F,
) where
F: FnMut(f64, f64) -> (f64, f64),

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant