From d167190e0b33c93bd7e511fca2dc0422318216d2 Mon Sep 17 00:00:00 2001
From: Janardhan Pulivarthi
Date: Sat, 6 Dec 2025 13:25:36 +0000
Subject: [PATCH 1/4] [DOC] add documentation for the OOCEvictionManager design

---
 .../instructions/ooc/OOCEvictionManager.java | 78 +++++++++++--------
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
index f5ae7573b0a..68dc3a0f542 100644
--- a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
+++ b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
@@ -49,38 +49,54 @@
  * Eviction Manager for the Out-Of-Core stream cache
  * This is the base implementation for LRU, FIFO
  *
- * Design choice 1: Pure JVM-memory cache
- * What: Store MatrixBlock objects in a synchronized in-memory cache
- *   (Map + Deque for LRU/FIFO). Spill to disk by serializing MatrixBlock
- *   only when evicting.
- * Pros: Simple to implement; no off-heap management; easy to debug;
- *   no serialization race since you serialize only when evicting;
- *   fast cache hits (direct object access).
- * Cons: Heap usage counted roughly via serialized-size estimate — actual
- *   JVM object overhead not accounted; risk of GC pressure and OOM if
- *   estimates are off or if many small objects cause fragmentation;
- *   eviction may be more expensive (serialize on eviction).
- *
- * Design choice 2:
- *
- * This manager runtime memory management by caching serialized
- * ByteBuffers and spilling them to disk when needed.
- *
- *  * core function: Caches ByteBuffers (off-heap/direct) and
- *    spills them to disk
- *  * Eviction: Evicts a ByteBuffer by writing its contents to a file
- *  * Granularity: Evicts one IndexedMatrixValue block at a time
- *  * Data replay: get() will always return the data either from memory or
- *    by falling back to the disk
- *  * Memory: Since the datablocks are off-heap (in ByteBuffer) or disk,
- *    there won't be OOM.
+ * Purpose
+ * -------
+ * Provides a bounded cache for matrix blocks produced and consumed by OOC
+ * streaming operators. When memory pressure exceeds a configured limit,
+ * blocks are evicted from memory and spilled to disk, and transparently
+ * restored on demand.
+ *
+ *
+ * Design scope
+ * ------------
+ * - Manages block lifecycle across the states:
+ *     HOT : block in memory
+ *     EVICTING: block spilled to disk
+ *     COLD: block persisted on disk and to be reload when needed
+ *
+ * - Guarantees correctness under concurrent get/put operations with:
+ *     * per-block locks
+ *     * explicit eviction state transitions
+ *
+ * - Integration with Resettable to support:
+ *     * multiple consumers
+ *     * deterministic replay
+ *     * eviction-safe reuse of shared inputs for tee operator
+ *
+ *
+ * Eviction Strategy
+ * -----------------
+ * - Uses FIFO or LRU ordering at block granularity.
+ * - Eviction is partition-based:
+ *     * blocks are spilled in batches to a single partition file
+ *     * enables high-throughput sequential disk I/O
+ * - Each evicted block records a (partitionId, offset) for direct reload.
+ *
  *
- * Pros: Avoids heap OOM by keeping large data off-heap; predictable
- *   memory usage; good for very large blocks.
- * Cons: More complex synchronization; need robust off-heap allocator/free;
- *   must ensure serialization finishes before adding to queue or make evict
- *   wait on serialization; careful with native memory leaks.
- */
+ * Disk Layout
+ * -----------
+ * - Spill files are append-only partition files
+ * - Each partition may contain multiple serialized blocks
+ * - Metadata remains in-memory while block data can be on disk
+ *
+ *
+ * Concurrency Model
+ * -----------------
+ * - Global cache structure guarded by a cache-level lock.
+ * - Each block has an independent lock and condition variable.
+ * - Readers wait when a block is in EVICTING state.
+ * - Disk I/O is performed outside global locks to avoid blocking producers.
+ * /
 public class OOCEvictionManager {
 	// Configuration: OOC buffer limit as percentage of heap

From 9593a7cba8e41b8902df97db1a7707ab3ebba3 Mon Sep 17 00:00:00 2001
From: Janardhan Pulivarthi
Date: Sat, 6 Dec 2025 13:31:34 +0000
Subject: [PATCH 2/4] [DOC] add documentation for the OOCEvictionManager design

---
 .../instructions/ooc/OOCEvictionManager.java | 99 ++++++++++---------
 1 file changed, 50 insertions(+), 49 deletions(-)

diff --git a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
index 68dc3a0f542..cf5083bd752 100644
--- a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
+++ b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
@@ -46,57 +46,58 @@
 import java.util.concurrent.locks.ReentrantLock;
 
 /**
- * Eviction Manager for the Out-Of-Core stream cache
- * This is the base implementation for LRU, FIFO
+ * Eviction Manager for the Out-Of-Core (OOC) stream cache.
+ *
+ * This manager implements a high-performance, thread-safe buffer pool designed
+ * to handle intermediate results that exceed available heap memory. It builds on
+ * a partitioned eviction strategy to maximize disk throughput and a
+ * lock-striped concurrency model to minimize thread contention.
  *
- * Purpose
- * -------
- * Provides a bounded cache for matrix blocks produced and consumed by OOC
- * streaming operators. When memory pressure exceeds a configured limit,
- * blocks are evicted from memory and spilled to disk, and transparently
- * restored on demand.
- *
- *
- * Design scope
- * ------------
- * - Manages block lifecycle across the states:
- *     HOT : block in memory
- *     EVICTING: block spilled to disk
- *     COLD: block persisted on disk and to be reload when needed
- *
- * - Guarantees correctness under concurrent get/put operations with:
- *     * per-block locks
- *     * explicit eviction state transitions
- *
- * - Integration with Resettable to support:
- *     * multiple consumers
- *     * deterministic replay
- *     * eviction-safe reuse of shared inputs for tee operator
- *
- *
- * Eviction Strategy
- * -----------------
- * - Uses FIFO or LRU ordering at block granularity.
- * - Eviction is partition-based:
- *     * blocks are spilled in batches to a single partition file
- *     * enables high-throughput sequential disk I/O
- * - Each evicted block records a (partitionId, offset) for direct reload.
- *
+ * <h3>1. Purpose</h3>
+ * Provides a bounded cache for {@code MatrixBlock}s produced and consumed by OOC
+ * streaming operators (e.g., {@code tsmm}, {@code ba+*}). When memory pressure
+ * exceeds a configured limit, blocks are transparently evicted to disk and restored
+ * on demand.
  *
- * Disk Layout
- * -----------
- * - Spill files are append-only partition files
- * - Each partition may contain multiple serialized blocks
- * - Metadata remains in-memory while block data can be on disk
- *
- *
- * Concurrency Model
- * -----------------
- * - Global cache structure guarded by a cache-level lock.
- * - Each block has an independent lock and condition variable.
- * - Readers wait when a block is in EVICTING state.
- * - Disk I/O is performed outside global locks to avoid blocking producers.
- * /
+ * <h3>2. Lifecycle Management</h3>
+ * Blocks transition atomically through three states to ensure data consistency:
+ *
+ *
+ * <h3>3. Eviction Strategy (Partitioned I/O)</h3>
+ * To mitigate I/O thrashing caused by writing thousands of small blocks:
+ *
+ *
+ * <h3>4. Data Integrity (Re-hydration)</h3>
+ * To prevent index corruption during serialization/deserialization cycles, this manager
+ * uses a "re-hydration" model. The {@code IndexedMatrixValue} container is never
+ * removed from the cache structure. Eviction only nulls the data payload. Loading
+ * restores the data into the existing container, preserving the original {@code MatrixIndexes}.
+ *
+ *
+ * <h3>5. Concurrency Model (Lock-Striping)</h3>
+ *
+ */
+
 public class OOCEvictionManager {
 	// Configuration: OOC buffer limit as percentage of heap

From 33792e20715d2ff903dd506a863c3c32691a4d3c Mon Sep 17 00:00:00 2001
From: Janardhan Pulivarthi
Date: Sat, 6 Dec 2025 13:36:27 +0000
Subject: [PATCH 3/4] [DOC] update note on concurrency

---
 .../instructions/ooc/OOCEvictionManager.java | 23 +++++++++++--------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
index cf5083bd752..15e1780a19e 100644
--- a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
+++ b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
@@ -85,16 +85,21 @@
  * removed from the cache structure. Eviction only nulls the data payload. Loading
  * restores the data into the existing container, preserving the original {@code MatrixIndexes}.
  *
- * <h3>5. Concurrency Model (Lock-Striping)</h3>
+ * <h3>5. Concurrency Model (Fine-Grained Locking)</h3>
  *
  */

From 5a821ca419de48b5b01d13c349fdf947f8ce332d Mon Sep 17 00:00:00 2001
From: Janardhan Pulivarthi
Date: Sat, 6 Dec 2025 14:42:20 +0000
Subject: [PATCH 4/4] fix h3 header to h2

---
 .../instructions/ooc/OOCEvictionManager.java | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
index 15e1780a19e..099a26ebd9d 100644
--- a/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
+++ b/src/main/java/org/apache/sysds/runtime/instructions/ooc/OOCEvictionManager.java
@@ -49,17 +49,17 @@
  * Eviction Manager for the Out-Of-Core (OOC) stream cache.
  *
  * This manager implements a high-performance, thread-safe buffer pool designed
- * to handle intermediate results that exceed available heap memory. It builds on
+ * to handle intermediate results that exceed available heap memory. It employs
  * a partitioned eviction strategy to maximize disk throughput and a
  * lock-striped concurrency model to minimize thread contention.
  *
- * <h3>1. Purpose</h3>
+ * <h2>1. Purpose</h2>
  * Provides a bounded cache for {@code MatrixBlock}s produced and consumed by OOC
  * streaming operators (e.g., {@code tsmm}, {@code ba+*}). When memory pressure
  * exceeds a configured limit, blocks are transparently evicted to disk and restored
- * on demand.
+ * on demand, allowing execution of operations larger than RAM.
  *
- * <h3>2. Lifecycle Management</h3>
+ * <h2>2. Lifecycle Management</h2>
* Blocks transition atomically through three states to ensure data consistency: *
    *
  • HOT: The block is pinned in the JVM heap ({@code value != null}).
  • @@ -69,7 +69,7 @@ * to free memory, but the container (metadata) remains in the cache map. *
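[Review note] The HOT / EVICTING / COLD lifecycle described above can be sketched as follows. This is an illustrative reduction, not the actual SystemDS code: `BlockState`, this stripped-down `BlockEntry`, and the `byte[]` payload are hypothetical stand-ins for the real block container and serialized `MatrixBlock` data.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

enum BlockState { HOT, EVICTING, COLD }

class BlockEntry {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition loaded = lock.newCondition();
    private BlockState state = BlockState.HOT;
    private byte[] payload; // stand-in for the serialized MatrixBlock data

    BlockEntry(byte[] payload) { this.payload = payload; }

    /** Evictor: HOT -> EVICTING; returns the payload so the caller can write it out. */
    byte[] beginEvict() {
        lock.lock();
        try {
            if (state != BlockState.HOT)
                return null;
            state = BlockState.EVICTING;
            return payload;
        } finally { lock.unlock(); }
    }

    /** Evictor: EVICTING -> COLD; only the payload is dropped, the entry survives. */
    void finishEvict() {
        lock.lock();
        try {
            payload = null;
            state = BlockState.COLD;
            loaded.signalAll(); // wake readers that waited out the eviction
        } finally { lock.unlock(); }
    }

    /** Reader: waits while EVICTING, reloads from disk if COLD, then returns HOT data. */
    byte[] get(Supplier<byte[]> reloadFromDisk) {
        lock.lock();
        try {
            while (state == BlockState.EVICTING) {
                try { loaded.await(); }
                catch (InterruptedException ie) { Thread.currentThread().interrupt(); return null; }
            }
            if (state == BlockState.COLD) {
                payload = reloadFromDisk.get();
                state = BlockState.HOT;
            }
            return payload;
        } finally { lock.unlock(); }
    }
}
```

The key property the sketch demonstrates: readers never observe a half-evicted block, because the EVICTING window is bridged by the condition wait.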
* - *

- * <h3>3. Eviction Strategy (Partitioned I/O)</h3>
+ * <h2>3. Eviction Strategy (Partitioned I/O)</h2>
* To mitigate I/O thrashing caused by writing thousands of small blocks: *
    *
  • Eviction is partition-based: Groups of "HOT" blocks are gathered into @@ -79,13 +79,13 @@ * evicted block, allowing random-access reloading.
  • *
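[Review note] The partition-based spilling with `(partitionId, offset)` metadata might look roughly like this sketch. `PartitionSpiller`, `BlockLocation`, and the `partition-N.bin` file naming are assumptions for illustration, not the actual implementation; real code would serialize `MatrixBlock`s rather than raw byte arrays.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

class PartitionSpiller {
    /** Recorded location of one evicted block: which partition file, and where in it. */
    record BlockLocation(int partitionId, long offset, int length) {}

    private final Path dir;
    private int nextPartitionId = 0;
    private final Map<Long, BlockLocation> locations = new HashMap<>();

    PartitionSpiller(Path dir) { this.dir = dir; }

    static Path newTempDir() {
        try { return Files.createTempDirectory("ooc-spill"); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    /** Spill a batch of (blockId, payload) pairs sequentially into ONE partition file. */
    void spillBatch(Map<Long, byte[]> batch) {
        int pid = nextPartitionId++;
        Path f = dir.resolve("partition-" + pid + ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(f.toFile(), "rw")) {
            for (Map.Entry<Long, byte[]> e : batch.entrySet()) {
                long off = raf.getFilePointer();      // sequential append
                raf.write(e.getValue());
                locations.put(e.getKey(), new BlockLocation(pid, off, e.getValue().length));
            }
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    /** Random-access reload of one block via its recorded (partitionId, offset). */
    byte[] load(long blockId) {
        BlockLocation loc = locations.get(blockId);
        Path f = dir.resolve("partition-" + loc.partitionId() + ".bin");
        byte[] out = new byte[loc.length()];
        try (RandomAccessFile raf = new RandomAccessFile(f.toFile(), "r")) {
            raf.seek(loc.offset());
            raf.readFully(out);
        } catch (IOException e) { throw new UncheckedIOException(e); }
        return out;
    }
}
```

Writing the batch as one sequential burst is what converts thousands of tiny random writes into a single large one, while the per-block offsets preserve random-access reads.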
* - *

- * <h3>4. Data Integrity (Re-hydration)</h3>
+ * <h2>4. Data Integrity (Re-hydration)</h2>
* To prevent index corruption during serialization/deserialization cycles, this manager * uses a "re-hydration" model. The {@code IndexedMatrixValue} container is never * removed from the cache structure. Eviction only nulls the data payload. Loading * restores the data into the existing container, preserving the original {@code MatrixIndexes}. * - *
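[Review note] A minimal model of the re-hydration idea, with hypothetical names (`IndexedEntry`, `RehydratingCache`, and an in-memory map standing in for the spill files) in place of `IndexedMatrixValue` and the real cache: eviction nulls only the payload, and loading writes data back into the same container object, so the indexes are never re-serialized.

```java
import java.util.HashMap;
import java.util.Map;

class IndexedEntry {
    final long rowIndex, colIndex; // analogous to MatrixIndexes: set once, never serialized again
    byte[] data;                   // analogous to the MatrixBlock payload

    IndexedEntry(long r, long c, byte[] d) { rowIndex = r; colIndex = c; data = d; }
}

class RehydratingCache {
    private final Map<Long, IndexedEntry> cache = new HashMap<>();
    private final Map<Long, byte[]> disk = new HashMap<>(); // stand-in for spill files

    void put(long id, IndexedEntry e) { cache.put(id, e); }

    void evict(long id) {
        IndexedEntry e = cache.get(id);
        disk.put(id, e.data); // "write to disk"
        e.data = null;        // container stays in the map, payload dropped
    }

    IndexedEntry get(long id) {
        IndexedEntry e = cache.get(id);   // container is always present
        if (e.data == null)
            e.data = disk.remove(id);     // re-hydrate into the EXISTING container
        return e;
    }
}
```

Because `get` returns the identical container object across evict/reload cycles, consumers holding a reference can never observe corrupted or reordered indexes.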

- * <h3>5. Concurrency Model (Fine-Grained Locking)</h3>
+ * <h2>5. Concurrency Model (Fine-Grained Locking)</h2>
*
    *
  • Global Structure Lock: A coarse-grained lock ({@code _cacheLock}) guards * the {@code LinkedHashMap} structure against concurrent insertions, deletions, @@ -93,7 +93,7 @@ * *
  • Per-Block Locks: Each {@code BlockEntry} owns an independent * {@code ReentrantLock}. This decouples I/O operations, allowing a reader to load - * "Block A" from disk while the evictor simultaneously writes "Block B" to disk, + * "Block A" from disk while the evictor writes "Block B" to disk simultaneously, * maximizing throughput.
  • * *
  • Condition Queues: To handle read-write races, the system uses atomic
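[Review note] The two-level locking described in this section can be sketched as follows; `StripedCache` and its fields are illustrative only, not the actual `OOCEvictionManager`. A `_cacheLock`-style global lock guards only the map structure, while each entry's own lock would cover the slow disk I/O, so I/O on "Block A" never blocks I/O on "Block B".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class StripedCache {
    static class Entry {
        final ReentrantLock lock = new ReentrantLock(); // per-block lock
        byte[] data;
        Entry(byte[] d) { data = d; }
    }

    private final ReentrantLock cacheLock = new ReentrantLock(); // structure-only lock
    private final Map<Long, Entry> map = new LinkedHashMap<>();  // insertion order, as for FIFO

    void put(long id, byte[] data) {
        cacheLock.lock(); // guards insertions/deletions/ordering, never I/O
        try { map.put(id, new Entry(data)); }
        finally { cacheLock.unlock(); }
    }

    byte[] read(long id) {
        Entry e;
        cacheLock.lock(); // short critical section: map lookup only
        try { e = map.get(id); }
        finally { cacheLock.unlock(); }

        e.lock.lock();    // slow path (disk load/spill) would run under this lock,
        try {             // without blocking readers or evictors of other blocks
            return e.data;
        } finally { e.lock.unlock(); }
    }
}
```

The design choice this illustrates: holding the global lock across disk I/O would serialize all producers and consumers, so the global lock is released before any per-block work begins.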