2026: Add Fluid roadmap #5672

Open
cheyang wants to merge 4 commits into fluid-cloudnative:master from cheyang:roadmap_2026

Collaborator

@cheyang cheyang commented Feb 27, 2026

Ⅰ. Describe what this PR does

Ⅱ. Does this pull request fix one issue?

fixes #XXXX

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

Signed-off-by: cheyang <cheyang.cy@alibaba-inc.com>
@fluid-e2e-bot
fluid-e2e-bot bot commented Feb 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from cheyang by writing /assign @cheyang in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the Fluid project's roadmap from 2025 to 2026, outlining significant advancements in data access, mobility, and real-time availability. Key changes include the introduction of a strategic focus on LLM KV Cache orchestration, enhancements to the generic cache runtime and dynamic configuration, API upgrades, and improved developer tooling to streamline operations and performance.

Highlights

  • Roadmap Year Update: The Fluid roadmap has been updated from 2025 to 2026, reflecting future development plans and strategic directions.
  • Enhanced Data Anyway Capabilities: Introduced a Generic Cache Runtime with a pluggable architecture and AdvancedStatefulset orchestration, Runtime Dynamic Configuration for zero-downtime tuning, an API upgrade to v1alpha2, a Validation Webhook, and improved ThinRuntime Productization for minimal container privileges.
  • Strategic LLM KV Cache Orchestration: Added a new P0 strategic focus on LLM KV Cache orchestration, encompassing disaggregated KV cache, cross-Pod sharing, and Mooncake integration for high-performance inference.
  • Improved Data Anywhere Features: Refined Elastic Caching & Cost Optimization to include Kubernetes Node-Specific Restrictions and Cloud-Native Recovery, introduced JindoRuntime High Availability with master Pod crash recovery, and updated Observability-Driven Optimization with ML-based access pattern recognition.
  • Developer Experience Focus: Introduced a new Fluid kubectl Plugin for enhanced developer experience, offering functionalities like dataset status inspection, on-demand prewarming triggering, cache performance profiling, and runtime configuration hot-updates.

Changelog
  • ROADMAP.md
    • Updated the roadmap title from "Fluid 2025 Roadmap" to "Fluid 2026 Roadmap".
    • Introduced "Generic Cache Runtime" with pluggable architecture and AdvancedStatefulset-based orchestration.
    • Added "Runtime Dynamic Configuration" for zero-downtime tuning and hot parameter swapping.
    • Included "API upgradation to v1alpha2" with conversion webhook support.
    • Added "Validation Webhook" for admission-time CRD validation and policy enforcement.
    • Refined "ThinRuntime Productization" to emphasize minimum container privileges.
    • Introduced "LLM KV Cache Orchestration" as a new P0 strategic focus under "Data Anywhere".
    • Updated "Elastic Caching & Scheduling" to "Elastic Caching & Cost Optimization" with new details on Kubernetes Node-Specific Restrictions and Cloud-Native Recovery.
    • Added "JindoRuntime High Availability" with master Pod crash recovery and metadata persistence.
    • Refined "Observability-Driven Optimization" with ML-based access pattern recognition and Dataset Garbage Collection.
    • Updated "Temporal Workflows with Kueue" to "Temporal Workflow Integration" with event-driven policies.
    • Added "Developer Experience" section, introducing the "Fluid kubectl Plugin" for various CLI functionalities.
Activity
  • No human activity has been recorded on this pull request yet.
Contributor

Copilot AI left a comment

Pull request overview

Updates the project roadmap document to outline Fluid’s 2026 priorities and planned initiatives across data access (“Anyway”), mobility (“Anywhere”), and workflow/DX (“Anytime”).

Changes:

  • Rename the roadmap section from 2025 to 2026 and refresh the planned initiative list.
  • Add new 2026 focus areas (e.g., LLM KV cache orchestration, dynamic runtime config, webhook validation, CLI plugin).
  • Rework bullets and descriptions across the three roadmap pillars.

ROADMAP.md Outdated
- Recommend underutilized Pods for scaling (cost/performance-aware).
- Ensure cache engines adapt to dynamic throughput post-scaling.
- **Cloud-Agnostic Recovery**: Rebuild caches across regions using cloud disk snapshots.
- **Elastic Caching & Cost Optimization**
Copilot AI Feb 27, 2026

This top-level list item appears to have an extra leading space before the "-", which can inadvertently nest it under the previous list in Markdown. Align it with other top-level bullets to keep the section structure consistent.

Suggested change
 - **Elastic Caching & Cost Optimization**
- **Elastic Caching & Cost Optimization**

ROADMAP.md Outdated
Comment on lines 77 to 79
• **Temporal Workflow Integration**

– **Kueue-Driven Pipelines**: Trigger training/inference jobs automatically upon DataLoad completion; automate post-job cache eviction and data migration.
Copilot AI Feb 27, 2026

This section uses a Unicode bullet character ("•") instead of a Markdown list marker ("-"). Consider switching to "-" for consistency and to ensure proper list rendering across Markdown viewers.

ROADMAP.md Outdated
Comment on lines 85 to 86
– **Fluid kubectl Plugin**_(P1)_: Native CLI extension (kubectl fluid) for:

Copilot AI Feb 27, 2026

The priority tag markup is missing a separating space before the italic text ("Fluid kubectl Plugin(P1)"). Add a space so the emphasis renders reliably: "Fluid kubectl Plugin (P1): ..."
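
The spacing fix requested here can be applied mechanically. The following is a minimal sketch, assuming the raw Markdown has the shape shown in the quoted diff line (`**Title**_(P1)_`); the names `TAG_WITHOUT_SPACE` and `space_priority_tags` are hypothetical and not part of the Fluid repository.

```python
import re

# Assumption: a priority tag is an italic, parenthesised span glued directly
# to the end of a bold span, e.g. "**Fluid kubectl Plugin**_(P1)_".
TAG_WITHOUT_SPACE = re.compile(r"(\*\*[^*]+\*\*)(_\([^)]*\)_)")


def space_priority_tags(text: str) -> str:
    """Insert one space between a bold title and its italic priority tag."""
    return TAG_WITHOUT_SPACE.sub(r"\1 \2", text)


if __name__ == "__main__":
    line = "**Fluid kubectl Plugin**_(P1)_: Native CLI extension (kubectl fluid) for:"
    print(space_priority_tags(line))
```

Lines that already have the space do not match the pattern, so running the function twice leaves the text unchanged.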

ROADMAP.md Outdated
Comment on lines 55 to 56
– **Kubernetes Node-Specific Restrictions**_(P1)_: Support node selectors, affinity/anti-affinity, and taint tolerations for cache Pods.

Copilot AI Feb 27, 2026

The priority tag markup again lacks a separating space before the italic text ("...(P1)"). Adding a space before "(P1)" will ensure consistent Markdown rendering.

ROADMAP.md Outdated

– **Fluid kubectl Plugin**_(P1)_: Native CLI extension (kubectl fluid) for:

- Dataset status inspection and health diagnostics.
Copilot AI Feb 27, 2026

List item spacing is inconsistent ("- Dataset ..." has two spaces after the dash). Normalizing to a single space after list markers will keep Markdown rendering and linting consistent.

Suggested change
-  Dataset status inspection and health diagnostics.
- Dataset status inspection and health diagnostics.
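
A check like this is easy to automate. Below is a small sketch, assuming list items start with optional indentation followed by `-`, `*`, or `+`; the helper name `single_space_after_marker` is illustrative only.

```python
import re

# Assumption: a list item is optional indentation, one of "-", "*", "+",
# then the item text. Two or more spaces/tabs after the marker are collapsed.
EXTRA_SPACE = re.compile(r"^(\s*[-*+])[ \t]{2,}(?=\S)")


def single_space_after_marker(line: str) -> str:
    """Collapse extra whitespace after a Markdown list marker to one space."""
    return EXTRA_SPACE.sub(r"\1 ", line)


if __name__ == "__main__":
    print(single_space_after_marker("-  Dataset status inspection and health diagnostics."))
```

The existing indentation before the marker is preserved, so nesting levels are not affected.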

ROADMAP.md Outdated
Comment on lines 8 to 10
- **Generic Cache Runtime**
– **Pluggable Architecture**: Standardized Cache Runtime Interface for rapid integration of new engines (CubeFS, Dragonfly, Vineyard) with minimal boilerplate.
– **AdvancedStatefulset-Based Orchestration**: Migrate from StatefulSet to InstanceSet for fine-grained Pod lifecycle management, ordered rollout, and enhanced failover capabilities.
Copilot AI Feb 27, 2026

Markdown list formatting is inconsistent here: sub-items use a Unicode en dash ("–") instead of a standard Markdown list marker ("-") and indentation varies. This can render as plain text instead of nested bullets; consider using consistent "-" markers and indentation for all sub-bullets.

ROADMAP.md Outdated
- Minimum container permission (remove the privileged permission of FUSE Pod)
- **Generic Cache Runtime**
– **Pluggable Architecture**: Standardized Cache Runtime Interface for rapid integration of new engines (CubeFS, Dragonfly, Vineyard) with minimal boilerplate.
– **AdvancedStatefulset-Based Orchestration**: Migrate from StatefulSet to InstanceSet for fine-grained Pod lifecycle management, ordered rollout, and enhanced failover capabilities.
Copilot AI Feb 27, 2026

"AdvancedStatefulset-Based Orchestration" likely intends to reference the Kubernetes resource "StatefulSet" (capital S in the second word). Consider renaming to "Advanced StatefulSet-based Orchestration" (or similar) to match the canonical Kubernetes spelling and improve readability.

Suggested change
**AdvancedStatefulset-Based Orchestration**: Migrate from StatefulSet to InstanceSet for fine-grained Pod lifecycle management, ordered rollout, and enhanced failover capabilities.
**Advanced StatefulSet-based Orchestration**: Migrate from StatefulSet to InstanceSet for fine-grained Pod lifecycle management, ordered rollout, and enhanced failover capabilities.

ROADMAP.md Outdated

– **Hot Parameter Swapping**: Runtime modification of cache engine configurations (e.g., Alluxio block size, Jindo worker threads) for traffic spike handling.

- **API upgradation to v1alpha2**
Copilot AI Feb 27, 2026

"API upgradation" is non-idiomatic English; consider changing this heading to "API upgrade" / "API upgrade to v1alpha2" to improve clarity.

Suggested change
- **API upgradation to v1alpha2**
- **API upgrade to v1alpha2**

ROADMAP.md Outdated
Comment on lines 24 to 28
- **Validation Webhook**

– Admission-time CRD validation with auto-correction suggestions to prevent misconfigurations.

– Policy enforcement for resource quotas and security constraints.
Copilot AI Feb 27, 2026

These lines have inconsistent indentation and use a Unicode en dash ("–") that won't be parsed as a list item by Markdown in many renderers. Using a consistent nested list marker (e.g., two spaces + "- ") will improve readability and ensure bullets render correctly.
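
One way to apply this suggestion across the whole file is a small normalizer. The sketch below rests on assumptions drawn from the quoted lines: that an en dash ("–") at the start of a line marks a sub-item and that a bullet character ("•") marks a top-level item. The function name `normalize_markers` and the two-space nesting default are illustrative, not part of the PR.

```python
import re

# Assumption: problem lines start with optional whitespace, then an en dash
# or a bullet character, then at least one whitespace character.
UNICODE_MARKER = re.compile(r"^(\s*)[–•]\s+")


def normalize_markers(lines, nested_indent="  "):
    """Replace leading en-dash/bullet characters with standard '- ' markers."""
    fixed = []
    for line in lines:
        match = UNICODE_MARKER.match(line)
        if match is None:
            # Not a Unicode-marked list line: keep it untouched.
            fixed.append(line)
            continue
        # En dash: treat as a nested sub-item. Bullet: treat as top level.
        indent = nested_indent if line.lstrip().startswith("–") else ""
        fixed.append(indent + "- " + line[match.end():])
    return fixed


if __name__ == "__main__":
    sample = [
        "- **Validation Webhook**",
        "",
        "– Admission-time CRD validation with auto-correction suggestions to prevent misconfigurations.",
    ]
    print("\n".join(normalize_markers(sample)))
```

Only markers at the start of a line are rewritten, so en dashes used as punctuation inside a sentence are left alone.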

ROADMAP.md Outdated
Comment on lines 37 to 38
- **LLM KV Cache Orchestration**_(P0, New Strategic Focus)_

Copilot AI Feb 27, 2026

The emphasis markup is missing a separating space before the italicized priority tag (e.g., "...(P0...)"). Without the space, some Markdown renderers won't format this as intended. Consider adding a space: "... (P0, ...)."

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the Fluid roadmap to 2026, introducing significant new objectives and features across the 'Data Anyway', 'Data Anywhere', and 'Data Anytime' sections. The changes provide a clear direction for the project. My review focuses on improving the clarity and consistency of the new roadmap document.

ROADMAP.md Outdated

– **Hot Parameter Swapping**: Runtime modification of cache engine configurations (e.g., Alluxio block size, Jindo worker threads) for traffic spike handling.

- **API upgradation to v1alpha2**

medium

The term "upgradation" is not standard English. For clarity and professionalism in this public-facing document, please use "upgrade" instead.

Suggested change
- **API upgradation to v1alpha2**
- **API upgrade to v1alpha2**

Comment on lines 47 to 49
- **Distributed Prewarming**: Maximize bandwidth utilization for fast data loading.
- **Throttling Control**: Limit bandwidth usage during prewarming to avoid saturation.
- **Rsync Optimization**: Improve cross-region sync efficiency.

medium

The bullet points for these sub-items use a hyphen (-), which is inconsistent with the en dash (–) used for other sub-items in this document (e.g., lines 9-10). For consistency, please use an en dash.

Suggested change
- **Distributed Prewarming**: Maximize bandwidth utilization for fast data loading.
- **Throttling Control**: Limit bandwidth usage during prewarming to avoid saturation.
- **Rsync Optimization**: Improve cross-region sync efficiency.
– **Distributed Prewarming**: Maximize bandwidth utilization for fast data loading.
– **Throttling Control**: Limit bandwidth usage during prewarming to avoid saturation.
– **Rsync Optimization**: Improve cross-region sync efficiency.

ROADMAP.md Outdated
- Recommend underutilized Pods for scaling (cost/performance-aware).
- Ensure cache engines adapt to dynamic throughput post-scaling.
- **Cloud-Agnostic Recovery**: Rebuild caches across regions using cloud disk snapshots.
- **Elastic Caching & Cost Optimization**

medium

This list item has an extra leading space, causing inconsistent indentation compared to other main points. Please remove it for consistent formatting.

Suggested change
 - **Elastic Caching & Cost Optimization**
- **Elastic Caching & Cost Optimization**

ROADMAP.md Outdated
- **Dynamic Volume Mounting**:
- Support dynamic volume mounting capabilities for multi-cloud/hybrid-cloud scenarios.
- Enable dyanmic data mount operations in Python SDK.
• **Temporal Workflow Integration**

medium

This list item uses a • character, which is inconsistent with the hyphen (-) used for all other main points in the roadmap. Please use a hyphen for consistency.

Suggested change
• **Temporal Workflow Integration**
- **Temporal Workflow Integration**

ROADMAP.md Outdated
Comment on lines 87 to 90
- Dataset status inspection and health diagnostics.
- On-demand prewarming triggering (kubectl fluid warmup).
- Cache performance profiling and bottleneck analysis.
- Runtime configuration hot-updates.

medium

The bullet points and indentation for these sub-items are inconsistent with the rest of the document. For consistency, please use an en dash (–) and adjust the indentation.

Suggested change
- Dataset status inspection and health diagnostics.
- On-demand prewarming triggering (kubectl fluid warmup).
- Cache performance profiling and bottleneck analysis.
- Runtime configuration hot-updates.
– Dataset status inspection and health diagnostics.
– On-demand prewarming triggering (kubectl fluid warmup).
– Cache performance profiling and bottleneck analysis.
– Runtime configuration hot-updates.

@codecov
codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.04%. Comparing base (44c3489) to head (a180f6f).
⚠️ Report is 15 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5672      +/-   ##
==========================================
- Coverage   61.05%   61.04%   -0.02%     
==========================================
  Files         444      444              
  Lines       30540    30540              
==========================================
- Hits        18647    18643       -4     
- Misses      10356    10360       +4     
  Partials     1537     1537              


Signed-off-by: cheyang <cheyang.cy@alibaba-inc.com>
Signed-off-by: cheyang <cheyang.cy@alibaba-inc.com>
Signed-off-by: cheyang <cheyang.cy@alibaba-inc.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


Comment on lines +65 to +71
– **Kueue-Driven Pipelines**: Trigger training/inference jobs automatically upon DataLoad completion; automate post-job cache eviction and data migration.

– **Event-Driven Policies**: Flexible metadata synchronization triggered by workload lifecycle events.

- **Developer Experience**

– **Fluid kubectl Plugin**: Native CLI extension (kubectl fluid) for:
Copilot AI Feb 27, 2026

The sub-items under Temporal Workflow Integration / Developer Experience use a Unicode en-dash (–) for the first-level sub-bullets. Replace with standard Markdown list markers so these items render consistently with the - list used below.


- **ThinRuntime Productization**

– Production-ready stability for large-scale deployments with **minimum container privileges** (eliminate privileged FUSE Pod requirements).
Copilot AI Feb 27, 2026

This section has a header bullet (- **ThinRuntime Productization**) but the following line isn’t formatted as a nested list item (it starts with an en-dash and inconsistent indentation). Convert it into a proper nested bullet so it renders under the ThinRuntime item.

Suggested change
– Production-ready stability for large-scale deployments with **minimum container privileges** (eliminate privileged FUSE Pod requirements).
- Production-ready stability for large-scale deployments with **minimum container privileges** (eliminate privileged FUSE Pod requirements).

Comment on lines +39 to +43
– **Disaggregated KV Cache**: Externalize vLLM/SGLang KV Cache to Fluid-managed distributed storage, enabling 10x+ throughput improvement for long-context inference.

– **Cross-Pod Cache Sharing**: Live migration of KV Cache between inference instances for preemptive scheduling and spot instance tolerance.

– **Mooncake Integration**: Official partnership for high-performance KV Cache backend with RDMA acceleration.
Copilot AI Feb 27, 2026

The KV cache sub-items are prefixed with a Unicode en-dash (–) and inconsistent indentation. This won’t render as a nested list in Markdown; use standard list markers (-/*) and consistent indentation.

Comment on lines +52 to +53
– **Master Pod Crash Recovery**: Automatic re-setup and state reconstruction after cache master failure without data loss.
– **Metadata Persistence**: WAL-based metadata recovery for rapid failover.
Copilot AI Feb 27, 2026

Same Markdown list issue here: the JindoRuntime HA sub-items use a Unicode en-dash (–) instead of a Markdown list marker, so they won’t render as bullets. Switch to - and indent as nested bullets.

Suggested change
– **Master Pod Crash Recovery**: Automatic re-setup and state reconstruction after cache master failure without data loss.
– **Metadata Persistence**: WAL-based metadata recovery for rapid failover.
- **Master Pod Crash Recovery**: Automatic re-setup and state reconstruction after cache master failure without data loss.
- **Metadata Persistence**: WAL-based metadata recovery for rapid failover.
