175 changes: 78 additions & 97 deletions .github/copilot-instructions.md
@@ -1,116 +1,97 @@
# Copilot Instructions for BSS Metrics API Server
# Gthulhu API Server - Copilot Instructions

## Project Overview
This is a Go-based API server that bridges Linux kernel scheduler metrics (BSS) with Kubernetes orchestration. The core purpose is to collect scheduling metrics from eBPF programs and provide intelligent scheduling strategies back to the kernel based on Kubernetes pod labels and process patterns.
## Architecture Overview

This is a **dual-mode Go API server** that bridges Linux kernel scheduling (sched_ext) with Kubernetes:

- **Manager** (`manager/`): Central management service (port 8081) - handles users, RBAC, strategies, and distributes scheduling intents
- **Decision Maker** (`decisionmaker/`): Per-node DaemonSet (port 8080) - receives intents, scans `/proc` for PIDs, and interfaces with the eBPF scheduler

Both modes share the same binary, selected via `main.go manager` or `main.go decisionmaker` subcommands.

## Architecture & Key Components

### Core Data Flow
1. **Metrics Collection**: eBPF programs send BSS scheduler metrics to `/api/v1/metrics`
2. **Process Discovery**: The server scans the `/proc` filesystem to map processes to Kubernetes pods via cgroup parsing
3. **Strategy Generation**: Combines user-defined strategies with live pod label data to generate scheduling decisions
4. **Strategy Delivery**: Returns concrete PID-based scheduling strategies to the kernel scheduler

### Critical File Structure
- `main.go`: Single-file monolith containing all HTTP handlers and core logic
- `kubernetes.go`: K8s client with a caching layer for pod metadata
- `config.go`: Configuration with default scheduling strategies
- `options.go`: CLI argument parsing with dual-mode support (in-cluster vs. external)

## Development Patterns
## Project Structure & Layered Architecture

Each service follows **Clean Architecture** with strict layer separation:

```
{manager,decisionmaker}/
├── app/          # Fx modules for DI wiring
├── cmd/          # Cobra command definitions
├── domain/       # Interfaces, entities, DTOs (Repository, Service, K8SAdapter)
├── rest/         # Echo handlers, routes, middleware
├── service/      # Business logic implementations
└── repository/   # MongoDB persistence (manager only)
```

### Data Structure Conventions
All API responses follow this pattern:
```go
type Response struct {
	Success   bool   `json:"success"`
	Message   string `json:"message"`
	Timestamp string `json:"timestamp"`
	// ... specific data fields
}
```

Key interfaces in `manager/domain/interface.go`:
- `Repository` - data persistence
- `Service` - business logic
- `K8SAdapter` - Kubernetes Pod queries via Informer
- `DecisionMakerAdapter` - sends intents to DM nodes
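
Roughly what those interfaces might look like; the method sets and stub types below are illustrative, not the actual `interface.go` contents:

```go
package domain

import "context"

// Stub entity types standing in for the real domain entities.
type (
	User             struct{}
	ScheduleStrategy struct{}
	ScheduleIntent   struct{}
	PodInfo          struct{}
	LabelSelector    struct{ Key, Value string }
)

// Repository persists users, roles, strategies, and intents.
type Repository interface {
	CreateUser(ctx context.Context, u *User) error
	ListScheduleStrategies(ctx context.Context) ([]ScheduleStrategy, error)
}

// Service implements the business logic on top of Repository.
type Service interface {
	ResolveIntents(ctx context.Context, s ScheduleStrategy) ([]ScheduleIntent, error)
}

// K8SAdapter answers Pod queries from an Informer-backed cache.
type K8SAdapter interface {
	QueryPods(ctx context.Context, selectors []LabelSelector) ([]PodInfo, error)
}

// DecisionMakerAdapter pushes resolved intents to Decision Maker nodes.
type DecisionMakerAdapter interface {
	SendIntents(ctx context.Context, node string, intents []ScheduleIntent) error
}
```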

## Development Commands

```bash
# Local infrastructure
make local-infra-up # Start MongoDB via docker-compose
make local-run-manager # Run Manager locally

# Testing
make test-all # Run all tests sequentially (required for integration tests)
go test -v ./manager/rest/... # Run specific package tests

# Mocks & Docs
make gen-mock              # Generate mocks via mockery (from domain interfaces)
make gen-manager-swagger   # Generate Swagger docs

# Kind cluster
make local-kind-setup      # Set up a local Kind cluster
make local-kind-teardown   # Tear down the Kind cluster
```

### Scheduling Strategy Resolution
The system uses a two-phase approach:
1. **Template Strategies**: Defined with label selectors and regex patterns
2. **Concrete Strategies**: Resolved to specific PIDs for kernel consumption

Example template-to-concrete transformation:
```
// Template (from config/API)
{
  "selectors": [{"key": "nf", "value": "upf"}],
  "command_regex": "nr-gnb|ping",
  "execution_time": 20000000
}

// Becomes multiple concrete strategies
[
  {"pid": 12345, "execution_time": 20000000},
  {"pid": 12346, "execution_time": 20000000}
]
```

## Testing Patterns

Integration tests use the **testcontainers** pattern (`pkg/container/`):
- `HandlerTestSuite` in `manager/rest/handler_test.go` spins up a real MongoDB container
- Use `app.TestRepoModule()` for container setup with Fx
- Mock the K8S/DM adapters with mockery-generated mocks from `domain/mock_domain.go`
- Each test cleans the DB via `util.MongoCleanup()` in `SetupTest()`

Example test structure:
```go
func (suite *HandlerTestSuite) TestSomething() {
	suite.MockK8SAdapter.EXPECT().QueryPods(...).Return(...)
	// Call handler, assert response
}
```

### Kubernetes Integration Patterns
- **Dual Mode Support**: Always handle both in-cluster (`--in-cluster=true`) and external kubeconfig modes
- **Graceful Degradation**: If K8s client fails, continue with empty pod labels rather than crashing
- **Caching Strategy**: 30-second TTL on pod label lookups to reduce API pressure
- **RBAC Requirements**: Needs `pods` and `namespaces` read access (see `k8s/deployment.yaml`)
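
A minimal sketch of that dual-mode client construction using the standard `client-go` helpers; the helper function itself is hypothetical, not the project's actual code:

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// newK8SClient builds a clientset either from the in-cluster service account
// (--in-cluster=true) or from an external kubeconfig (--kubeconfig=...).
func newK8SClient(inCluster bool, kubeconfig string) (*kubernetes.Clientset, error) {
	var (
		cfg *rest.Config
		err error
	)
	if inCluster {
		cfg, err = rest.InClusterConfig()
	} else {
		cfg, err = clientcmd.BuildConfigFromFlags("", kubeconfig)
	}
	if err != nil {
		// Callers degrade gracefully: log the error and continue with empty pod labels.
		return nil, err
	}
	return kubernetes.NewForConfig(cfg)
}
```
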
## Key Conventions

### Error Handling Approach
- Return structured JSON errors, never plain text
- Log errors but don't expose internal details in API responses
- Use `log.Printf()` for all logging (no structured logging framework)
### Dependency Injection
- Use **Uber Fx** for DI - see `manager/app/module.go` for module composition
- Service constructors take `Params struct` with `fx.In` tag
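
A minimal sketch of that constructor convention; the names are stand-ins, not the real wiring in `manager/app/module.go`:

```go
package service

import "go.uber.org/fx"

// Illustrative stand-ins for the domain interfaces.
type (
	Repository interface{}
	K8SAdapter interface{}
)

// Params is populated by Fx from the providers registered in the app module.
type Params struct {
	fx.In

	Repo Repository
	K8S  K8SAdapter
}

type Service struct {
	repo Repository
	k8s  K8SAdapter
}

// NewService is the kind of constructor registered via fx.Provide.
func NewService(p Params) *Service {
	return &Service{repo: p.Repo, k8s: p.K8S}
}
```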

## Key Development Workflows
### Configuration
- TOML configs in `config/` with `_config.go` parsers using Viper
- Sensitive values use `SecretValue` type (masked in logs)
- Test config: `manager_config.test.toml`
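
A hedged sketch of a `*_config.go`-style loader; the field names and the `SecretValue` implementation are assumptions:

```go
package config

import (
	"fmt"

	"github.com/spf13/viper"
)

// SecretValue masks itself when printed or logged.
type SecretValue string

func (s SecretValue) String() string { return "****" }

type ManagerConfig struct {
	ListenAddr string      `mapstructure:"listen_addr"`
	MongoURI   SecretValue `mapstructure:"mongo_uri"`
}

// LoadManagerConfig reads a TOML file such as config/manager_config.test.toml.
func LoadManagerConfig(path string) (*ManagerConfig, error) {
	v := viper.New()
	v.SetConfigFile(path)
	if err := v.ReadInConfig(); err != nil {
		return nil, fmt.Errorf("read config: %w", err)
	}
	var cfg ManagerConfig
	if err := v.Unmarshal(&cfg); err != nil {
		return nil, fmt.Errorf("unmarshal config: %w", err)
	}
	return &cfg, nil
}
```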

### Running & Testing
```bash
# Local development with external K8s
make run
# or with specific kubeconfig
go run main.go --kubeconfig=/path/to/config

# Testing strategy APIs
make test-strategies

# Container deployment
make docker-build && make k8s-deploy
```

### REST API
- **Echo** framework with custom handler wrapper: `h.echoHandler(h.MethodName)`
- Auth middleware: `h.GetAuthMiddleware(domain.PermissionKey)`
- Routes in `rest/routes.go`, all versioned under `/api/v1`
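
The wrapper adapts plain `net/http`-style handler methods to Echo; a plausible sketch (an assumption, not the actual implementation):

```go
package rest

import (
	"net/http"

	"github.com/labstack/echo/v4"
)

type Handler struct {
	// services, adapters, ...
}

// echoHandler wraps a net/http handler method into an echo.HandlerFunc, which
// is why handlers are written against (http.ResponseWriter, *http.Request).
func (h *Handler) echoHandler(fn http.HandlerFunc) echo.HandlerFunc {
	return func(c echo.Context) error {
		fn(c.Response().Writer, c.Request())
		return nil
	}
}
```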

### Error Handling
- Domain errors in `manager/domain/errors.go` and `manager/errs/errors.go`
- Use `pkg/errors` for wrapping

### Database
- MongoDB v2 driver (`go.mongodb.org/mongo-driver/v2`)
- Migrations in `manager/migration/` (JSON format, run via golang-migrate)
- Collections: `users`, `roles`, `permissions`, `schedule_strategies`, `schedule_intents`
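
A hedged sketch of a repository method against the v2 driver; the collection wiring and document shape are assumptions, not the project's actual repository code:

```go
package repository

import (
	"context"

	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
)

type UserRepository struct {
	users *mongo.Collection // e.g. db.Collection("users")
}

// FindByName looks up one document in the users collection by name.
func (r *UserRepository) FindByName(ctx context.Context, name string) (bson.M, error) {
	var doc bson.M
	if err := r.users.FindOne(ctx, bson.M{"name": name}).Decode(&doc); err != nil {
		return nil, err // mongo.ErrNoDocuments when no user matches
	}
	return doc, nil
}
```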

### Adding New Scheduling Logic
1. Extend `SchedulingStrategy` struct in `main.go`
2. Update `findPIDsByStrategy()` function for new matching logic
3. Modify template-to-concrete resolution in `GetSchedulingStrategiesHandler`
4. Update default config in `config.go`

### Process-to-Pod Mapping Logic
The system parses `/proc/<pid>/cgroup` looking for kubepods patterns:
- Format: `/kubepods/burstable/pod<uid>/<container-id>`
- Extracts pod UID, then queries K8s API for labels
- Critical for linking kernel processes to pod scheduling policies
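
A minimal sketch of that mapping, assuming a hypothetical `PodUIDForPID` helper; the real logic lives in the server's process-discovery code:

```go
package procscan

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// podUIDPattern captures the pod UID segment of a kubepods cgroup path,
// e.g. "/kubepods/burstable/pod<uid>/<container-id>".
var podUIDPattern = regexp.MustCompile(`kubepods.*pod([0-9a-fA-F_-]{36})`)

// PodUIDForPID reads /proc/<pid>/cgroup and returns the owning pod's UID,
// converting underscores to dashes so it matches the UID from the K8s API.
func PodUIDForPID(procRoot string, pid int) (string, error) {
	data, err := os.ReadFile(fmt.Sprintf("%s/%d/cgroup", procRoot, pid))
	if err != nil {
		return "", err
	}
	for _, line := range strings.Split(string(data), "\n") {
		if m := podUIDPattern.FindStringSubmatch(line); m != nil {
			return strings.ReplaceAll(m[1], "_", "-"), nil
		}
	}
	return "", fmt.Errorf("pid %d is not managed by kubepods", pid)
}
```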

## Integration Points

### External Dependencies
- **Gorilla Mux**: HTTP routing (`github.com/gorilla/mux`)
- **Kubernetes Client**: Official Go client for pod metadata
- **Linux /proc filesystem**: Direct parsing for process discovery

### API Contract with eBPF Clients
- BSS metrics use specific field names (e.g., `nr_queued`, `usersched_last_run_at`)
- Scheduling strategies return nanosecond `execution_time` values
- All timestamps in RFC3339 format
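
Illustrative Go shapes for that contract; any field beyond the names called out above is an assumption:

```go
package rest

// BSSMetrics mirrors the BSS field names sent by the eBPF client.
type BSSMetrics struct {
	NrQueued           uint64 `json:"nr_queued"`
	UserschedLastRunAt uint64 `json:"usersched_last_run_at"`
	// ... other BSS fields
}

// ConcreteStrategy is one PID-level scheduling decision returned to the kernel.
type ConcreteStrategy struct {
	PID           int    `json:"pid"`
	ExecutionTime uint64 `json:"execution_time"` // nanoseconds, not milliseconds
}

// StrategiesResponse is a sketch of the envelope; the "strategies" field name is assumed.
type StrategiesResponse struct {
	Success    bool               `json:"success"`
	Timestamp  string             `json:"timestamp"` // RFC3339
	Strategies []ConcreteStrategy `json:"strategies"`
}
```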

### Deployment Considerations
- Requires privileged access to `/proc` filesystem
- Needs K8s RBAC permissions for pod/namespace reads
- Typically deployed in `kube-system` namespace
- Health checks on `/health` endpoint

## Common Gotchas
- Pod UIDs from cgroup paths need underscore-to-dash conversion
- Strategy resolution happens on every GET request (no caching)
- Kubernetes client initialization is lazy and retried
- Configuration file is optional; defaults are comprehensive
- All scheduling times are in nanoseconds, not milliseconds

## Important Entities

- `ScheduleStrategy` - Pod label selectors + scheduling params (priority, execution time)
- `ScheduleIntent` - Concrete intent for a specific Pod, distributed to DM nodes
- `User`, `Role`, `Permission` - RBAC model with JWT (RSA) authentication
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -26,7 +26,7 @@ jobs:
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.24'
          go-version: '1.24.5'

      - name: Download dependencies
        run: go mod download
1 change: 1 addition & 0 deletions decisionmaker/rest/handler.go
@@ -162,6 +162,7 @@ func (h *Handler) SetupRoutes(engine *echo.Echo) error {
	apiV1 := api.Group("/v1")
	// auth routes
	apiV1.POST("/intents", h.echoHandler(h.HandleIntents), echo.WrapMiddleware(authMiddleware))
	apiV1.DELETE("/intents", h.echoHandler(h.DeleteIntent), echo.WrapMiddleware(authMiddleware))
	apiV1.GET("/scheduling/strategies", h.echoHandler(h.ListIntents), echo.WrapMiddleware(authMiddleware))
	apiV1.POST("/metrics", h.echoHandler(h.UpdateMetrics), echo.WrapMiddleware(authMiddleware))
	// token routes
44 changes: 44 additions & 0 deletions decisionmaker/rest/intent_handler.go
@@ -111,3 +111,47 @@ func convertMapToLabelSelectors(selectorMap []domain.LabelSelector) []LabelSelec
	}
	return labelSelectors
}

type DeleteIntentRequest struct {
	PodID string `json:"podId,omitempty"` // If provided, deletes all intents for this pod
	PID   *int   `json:"pid,omitempty"`   // If provided with PodID, deletes specific intent
	All   bool   `json:"all,omitempty"`   // If true, deletes all intents
}

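// DeleteIntent removes scheduling intents on this node: depending on the request
// body it deletes every intent, every intent for one pod, or a single pod/PID entry.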
func (h *Handler) DeleteIntent(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context()
	var req DeleteIntentRequest
	err := h.JSONBind(r, &req)
	if err != nil {
		h.ErrorResponse(ctx, w, http.StatusBadRequest, "Invalid request payload", err)
		return
	}

	if req.All {
		err = h.Service.DeleteAllIntents(ctx)
		if err != nil {
			h.ErrorResponse(ctx, w, http.StatusInternalServerError, "Failed to delete all intents", err)
			return
		}
		h.JSONResponse(ctx, w, http.StatusOK, NewSuccessResponse[EmptyResponse](nil))
		return
	}

	if req.PodID == "" {
		h.ErrorResponse(ctx, w, http.StatusBadRequest, "PodID is required when 'all' is false", nil)
		return
	}

	if req.PID != nil {
		err = h.Service.DeleteIntentByPID(ctx, req.PodID, *req.PID)
	} else {
		err = h.Service.DeleteIntentByPodID(ctx, req.PodID)
	}

	if err != nil {
		h.ErrorResponse(ctx, w, http.StatusInternalServerError, "Failed to delete intent", err)
		return
	}

	h.JSONResponse(ctx, w, http.StatusOK, NewSuccessResponse[EmptyResponse](nil))
}
40 changes: 40 additions & 0 deletions decisionmaker/service/service.go
@@ -244,3 +244,43 @@ func (svc *Service) getProcessInfo(rootDir string, pid int) (domain.PodProcess,
func (svc *Service) UpdateMetrics(ctx context.Context, newMetricSet *domain.MetricSet) {
	svc.metricCollector.UpdateMetrics(newMetricSet)
}

// DeleteIntentByPodID deletes all scheduling intents for a specific pod ID
func (svc *Service) DeleteIntentByPodID(ctx context.Context, podID string) error {
	keysToDelete := []string{}
	svc.schedulingIntentsMap.Range(func(key string, value []*domain.SchedulingIntents) bool {
		if strings.HasPrefix(key, podID+"-") {
			keysToDelete = append(keysToDelete, key)
		}
		return true
	})
	for _, key := range keysToDelete {
		svc.schedulingIntentsMap.Delete(key)
	}
	logger.Logger(ctx).Info().Msgf("Deleted %d scheduling intents for pod ID: %s", len(keysToDelete), podID)
	return nil
}

// DeleteIntentByPID deletes a specific scheduling intent by pod ID and PID
func (svc *Service) DeleteIntentByPID(ctx context.Context, podID string, pid int) error {
	key := fmt.Sprintf("%s-%d", podID, pid)
	svc.schedulingIntentsMap.Delete(key)
	logger.Logger(ctx).Info().Msgf("Deleted scheduling intent for key: %s", key)
	return nil
}

// DeleteAllIntents clears all scheduling intents
func (svc *Service) DeleteAllIntents(ctx context.Context) error {
	keysToDelete := []string{}
	svc.schedulingIntentsMap.Range(func(key string, value []*domain.SchedulingIntents) bool {
		keysToDelete = append(keysToDelete, key)
		return true
	})

	for _, key := range keysToDelete {
		svc.schedulingIntentsMap.Delete(key)
	}

	logger.Logger(ctx).Info().Msgf("Deleted all %d scheduling intents", len(keysToDelete))
	return nil
}