Skip to content

[Azure-cosmos-benchmark]Refactor azure-cosmos-benchmark: move all configs from CLI to workload config JSON#48250

Merged
xinlian12 merged 6 commits intoAzure:mainfrom
xinlian12:reduceCLIArgs
Mar 5, 2026
Merged

[Azure-cosmos-benchmark]Refactor azure-cosmos-benchmark: move all configs from CLI to workload config JSON#48250
xinlian12 merged 6 commits intoAzure:mainfrom
xinlian12:reduceCLIArgs

Conversation

@xinlian12
Copy link
Member

@xinlian12 xinlian12 commented Mar 4, 2026

Summary

Refactors the azure-cosmos-benchmark configuration system to eliminate CLI parameter sprawl, centralize all workload configuration in a single JSON file (-workloadConfig), and unify all benchmark types under a single BenchmarkOrchestrator.

Before: ~60 CLI parameters in Configuration.java, 5 separate code paths in Main.java.
After: 6 CLI params remain, single orchestrator entry point for all benchmark types.


Architecture: Before vs After

Before

Main.main()
  ├─ if (isSync)         → syncBenchmark()       [own MetricRegistry, reporter, result uploader]
  ├─ if (CtlWorkload)    → asyncCtlWorkload()    [own MetricRegistry, reporter]
  ├─ if (LinkedInCtl)    → linkedInCtlWorkload()  [own MetricRegistry, reporter]
  ├─ if (encryption)     → asyncEncryptionBenchmark() [own MetricRegistry, reporter]
  └─ else                → BenchmarkOrchestrator  [shared MetricRegistry, reporter]

After

Main.main()
  └─ BenchmarkOrchestrator.run(benchConfig)
       ├─ Sets up shared infrastructure (MetricRegistry, reporters, system properties)
       ├─ Factory dispatches based on operationType + flags (isSync, isEncryptionEnabled):
       │   ├─ Sync:       SyncReadBenchmark / SyncWriteBenchmark
       │   ├─ CTL:        AsyncCtlWorkload
       │   ├─ LinkedIn:   LICtlWorkload
       │   ├─ Encryption: AsyncEncryptionRead/Write/Query benchmarks
       │   └─ Async:      AsyncRead/Write/Query/Mixed/ReadMany benchmarks
       └─ Lifecycle loop: create → run → shutdown → settle × N cycles

All benchmark types now implement a common Benchmark interface (run() + shutdown()) and accept an injected MetricRegistry from the orchestrator.


Workload Config JSON Structure

{
  "tenantDefaults": {
    "connectionMode": "DIRECT",
    "consistencyLevel": "SESSION",
    "concurrency": "50",
    "numberOfOperations": "100000",
    "operation": "ReadThroughput",
    "numberOfPreCreatedDocuments": "1000",
    "isPartitionLevelCircuitBreakerEnabled": "true",
    "isPerPartitionAutomaticFailoverRequired": "true",
    "minConnectionPoolSizePerEndpoint": "0"
  },
  "metrics": {
    "enableJvmStats": true,
    "enableNettyHttpMetrics": false,
    "printingInterval": 10,
    "reportingDirectory": "/tmp/benchmark-output",
    "resultUpload": {
      "serviceEndpoint": "https://results-account.documents.azure.com:443/",
      "masterKey": "",
      "database": "benchresults",
      "container": "runs"
    },
    "runMetadata": {
      "testVariationName": "baseline-direct-mode",
      "branchName": "main",
      "commitId": "abc123def"
    }
  },
  "tenants": [
    {
      "id": "tenant-0",
      "serviceEndpoint": "https://cosmos-account-0.documents.azure.com:443/",
      "masterKey": "",
      "databaseId": "benchdb",
      "containerId": "benchcol",
      "operation": "ReadThroughput",
      "concurrency": 100
    }
  ]
}

How defaults work:

  • Each tenant inherits from tenantDefaults (e.g., both tenants get connectionMode: DIRECT)
  • Per-tenant values override defaults (e.g., tenant-0 gets concurrency: 100)
  • metrics section is global (not per-tenant)
  • JVM system properties (isPartitionLevelCircuitBreakerEnabled, etc.) live in tenantDefaults but apply globally

All Configurations — Before vs After

Account & Connection

Config Name Before After
serviceEndpoint CLI -serviceEndpoint tenants[] / tenantDefaults
masterKey CLI -masterKey tenants[] / tenantDefaults
databaseId CLI -databaseId tenants[] / tenantDefaults
containerId CLI -collectionId tenants[] / tenantDefaults
connectionMode CLI -connectionMode tenants[] / tenantDefaults
consistencyLevel CLI -consistencyLevel tenants[] / tenantDefaults
maxConnectionPoolSize CLI -maxConnectionPoolSize tenants[] / tenantDefaults
preferredRegionsList CLI -preferredRegionsList tenants[] / tenantDefaults
applicationName CLI -applicationName tenants[] / tenantDefaults
manageDatabase CLI -manageDatabase tenants[] / tenantDefaults

Workload Parameters

Config Name Before After
operation CLI -operation tenants[] / tenantDefaults
concurrency CLI -concurrency tenants[] / tenantDefaults
numberOfOperations CLI -numberOfOperations tenants[] / tenantDefaults
useSync CLI -useSync tenants[] / tenantDefaults
encryptionEnabled CLI -encryptionEnabled tenants[] / tenantDefaults

JVM System Properties

Config Name Before After
isPartitionLevelCircuitBreakerEnabled CLI tenantDefaults only
isPerPartitionAutomaticFailoverRequired CLI tenantDefaults only
minConnectionPoolSizePerEndpoint CLI tenantDefaults only

Metrics & Reporting

Config Name Before After
enableJvmStats CLI -enableJvmStats metrics
printingInterval CLI -printingInterval metrics
reportingDirectory CLI -reportingDirectory metrics
Result upload fields CLI flags metrics.resultUpload
Run metadata fields CLI flags metrics.runMetadata

Remaining CLI Parameters (unchanged)

Config Name Description
-workloadConfig Path to workload config JSON (required)
-cycles Number of create/destroy cycles
-settleTimeMs Wait between cycles
-gcBetweenCycles Force GC during settle
-suppressCleanup Skip DB/container cleanup
-help Show help

Key Changes

  1. Main.java: 190 → 62 lines. Single code path through BenchmarkOrchestrator
  2. Benchmark.java (new): Common interface (run() + shutdown()) implemented by all benchmark types
  3. BenchmarkOrchestrator.java: Expanded factory dispatches sync, CTL, encryption, LinkedIn benchmarks based on operationType + flags
  4. SyncBenchmark, AsyncCtlWorkload, AsyncEncryptionBenchmark, LICtlWorkload: Accept injected MetricRegistry, removed self-managed reporters/JVM stats/result uploaders
  5. BenchmarkConfig.java: Separated config parsing into focused methods (loadJvmSystemProperties, loadMetricsConfig, loadResultUploadConfig, loadRunMetadata)
  6. Configuration.java: 1057 → 131 lines. Only CLI lifecycle params remain
  7. TenantWorkloadConfig.java: All workload params with tenantDefaults merge
  8. Renamed: globalDefaultstenantDefaults, -tenantsFile-workloadConfig
  9. Removed: Graphite reporting, redundant TenantWorkloadConfigFromConfigurationTest.java

Verification

mvn compile -pl sdk/cosmos/azure-cosmos-benchmark -DskipTests passes cleanly

Annie Liang and others added 2 commits March 4, 2026 12:51
…d config JSON

- Remove ~50 CLI parameters from Configuration.java; all workload/connection
  params now come from the workload config JSON file (TenantWorkloadConfig)
- Configuration.java reduced from 1057 to 131 lines with only 6 CLI params:
  workloadConfig, cycles, settleTimeMs, gcBetweenCycles, suppressCleanup, help
- Move metrics/reporting/result-upload configs into a nested 'metrics' section
  in the workload config JSON with 'resultUpload' and 'runMetadata' sub-objects
- Remove all Graphite reporting support (use Application Insights instead)
- Remove TenantWorkloadConfig.fromConfiguration() factory method
- Make workload config file mandatory (-workloadConfig flag)
- Rename -tenantsFile CLI flag to -workloadConfig
- Update all consumer classes (29 files) to use TenantWorkloadConfig for
  workload params and BenchmarkConfig for global/reporting params

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added the Cosmos label Mar 4, 2026
Annie Liang and others added 2 commits March 4, 2026 14:50
Route sync, CTL, encryption, and LinkedIn benchmarks through the
orchestrator instead of having separate code paths in Main.java.

- Add Benchmark interface (run + shutdown) implemented by all benchmark types
- Refactor SyncBenchmark, AsyncCtlWorkload, AsyncEncryptionBenchmark, and
  LICtlWorkload to accept injected MetricRegistry (like AsyncBenchmark)
- Remove self-managed reporter, result uploader, and JVM stats from each
  benchmark — orchestrator handles all infrastructure concerns
- Expand orchestrator factory to dispatch based on operationType + flags
  (isSync, isEncryptionEnabled)
- Simplify Main.java from 5 code paths to a single orchestrator call

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The section provides default values for tenant-level configuration,
so tenantDefaults better reflects its purpose.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Split loadGlobalSystemPropertiesFromWorkloadConfig into 4 focused methods:
  loadJvmSystemProperties, loadMetricsConfig, loadResultUploadConfig, loadRunMetadata
- Add clarifying comment in Main.java explaining Configuration -> BenchmarkConfig
  -> BenchmarkOrchestrator flow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 marked this pull request as ready for review March 4, 2026 23:22
@xinlian12 xinlian12 requested review from a team and kirankumarkolli as code owners March 4, 2026 23:22
Copilot AI review requested due to automatic review settings March 4, 2026 23:22
@xinlian12 xinlian12 requested a review from a team as a code owner March 4, 2026 23:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the azure-cosmos-benchmark tool to eliminate ~60 CLI parameters and centralize all workload configuration in a single JSON file (-workloadConfig). The result is a unified BenchmarkOrchestrator entry point for all benchmark types (sync, async, CTL, encryption, LinkedIn), with a new Benchmark interface and per-benchmark MetricRegistry injection from the orchestrator.

Changes:

  • Config system: Configuration.java reduced from ~1057 to 131 lines; all workload params moved to TenantWorkloadConfig and loaded from JSON; BenchmarkConfig now only loads from JSON (no more CLI single-tenant path).
  • Benchmark unification: New Benchmark interface (run() + shutdown()) implemented by all benchmark types; BenchmarkOrchestrator now dispatches sync, CTL, encryption, and LinkedIn benchmarks in addition to async ones.
  • Test updates: Tests migrated from building CLI strings + JCommander parsing to direct TenantWorkloadConfig construction; TenantWorkloadConfigFromConfigurationTest deleted.

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
Configuration.java Reduced to 6 CLI lifecycle params only
TenantWorkloadConfig.java All workload params added; fromConfiguration() factory removed; parseWorkloadConfig renamed from parseTenantsFile
BenchmarkConfig.java Now always requires a JSON config file; split loading into focused methods
BenchmarkOrchestrator.java Factory now dispatches all benchmark types via Benchmark interface; buildCosmosMicrometerRegistry inlined
Benchmark.java New interface with run() and shutdown()
Main.java Simplified to single code path through BenchmarkOrchestrator
SyncBenchmark.java, SyncReadBenchmark.java, SyncWriteBenchmark.java Accept injected MetricRegistry; implement Benchmark
AsyncBenchmark.java Implements Benchmark; run()/shutdown() made public
AsyncEncryptionBenchmark.java Accept injected MetricRegistry; implements Benchmark
AsyncCtlWorkload.java Accept injected MetricRegistry; implements Benchmark
LICtlWorkload.java Accept injected MetricRegistry; implements Benchmark; setup() merged into run()
ScheduledReporterFactory.java Accepts BenchmarkConfig instead of Configuration
WorkflowTest.java Tests migrated to direct TenantWorkloadConfig construction
ReadMyWritesConsistencyTest.java Test migrated to direct TenantWorkloadConfig construction
TenantWorkloadConfigFromConfigurationTest.java Deleted (no longer needed)
LinkedIn classes Configuration replaced with TenantWorkloadConfig
Comments suppressed due to low confidence (3)

sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionBenchmark.java:219

  • In AsyncEncryptionBenchmark.shutdown(), the suppressCleanup flag is not respected. Unlike AsyncBenchmark, SyncBenchmark, and AsyncCtlWorkload which all check workloadConfig.isSuppressCleanup() before deleting database/container, AsyncEncryptionBenchmark.shutdown() unconditionally deletes them when databaseCreated or collectionCreated is true. This means the encryption benchmark will always clean up even when cycles > 1 and suppressCleanup=true is set.
    public void shutdown() {
        if (this.databaseCreated) {
            cosmosAsyncDatabase.delete().block();
            logger.info("Deleted temporary database {} created for this test", this.workloadConfig.getDatabaseId());
        } else if (this.collectionCreated) {
            cosmosAsyncContainer.delete().block();
            logger.info("Deleted temporary collection {} created for this test", this.workloadConfig.getContainerId());
        }

        cosmosClient.close();
    }

sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ScheduledReporterFactory.java:36

  • The ScheduledReporterFactory.create() method signature changed from accepting Configuration to accepting BenchmarkConfig, but the method is no longer called anywhere (reporter creation is now inlined in BenchmarkOrchestrator.run()). The ScheduledReporterFactory class appears to be dead code after this refactor. Verify whether it is still used, and if not, consider removing it to avoid confusion.
    public static ScheduledReporter create(final BenchmarkConfig benchConfig,
        final MetricRegistry metricsRegistry) {
        if (benchConfig.getReportingDirectory() != null) {
            return CsvReporter.forRegistry(metricsRegistry)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .convertRatesTo(TimeUnit.SECONDS)
                .build(new File(benchConfig.getReportingDirectory()));
        } else {
            return ConsoleReporter.forRegistry(metricsRegistry)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .convertRatesTo(TimeUnit.SECONDS)
                .build();
        }

sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/WorkflowTest.java:99

  • The readMyWritesCLI and writeLatencyCLI tests (lines 33–46 and 86–99) call Main.main(StringUtils.split(cmd)) with the old CLI parameters (-serviceEndpoint, -masterKey, -databaseId, -collectionId, -operation, etc.). After this refactor, Main.main() no longer accepts those parameters — it requires -workloadConfig pointing to a JSON file. These tests will fail at runtime with a ParameterException from JCommander (unknown parameters) followed by a NullPointerException in BenchmarkConfig.fromConfiguration() because workloadConfig will be null. These tests need to be either updated to write a temporary JSON config file and pass -workloadConfig to them, or removed if they are superseded by the readMyWrites/writeLatency test variants below them.

You can also share your feedback on Copilot code review. Take the survey.

- Validate all tenants, not just the first one
- Guard against empty tenants list with clear error message
- Fix copy-paste bug: roleInstance null check used roleName instead
- Move Environment enum from Configuration to TenantWorkloadConfig
  to remove coupling to CLI-only class

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Thanks. Only NIT: can you add some sampel worklaod config in the project - just to make it easier when we create our own?

@xinlian12 xinlian12 merged commit f196372 into Azure:main Mar 5, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants