[Azure-cosmos-benchmark]Refactor azure-cosmos-benchmark: move all configs from CLI to workload config JSON#48250
Conversation
…d config JSON - Remove ~50 CLI parameters from Configuration.java; all workload/connection params now come from the workload config JSON file (TenantWorkloadConfig) - Configuration.java reduced from 1057 to 131 lines with only 6 CLI params: workloadConfig, cycles, settleTimeMs, gcBetweenCycles, suppressCleanup, help - Move metrics/reporting/result-upload configs into a nested 'metrics' section in the workload config JSON with 'resultUpload' and 'runMetadata' sub-objects - Remove all Graphite reporting support (use Application Insights instead) - Remove TenantWorkloadConfig.fromConfiguration() factory method - Make workload config file mandatory (-workloadConfig flag) - Rename -tenantsFile CLI flag to -workloadConfig - Update all consumer classes (29 files) to use TenantWorkloadConfig for workload params and BenchmarkConfig for global/reporting params Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Route sync, CTL, encryption, and LinkedIn benchmarks through the orchestrator instead of having separate code paths in Main.java. - Add Benchmark interface (run + shutdown) implemented by all benchmark types - Refactor SyncBenchmark, AsyncCtlWorkload, AsyncEncryptionBenchmark, and LICtlWorkload to accept injected MetricRegistry (like AsyncBenchmark) - Remove self-managed reporter, result uploader, and JVM stats from each benchmark — orchestrator handles all infrastructure concerns - Expand orchestrator factory to dispatch based on operationType + flags (isSync, isEncryptionEnabled) - Simplify Main.java from 5 code paths to a single orchestrator call Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The section provides default values for tenant-level configuration, so tenantDefaults better reflects its purpose. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkConfig.java
Outdated
Show resolved
Hide resolved
- Split loadGlobalSystemPropertiesFromWorkloadConfig into 4 focused methods: loadJvmSystemProperties, loadMetricsConfig, loadResultUploadConfig, loadRunMetadata - Add clarifying comment in Main.java explaining Configuration -> BenchmarkConfig -> BenchmarkOrchestrator flow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors the azure-cosmos-benchmark tool to eliminate ~60 CLI parameters and centralize all workload configuration in a single JSON file (-workloadConfig). The result is a unified BenchmarkOrchestrator entry point for all benchmark types (sync, async, CTL, encryption, LinkedIn), with a new Benchmark interface and per-benchmark MetricRegistry injection from the orchestrator.
Changes:
- Config system:
Configuration.javareduced from ~1057 to 131 lines; all workload params moved toTenantWorkloadConfigand loaded from JSON;BenchmarkConfignow only loads from JSON (no more CLI single-tenant path). - Benchmark unification: New
Benchmarkinterface (run()+shutdown()) implemented by all benchmark types;BenchmarkOrchestratornow dispatches sync, CTL, encryption, and LinkedIn benchmarks in addition to async ones. - Test updates: Tests migrated from building CLI strings +
JCommanderparsing to directTenantWorkloadConfigconstruction;TenantWorkloadConfigFromConfigurationTestdeleted.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
Configuration.java |
Reduced to 6 CLI lifecycle params only |
TenantWorkloadConfig.java |
All workload params added; fromConfiguration() factory removed; parseWorkloadConfig renamed from parseTenantsFile |
BenchmarkConfig.java |
Now always requires a JSON config file; split loading into focused methods |
BenchmarkOrchestrator.java |
Factory now dispatches all benchmark types via Benchmark interface; buildCosmosMicrometerRegistry inlined |
Benchmark.java |
New interface with run() and shutdown() |
Main.java |
Simplified to single code path through BenchmarkOrchestrator |
SyncBenchmark.java, SyncReadBenchmark.java, SyncWriteBenchmark.java |
Accept injected MetricRegistry; implement Benchmark |
AsyncBenchmark.java |
Implements Benchmark; run()/shutdown() made public |
AsyncEncryptionBenchmark.java |
Accept injected MetricRegistry; implements Benchmark |
AsyncCtlWorkload.java |
Accept injected MetricRegistry; implements Benchmark |
LICtlWorkload.java |
Accept injected MetricRegistry; implements Benchmark; setup() merged into run() |
ScheduledReporterFactory.java |
Accepts BenchmarkConfig instead of Configuration |
WorkflowTest.java |
Tests migrated to direct TenantWorkloadConfig construction |
ReadMyWritesConsistencyTest.java |
Test migrated to direct TenantWorkloadConfig construction |
TenantWorkloadConfigFromConfigurationTest.java |
Deleted (no longer needed) |
| LinkedIn classes | Configuration replaced with TenantWorkloadConfig |
Comments suppressed due to low confidence (3)
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/encryption/AsyncEncryptionBenchmark.java:219
- In
AsyncEncryptionBenchmark.shutdown(), thesuppressCleanupflag is not respected. UnlikeAsyncBenchmark,SyncBenchmark, andAsyncCtlWorkloadwhich all checkworkloadConfig.isSuppressCleanup()before deleting database/container,AsyncEncryptionBenchmark.shutdown()unconditionally deletes them whendatabaseCreatedorcollectionCreatedis true. This means the encryption benchmark will always clean up even when cycles > 1 andsuppressCleanup=trueis set.
public void shutdown() {
if (this.databaseCreated) {
cosmosAsyncDatabase.delete().block();
logger.info("Deleted temporary database {} created for this test", this.workloadConfig.getDatabaseId());
} else if (this.collectionCreated) {
cosmosAsyncContainer.delete().block();
logger.info("Deleted temporary collection {} created for this test", this.workloadConfig.getContainerId());
}
cosmosClient.close();
}
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/ScheduledReporterFactory.java:36
- The
ScheduledReporterFactory.create()method signature changed from acceptingConfigurationto acceptingBenchmarkConfig, but the method is no longer called anywhere (reporter creation is now inlined inBenchmarkOrchestrator.run()). TheScheduledReporterFactoryclass appears to be dead code after this refactor. Verify whether it is still used, and if not, consider removing it to avoid confusion.
public static ScheduledReporter create(final BenchmarkConfig benchConfig,
final MetricRegistry metricsRegistry) {
if (benchConfig.getReportingDirectory() != null) {
return CsvReporter.forRegistry(metricsRegistry)
.convertDurationsTo(TimeUnit.MILLISECONDS)
.convertRatesTo(TimeUnit.SECONDS)
.build(new File(benchConfig.getReportingDirectory()));
} else {
return ConsoleReporter.forRegistry(metricsRegistry)
.convertDurationsTo(TimeUnit.MILLISECONDS)
.convertRatesTo(TimeUnit.SECONDS)
.build();
}
sdk/cosmos/azure-cosmos-benchmark/src/test/java/com/azure/cosmos/benchmark/WorkflowTest.java:99
- The
readMyWritesCLIandwriteLatencyCLItests (lines 33–46 and 86–99) callMain.main(StringUtils.split(cmd))with the old CLI parameters (-serviceEndpoint,-masterKey,-databaseId,-collectionId,-operation, etc.). After this refactor,Main.main()no longer accepts those parameters — it requires-workloadConfigpointing to a JSON file. These tests will fail at runtime with aParameterExceptionfrom JCommander (unknown parameters) followed by aNullPointerExceptioninBenchmarkConfig.fromConfiguration()becauseworkloadConfigwill be null. These tests need to be either updated to write a temporary JSON config file and pass-workloadConfigto them, or removed if they are superseded by thereadMyWrites/writeLatencytest variants below them.
You can also share your feedback on Copilot code review. Take the survey.
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/Main.java
Outdated
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/Main.java
Outdated
Show resolved
Hide resolved
...s/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/BenchmarkOrchestrator.java
Outdated
Show resolved
Hide resolved
...os/azure-cosmos-benchmark/src/main/java/com/azure/cosmos/benchmark/TenantWorkloadConfig.java
Outdated
Show resolved
Hide resolved
- Validate all tenants, not just the first one - Guard against empty tenants list with clear error message - Fix copy-paste bug: roleInstance null check used roleName instead - Move Environment enum from Configuration to TenantWorkloadConfig to remove coupling to CLI-only class Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
LGTM - Thanks. Only NIT: can you add some sampel worklaod config in the project - just to make it easier when we create our own?
Summary
Refactors the azure-cosmos-benchmark configuration system to eliminate CLI parameter sprawl, centralize all workload configuration in a single JSON file (
-workloadConfig), and unify all benchmark types under a singleBenchmarkOrchestrator.Before: ~60 CLI parameters in
Configuration.java, 5 separate code paths inMain.java.After: 6 CLI params remain, single orchestrator entry point for all benchmark types.
Architecture: Before vs After
Before
After
All benchmark types now implement a common
Benchmarkinterface (run()+shutdown()) and accept an injectedMetricRegistryfrom the orchestrator.Workload Config JSON Structure
{ "tenantDefaults": { "connectionMode": "DIRECT", "consistencyLevel": "SESSION", "concurrency": "50", "numberOfOperations": "100000", "operation": "ReadThroughput", "numberOfPreCreatedDocuments": "1000", "isPartitionLevelCircuitBreakerEnabled": "true", "isPerPartitionAutomaticFailoverRequired": "true", "minConnectionPoolSizePerEndpoint": "0" }, "metrics": { "enableJvmStats": true, "enableNettyHttpMetrics": false, "printingInterval": 10, "reportingDirectory": "/tmp/benchmark-output", "resultUpload": { "serviceEndpoint": "https://results-account.documents.azure.com:443/", "masterKey": "", "database": "benchresults", "container": "runs" }, "runMetadata": { "testVariationName": "baseline-direct-mode", "branchName": "main", "commitId": "abc123def" } }, "tenants": [ { "id": "tenant-0", "serviceEndpoint": "https://cosmos-account-0.documents.azure.com:443/", "masterKey": "", "databaseId": "benchdb", "containerId": "benchcol", "operation": "ReadThroughput", "concurrency": 100 } ] }How defaults work:
tenantDefaults(e.g., both tenants getconnectionMode: DIRECT)concurrency: 100)metricssection is global (not per-tenant)isPartitionLevelCircuitBreakerEnabled, etc.) live intenantDefaultsbut apply globallyAll Configurations — Before vs After
Account & Connection
serviceEndpoint-serviceEndpointtenants[]/tenantDefaultsmasterKey-masterKeytenants[]/tenantDefaultsdatabaseId-databaseIdtenants[]/tenantDefaultscontainerId-collectionIdtenants[]/tenantDefaultsconnectionMode-connectionModetenants[]/tenantDefaultsconsistencyLevel-consistencyLeveltenants[]/tenantDefaultsmaxConnectionPoolSize-maxConnectionPoolSizetenants[]/tenantDefaultspreferredRegionsList-preferredRegionsListtenants[]/tenantDefaultsapplicationName-applicationNametenants[]/tenantDefaultsmanageDatabase-manageDatabasetenants[]/tenantDefaultsWorkload Parameters
operation-operationtenants[]/tenantDefaultsconcurrency-concurrencytenants[]/tenantDefaultsnumberOfOperations-numberOfOperationstenants[]/tenantDefaultsuseSync-useSynctenants[]/tenantDefaultsencryptionEnabled-encryptionEnabledtenants[]/tenantDefaultsJVM System Properties
isPartitionLevelCircuitBreakerEnabledtenantDefaultsonlyisPerPartitionAutomaticFailoverRequiredtenantDefaultsonlyminConnectionPoolSizePerEndpointtenantDefaultsonlyMetrics & Reporting
enableJvmStats-enableJvmStatsmetricsprintingInterval-printingIntervalmetricsreportingDirectory-reportingDirectorymetricsmetrics.resultUploadmetrics.runMetadataRemaining CLI Parameters (unchanged)
-workloadConfig-cycles-settleTimeMs-gcBetweenCycles-suppressCleanup-helpKey Changes
BenchmarkOrchestratorrun()+shutdown()) implemented by all benchmark typesoperationType+ flagsMetricRegistry, removed self-managed reporters/JVM stats/result uploadersloadJvmSystemProperties,loadMetricsConfig,loadResultUploadConfig,loadRunMetadata)tenantDefaultsmergeglobalDefaults→tenantDefaults,-tenantsFile→-workloadConfigTenantWorkloadConfigFromConfigurationTest.javaVerification
mvn compile -pl sdk/cosmos/azure-cosmos-benchmark -DskipTestspasses cleanly