Skip to content

Conversation

@winterhazel
Copy link
Member

Description

The listEvents and deleteEvents APIs execute some non-optimized queries, which results in very long response times in environments that have many events.

Some optimizations were performed in these two workflows in order to reduce response times. The listing now uses a covering index, and does not perform a unecessary DISTINCT anymore. The deletion is now performed in batches. The batch size is defined by the global setting delete.query.batch.size.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

Events listing

In a test environment with 2 million event entries, I called listEvents multiple times passing different combinations of account, archived, domainid, duration, enddate, entrytime, id, keyword, level, projectid, resourceid, resourcetype, startdate, startid and type. For almost all the listings, I got a more reasonable response time (between 0.1 and 3 seconds depending on the filters) compared to before (30+ seconds). The only exception is the keyword parameter, which is not optimizable with a index because it performs a column LIKE "%keyword%" for 3 different columns.

Access control

I called listEvents and verified that:

  • root admins can list all events;
  • domain admins can only list events from their domain;
  • users can only list their events.

Events deletion

  1. I assigned a value to delete.query.batch.size, enabling batch delete;
  2. I called deleteEvents to remove all events before 2025-01-01. Then, I verified that they were removed successfully;
  3. I assigned a value to event.purge.delay, enabling the automatic deletion of events older than 15 days. After the deletion task executed, I verified that events older than 15 days were removed successfully.
Access control

I called deleteEvents and verified that:

  • root admins can delete any events;
  • domain admins can only delete events from their domain;
  • users can only delete their events.

@winterhazel
Copy link
Member Author

@blueorangutan package

@winterhazel winterhazel added this to the 4.20.2 milestone Aug 13, 2025
@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov
Copy link

codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 40.16393% with 73 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.61%. Comparing base (34b8870) to head (e803bdb).

Files with missing lines Patch % Lines
...rc/main/java/com/cloud/event/dao/EventDaoImpl.java 0.00% 43 Missing ⚠️
...in/java/com/cloud/server/ManagementServerImpl.java 76.19% 4 Missing and 6 partials ⚠️
...stack/api/command/user/event/ArchiveEventsCmd.java 57.14% 6 Missing ⚠️
...ava/com/cloud/upgrade/dao/Upgrade42210to42300.java 0.00% 6 Missing ⚠️
...dstack/api/command/user/event/DeleteEventsCmd.java 61.53% 5 Missing ⚠️
...ain/java/com/cloud/upgrade/dao/DbUpgradeUtils.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11443      +/-   ##
============================================
+ Coverage     17.60%   17.61%   +0.01%     
- Complexity    15668    15675       +7     
============================================
  Files          5915     5915              
  Lines        529963   529988      +25     
  Branches      64734    64731       -3     
============================================
+ Hits          93300    93367      +67     
+ Misses       426194   426148      -46     
- Partials      10469    10473       +4     
Flag Coverage Δ
uitests 3.58% <ø> (ø)
unittests 18.68% <40.16%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14617

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the performance of events listing and deletion operations to reduce response times in environments with large numbers of events. The optimizations include removing unnecessary DISTINCT operations, adding a covering index for better query performance, and implementing batch deletion.

  • Replaced DISTINCT with NATIVE function in event listing queries for better performance
  • Implemented batch deletion using the existing delete.query.batch.size configuration
  • Added a new covering index on the event table to optimize search queries

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
ManagementServerImpl.java Refactored deleteEvents method to use batch deletion and simplified access control logic
ConfigurationManagerImpl.java Updated documentation for DELETE_QUERY_BATCH_SIZE to include events
QueryManagerImpl.java Removed DISTINCT operation from event search query for performance
EventDaoImpl.java Added new purgeAll method for batch deletion and removed unused listOlderEvents method
EventDao.java Added purgeAll method interface and removed listOlderEvents method
DomainDaoImpl.java Added getDomainAndChildrenIds convenience method
DomainDao.java Added getDomainAndChildrenIds method interface
DeleteEventsCmd.java Moved parameter validation from command to service layer and improved error message
ListEventsCmd.java Removed unnecessary blank line
Upgrade42010to42020.java Added database upgrade class to create covering index
DbUpgradeUtils.java Added utility method for creating indexes with custom names
schema-42010to42020.sql Empty schema upgrade file
schema-42010to42020-cleanup.sql Empty schema cleanup file

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@winterhazel
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14637

Copy link
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good, will make the delete events action much better. Needs testing.

@winterhazel could we extend this to cover archive events as well? Also, I’m not aware of any other DB changes slated for 4.20.2. If this is the only one, would it be safer to target 4.22? Not blocking—just a suggestion.

@winterhazel
Copy link
Member Author

@winterhazel could we extend this to cover archive events as well? Also, I’m not aware of any other DB changes slated for 4.20.2. If this is the only one, would it be safer to target 4.22? Not blocking—just a suggestion.

@shwstppr I'll have a look into whether event archiving needs or would benefit from any changes.

About targeting 4.22 instead: the only DB change here is the addition of an index. This can not cause any inconsistency as far as I know, so it should be safe going to 4.20.2.

@weizhouapache weizhouapache modified the milestones: 4.20.2, 4.22.0 Sep 11, 2025
@weizhouapache
Copy link
Member

Moving to 4.22 milestone as it has some DB changes
cc @winterhazel @harikrishna-patnala

@winterhazel winterhazel marked this pull request as draft September 23, 2025 21:46
@harikrishna-patnala harikrishna-patnala modified the milestones: 4.22.0, 4.22.1 Nov 12, 2025
@winterhazel winterhazel force-pushed the listevents-optimizations branch from 0a8e293 to 73f79b8 Compare December 28, 2025 17:26
@winterhazel winterhazel changed the base branch from 4.20 to main December 28, 2025 17:26
@winterhazel winterhazel marked this pull request as ready for review December 28, 2025 17:27
@winterhazel winterhazel force-pushed the listevents-optimizations branch from 73f79b8 to f52a619 Compare December 28, 2025 17:30
@winterhazel
Copy link
Member Author

@winterhazel could we extend this to cover archive events as well?

I did some refactoring and extended the event archival to support batch operations as well. The batch size uses delete.query.batch.size for now, but I think we can rename this setting to a more generic one in another PR.

@winterhazel
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@winterhazel winterhazel changed the title Events listing and deletion optimizations Events listing, deletion and archival optimizations Dec 28, 2025
@blueorangutan
Copy link

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 16177

@winterhazel winterhazel modified the milestones: 4.22.1, 4.23.0 Dec 28, 2025
@winterhazel
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 16178

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16183

@DaanHoogland
Copy link
Contributor

@blueorangutan test

@blueorangutan
Copy link

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants