Skip to content

Linear backoff and retry logic for SQL errors #5

@JasonDV

Description

@JasonDV

MergeQueryObject is a fairly efficient way of merging changes into a large table, but the simplest merge can take a few minutes to complete, and it executes as a transaction because it is a single SQL statement.

Let's say we are merging 1 million records into a table with 100 million records. This may take several minutes to complete, after the initial transfer to the source table, prior to the merge.

We may want to break this merge into separate actions distributed across mutliple worker nodes. Each merge is still executed serial against the database tables, so it is possible that merges timeout or error in some other way.

We don't want to give up on the merge if all that is happening is that we are waiting for the table to become free because another merge is executing.

  • Add logic that will detect a SQL timeout or deadlock and retry.
  • The retry should be an linear backout and retry.
  • Update the command timeout option on the MergeRequest to give it a normal 3 to 5 minute value, instead of no value right now.
  • Have the wait time between attempts grow with each failure
  • Add an option for the number of retries to the MergeRequest object

Example
Merge starts
after 5 minutes the command timesout
Merge retries after 2 minutes wait
after 5 minutes the command timesout
Merge retries after 4 minute wait
after 5 minutes the command timesout
Merge retires after 6 minute wait
merge completes because the Merge statement blocking the target table took 10 minutes to execute due to extreme activity on the server.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions