Issue created Apr 21, 2021 by Administrator (@root, Contributor)

[Bug] Delayed jobs get stuck in delayed queue if Redis is busy

Created by: swayam18

Description

When Redis is "busy running a script" the jobs in a queue get stuck in the delayed state unless a new job is added to the queue or the process is restarted. Upon investigation, the culprit is the following line:

https://github.com/OptimalBits/bull/blob/edfbd163991c212dd6548875c22f2745f897ae28/lib/queue.js#L899

If this command ever fails, the recursion breaks and updateDelayTimer is not called again till a new delayed job is added. Since that may never happen, jobs may get permanently stuck in the delayed queue.

Here is the sequence of events that leads to this scenario:

  1. Redis is busy running a heavy script (e.g. queue.clean was run to clear failed jobs).

  2. During this time, a call to updateDelayTimer is made, which in turn calls the updateDelaySet command: https://github.com/OptimalBits/bull/blob/edfbd163991c212dd6548875c22f2745f897ae28/lib/queue.js#L897 https://github.com/OptimalBits/bull/blob/edfbd163991c212dd6548875c22f2745f897ae28/lib/queue.js#L899

  3. Because Redis is busy, the updateDelaySet command fails with the following error: ReplyError: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.

  4. The promise fails and the catch block simply emits an error: https://github.com/OptimalBits/bull/blob/edfbd163991c212dd6548875c22f2745f897ae28/lib/queue.js#L932

Now because of the failure, the updateDelayTimer function is never called after this point, leading to the delayed jobs being stuck. The only way to recover them is by adding another delayed job to the queue, which seemingly triggers the message handler to call updateDelayTimer and restart the recursive process.
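To make the failure mode concrete, here is a minimal sketch of a self-rescheduling loop that only re-arms itself on success. It is a simplified, synchronous model and not Bull's actual code: `makeLoop`, `runCommand`, `schedule` and `onError` are hypothetical names, with `runCommand` standing in for the updateDelaySet call and `schedule` standing in for setTimeout.

```javascript
// Simplified model of the loop described above (NOT Bull's real implementation).
// runCommand: stands in for the updateDelaySet call, may throw.
// schedule:   stands in for setTimeout, queues the next tick.
// onError:    stands in for emitting the 'error' event.
function makeLoop(runCommand, schedule, onError) {
  let ticks = 0;
  function tick() {
    ticks += 1;
    try {
      runCommand();
      schedule(tick); // the loop is re-armed only when the command succeeds
    } catch (err) {
      onError(err); // the error is reported, but nothing re-arms the loop
    }
  }
  return { tick, getTicks: () => ticks };
}

// Manual scheduler so the example runs deterministically without real timers.
const pending = [];
const schedule = (fn) => pending.push(fn);

let busy = false;
const errors = [];
const loop = makeLoop(
  () => {
    if (busy) throw new Error('BUSY Redis is busy running a script.');
  },
  schedule,
  (err) => errors.push(err.message)
);

loop.tick();       // succeeds and schedules the next tick
busy = true;       // Redis starts running a heavy script
pending.shift()(); // the next tick fails, so nothing is rescheduled
busy = false;      // Redis recovers, but the loop is already dead

console.log(pending.length); // 0
console.log(errors.length);  // 1
```

After the single failure there is no pending tick left, which matches the observed behaviour: something external (such as adding a new delayed job) has to call the function again.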

Proposed Solution

I am not 100% sure if this makes sense, but adding this line of code seems to have fixed the problem:

    .catch(err => {
      setTimeout(() => this.updateDelayTimer(), 1000); // <- this line
      this.emit('error', err, 'Error updating the delay timer');
    });

Essentially, we retry the updateDelayTimer after a constant delay and hope that Redis is no longer busy and can now run the updateDelaySet command.

I am not 100% sure whether this can cause more than one this.updateDelayTimer loop to be active at the same time, so I would appreciate your feedback on this.
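One possible way to rule out overlapping retries is to keep a single timer handle and clear it before scheduling a new retry. The sketch below is only an illustration under that assumption: `RetryGuard` is a hypothetical helper that does not exist in Bull, and the timer functions are injectable so the example can run without waiting on real timers.

```javascript
// Hypothetical helper (not part of Bull): allows at most one pending retry.
class RetryGuard {
  constructor(
    setTimer = (fn, ms) => setTimeout(fn, ms),
    clearTimer = (handle) => clearTimeout(handle)
  ) {
    this.setTimer = setTimer;
    this.clearTimer = clearTimer;
    this.handle = null;
  }

  // Schedule fn after delayMs, replacing any retry that is still pending.
  schedule(fn, delayMs) {
    if (this.handle !== null) {
      this.clearTimer(this.handle);
    }
    this.handle = this.setTimer(() => {
      this.handle = null;
      fn();
    }, delayMs);
  }
}

// Fake timers so the behaviour can be checked synchronously.
const timers = new Map();
let nextId = 1;
const fakeSet = (fn) => {
  const id = nextId++;
  timers.set(id, fn);
  return id;
};
const fakeClear = (id) => timers.delete(id);

const guard = new RetryGuard(fakeSet, fakeClear);
let calls = 0;
guard.schedule(() => { calls += 1; }, 1000);
guard.schedule(() => { calls += 1; }, 1000); // replaces the first retry

console.log(timers.size); // 1

// Fire whatever is still pending.
for (const fn of [...timers.values()]) fn();
console.log(calls); // 1
```

In the catch block from the proposed fix, the plain setTimeout call could then be replaced with something like `guard.schedule(() => this.updateDelayTimer(), 1000)`, where `guard` is an instance stored on the queue. Whether Bull already tracks its own timer handle in a way that makes this unnecessary would need to be confirmed against the source.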

Minimal, Working Test code to reproduce the issue.

Let me know if this is necessary and I will create a repo with the code to reproduce it.

Bull version

3.22.1

Additional information
