OptimalBits / bull · Merge request !2326

perf: speed up clean operation

Merged. Administrator requested to merge github/fork/emcsween/clean-speedup into develop on Mar 18, 2022.

Created by: emcsween

Hi there,

Thanks for this very useful queue library. We've been using it successfully for a while and it's been very robust. However, we recently managed to freeze our Redis server when we tried to clean around 1 million jobs from a wait queue using the clean operation. This PR is a suggestion for improving the speed of the clean operation so that it takes seconds instead of hours on queues with millions of jobs.

The clean operation on sets backed by lists (wait, active, paused) quickly gets very slow when the list is large. This is because each job deletion scans the whole list in an LREM call, resulting in O(N * M) complexity, where N is the number of jobs in the list and M is the number of jobs to delete.

With this change, the deletion is done in two passes. The first pass overwrites each item that should be deleted with a special marker value (via LSET). The second pass deletes all items carrying that marker in a single LREM call. This results in O(N) complexity.
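For illustration, here is a minimal client-side sketch of the two-pass delete, assuming an ioredis client. In bull itself the logic runs inside a Lua script so both passes execute atomically on the server; the key name, marker value, and predicate below are illustrative, not the actual implementation.

import Redis from 'ioredis'

const redis = new Redis()
const key = 'bull:test:wait'              // illustrative key name
const MARKER = '__marked_for_deletion__'  // illustrative sentinel value

async function twoPassClean (shouldDelete) {
  // Pass 1: overwrite each doomed entry in place with the sentinel.
  // LSET replaces by index without shifting the list, so indices stay valid.
  const jobIds = await redis.lrange(key, 0, -1)
  const pipeline = redis.pipeline()
  jobIds.forEach((jobId, i) => {
    if (shouldDelete(jobId)) pipeline.lset(key, i, MARKER)
  })
  // Pass 2: a single LREM with count 0 removes every occurrence of the
  // sentinel in one O(N) sweep, instead of one O(N) LREM per job.
  pipeline.lrem(key, 0, MARKER)
  await pipeline.exec()
}

// Example: delete every waiting job whose numeric id is below some cutoff.
await twoPassClean(jobId => Number(jobId) < 1000)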

Benchmarks

I ran some rough benchmarks on my laptop to show the effect of cleaning either 1000 jobs or all jobs in queues of different sizes:

Queue size | Clean 1000 jobs (before → after) | Clean all jobs (before → after)
1K         | 27 ms → 10 ms                    | 27 ms → 16 ms
10K        | 331 ms → 11 ms                   | 1.7 s → 100 ms
100K       | 3.7 s → 14 ms                    | 3 minutes → 900 ms
1M         | 42.1 s → 50 ms                   | didn't measure (would take hours) → 11.7 s

My benchmark script, for reference:

import Queue from 'bull'

const queue = new Queue('test')

// Start from an empty wait list.
await queue.empty()

for (const msgCount of [1000, 10000, 100000, 1000000]) {
  console.log(`Queue size: ${msgCount}`)

  // Fill the wait list with msgCount identical jobs
  // (addBulk expects each entry to carry its payload under `data`).
  const jobs = new Array(msgCount).fill({ data: { some: 'data' } })
  console.time('add')
  await queue.addBulk(jobs)
  console.timeEnd('add')

  // Clean every waiting job (grace period 0).
  console.time('clean')
  await queue.clean(0, 'wait')
  console.timeEnd('clean')

  console.log()
}

await queue.close()

Alternative implementation

Figuring out the index for the LSET that sets the deletion marker is a bit tricky because we traverse the list in batches from the end. This reverse traversal was introduced in #2205 to speed up the clean operation when a limit is given. The script could be slightly simplified if we traversed the list in batches from the beginning. I'd be happy to have a go at it if that makes more sense.
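To make the index bookkeeping concrete, here is an illustrative sketch of the reverse batch traversal, again assuming ioredis (the batch size, key name, and helper names are made up, and bull's actual Lua script differs). Because the first pass only overwrites items and never removes any, the list length N stays fixed, so the absolute LSET index is just the batch's start offset plus the position within the batch:

import Redis from 'ioredis'

const redis = new Redis()
const key = 'bull:test:wait'              // illustrative
const MARKER = '__marked_for_deletion__'  // illustrative
const BATCH = 1000

async function markFromEnd (shouldDelete) {
  const N = await redis.llen(key)
  // Batch k, counting from the tail, covers absolute indices
  // [max(0, N - (k + 1) * BATCH), N - k * BATCH - 1].
  for (let k = 0; k * BATCH < N; k++) {
    const start = Math.max(0, N - (k + 1) * BATCH)
    const stop = N - k * BATCH - 1
    const batch = await redis.lrange(key, start, stop)
    const pipeline = redis.pipeline()
    batch.forEach((jobId, i) => {
      // Absolute index for LSET = batch start + local offset.
      if (shouldDelete(jobId)) pipeline.lset(key, start + i, MARKER)
    })
    await pipeline.exec()
  }
}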
