OptimalBits / bull / Issues / #1600
Closed
Open
Issue created Dec 31, 2019 by Administrator (@root, Contributor)

completed and failed jobs should be removed from the stalled set

Created by: tomgrossman

Description

When a job moves to the failed or completed state, it should be removed from the stalled set if it is in there. If processing takes longer than expected, it makes sense that the job is considered stalled, because it was unlocked. But if the worker didn't crash and just got stuck in one of its sub-processes, the job will eventually complete or fail, which means it was never really stalled. To avoid re-running the same job, you can check whether the job is in the stalled set and remove it from there.

I know this is the current design and I can adjust the settings of stalledInterval and maxStalledCount. But if it can be avoided easily in the way I described, why not?
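For reference, the settings workaround mentioned above might look roughly like the sketch below. The option names lockDuration, stalledInterval and maxStalledCount are taken from Bull's advanced settings; the numeric values and the `queueOptions` name are illustrative assumptions, not recommendations.

```typescript
// Illustrative values only; tune them to how long your jobs really take.
const queueOptions = {
  settings: {
    // Expiration time of a job's lock, in milliseconds.
    lockDuration: 120_000,
    // How often the queue checks for stalled jobs, in milliseconds.
    stalledInterval: 60_000,
    // Maximum number of times a stalled job will be re-processed.
    maxStalledCount: 1,
  },
};
```

An object like this would be passed as the second argument when constructing the queue. It only makes a false "stalled" verdict less likely for slow jobs; it does not remove the job from the stalled set once it finishes.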

Worse than that, suppose the job was completed and cleaned. The stalled job will eventually be returned to wait, but the job data has already been deleted, so the worker will crash because the job's data is missing. This can also be avoided by the suggested fix.
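To make the scenario concrete, here is a minimal in-memory sketch of it. This is a simplified model and not Bull's actual Redis/Lua implementation: the `QueueModel` class, its fields, and `runScenario` are all hypothetical names introduced for illustration.

```typescript
type JobId = string;

// Simplified stand-in for the queue's wait list, active set, stalled set and job data.
class QueueModel {
  wait: JobId[] = [];
  active = new Set<JobId>();
  stalled = new Set<JobId>();
  data = new Map<JobId, unknown>();
  removeFromStalledOnFinish: boolean;

  constructor(removeFromStalledOnFinish: boolean) {
    this.removeFromStalledOnFinish = removeFromStalledOnFinish;
  }

  add(id: JobId, payload: unknown): void {
    this.data.set(id, payload);
    this.wait.push(id);
  }

  // A worker picks up the next waiting job.
  take(): JobId | undefined {
    const id = this.wait.shift();
    if (id !== undefined) this.active.add(id);
    return id;
  }

  // The job was unlocked while still active, so it is flagged as stalled.
  markStalled(id: JobId): void {
    if (this.active.has(id)) this.stalled.add(id);
  }

  // The job completes or fails; optionally its data is cleaned up.
  finish(id: JobId, clean: boolean): void {
    this.active.delete(id);
    // The suggested fix: a finished job is no longer a stalled candidate.
    if (this.removeFromStalledOnFinish) this.stalled.delete(id);
    if (clean) this.data.delete(id);
  }

  // The stalled check returns every flagged job to wait.
  recoverStalled(): void {
    for (const id of this.stalled) this.wait.push(id);
    this.stalled.clear();
  }
}

function runScenario(fix: boolean): { requeued: JobId[]; dataPresent: boolean } {
  const q = new QueueModel(fix);
  q.add("job-1", { n: 1 });
  const id = q.take()!;
  q.markStalled(id);   // slow job is flagged as stalled
  q.finish(id, true);  // job eventually completes and is cleaned
  q.recoverStalled();
  return { requeued: [...q.wait], dataPresent: q.data.has(id) };
}
```

In this model, `runScenario(false)` puts `job-1` back in wait even though its data is gone, which is the crash case described above. With `runScenario(true)` nothing is requeued.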

Bull version

3.11.0
