Administrator / metaseq / Merge requests / !350

restore last checkpoint from the end of the training runs

Merged. Administrator requested to merge lastchkptfix into main on Sep 26, 2022.

Created by: ruanslv

After https://github.com/facebookresearch/metaseq/commit/e3ea5070a8c1bae77703aef7fc0f5537bd437963 we stopped storing checkpoints at the end of training runs. Let's bring them back.

I'm repurposing the last_.* checkpoints so that they only correspond to the end of training. In practice, with the previous code they were never stored, because the "epoch" or "updates" naming would take precedence. Now, if it's the end of the run and we are not at the end of an epoch or at a saving interval, we store the checkpoint using the "last_" naming (assuming the cfg flag is enabled).
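To make the naming rule above concrete, here is a minimal sketch of the decision logic. It is not the code from this merge request: the function name, parameter names, and file name patterns are all hypothetical and only illustrate the precedence described above.

```python
from typing import List


def checkpoint_names(
    epoch: int,
    updates: int,
    end_of_epoch: bool,
    training_finished: bool,
    save_interval_updates: int = 0,
    save_last_checkpoint: bool = True,
) -> List[str]:
    """Return the checkpoint names to write at this point in training.

    Hypothetical helper: names and patterns are assumptions for illustration,
    not metaseq's actual API.
    """
    names: List[str] = []

    # An end-of-epoch checkpoint takes precedence.
    if end_of_epoch:
        names.append(f"epoch_{epoch}.pt")

    # Otherwise, a checkpoint on the update-based saving interval.
    at_interval = (
        save_interval_updates > 0 and updates % save_interval_updates == 0
    )
    if not end_of_epoch and at_interval:
        names.append(f"updates_{updates}.pt")

    # "last_" is reserved for the end of the run, and is used only when
    # neither of the cases above already produced a checkpoint.
    if training_finished and save_last_checkpoint and not names:
        names.append(f"last_{updates}.pt")

    return names


# Run ends mid-epoch and off the saving interval: only the "last_" name is used.
print(checkpoint_names(1, 1050, False, True, save_interval_updates=100))
```

With this shape, a run that ends exactly on an epoch boundary or a saving interval keeps its usual name, and a run that ends anywhere else still leaves a checkpoint behind as long as the flag is enabled.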

To test: Trained a model and checked that the last checkpoint was stored in Azure.

Source branch: lastchkptfix