metaseq · Merge requests · !254

[bf16] Don't convert weights to fp16 during load
Merged. Administrator requested to merge nofp16load into main on Jul 25, 2022.

Created by: stephenroller

Patch Description

There is a known bug where training in bf16 and then requeuing temporarily converts the model weights to fp16 during checkpoint load, introducing potential numerical noise. This change avoids that conversion.

This likely explains why some users see loss explosions after they requeue.
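The following is not the actual metaseq patch, but a stdlib-only sketch of why the intermediate fp16 cast is harmful: fp16 tops out at 65504, while bf16 shares float32's exponent range, so any large bf16 weight routed through fp16 overflows to infinity (the helper names `round_to_bf16` and `cast_through_fp16` are illustrative, not from the codebase).

```python
import struct

# fp16 can represent magnitudes only up to 65504; bf16 shares float32's
# exponent range (up to ~3.4e38). A temporary bf16 -> fp16 cast therefore
# overflows any sufficiently large weight to inf, corrupting the restored model.
FP16_MAX = 65504.0

def round_to_bf16(x: float) -> float:
    """Round a Python float to bfloat16 precision: keep float32's sign
    and exponent bits, truncate the mantissa to 7 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def cast_through_fp16(x: float) -> float:
    """Model the buggy load path: a bf16 value pushed through fp16
    overflows to infinity once it exceeds fp16's dynamic range."""
    if abs(x) > FP16_MAX:
        return float("inf") if x > 0 else float("-inf")
    return x  # in-range values survive (fp16 has more mantissa bits than bf16)

weight = round_to_bf16(1.0e5)          # perfectly representable in bf16
corrupted = cast_through_fp16(weight)  # overflows to inf under the fp16 round-trip
```

A single overflowed weight like this is enough to propagate inf/NaN through the forward pass, which is consistent with the post-requeue loss explosions described above.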

Testing steps

Launched the API and manually checked the code path taken.

Source branch: nofp16load