Add optional type conversion to FSDP resharding script

Open Administrator requested to merge reshard-fsdp into main Dec 21, 2022

Created by: tangbinh

Summary of Changes

We add an option to convert weights to a new dtype while resharding FSDP checkpoints. This reduces checkpoint size and helps avoid running out of memory when loading checkpoints on machines with limited RAM. For example, model weights might be saved in full precision when only half precision is needed at inference time.
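
To make the size reduction concrete, here is a rough back-of-the-envelope sketch. It assumes weights dominate the checkpoint (optimizer state skipped), uses a nominal 125M parameter count for OPT-125M, and takes "full precision" to mean 4 bytes per element and "half precision" 2 bytes. The helper name `estimated_weight_bytes` and the dtype labels in the table are illustrative only and are not part of the resharding script.

```python
# Bytes per element; "fp32" stands for full precision, "fp16" for half precision.
# These labels are illustrative and not taken from the script's accepted values.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2}


def estimated_weight_bytes(num_params: int, dtype: str) -> int:
    """Estimate the storage needed for the weights alone, ignoring metadata."""
    return num_params * BYTES_PER_ELEMENT[dtype]


if __name__ == "__main__":
    num_params = 125_000_000  # nominal parameter count, an assumption
    for dtype in ("fp32", "fp16"):
        size_mb = estimated_weight_bytes(num_params, dtype) / 1e6
        print(f"{dtype}: ~{size_mb:.0f} MB of weights")
```

Under these assumptions, converting from full to half precision halves the memory needed to hold the weights at load time.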

We also rename some options (e.g. input-glob-pattern → input and output-shard-name → output) for succinctness.

Test Plan

  • Reshard and convert the OPT-125M checkpoint into various dtypes and make sure we can do inference correctly in the new dtypes:
seq 0 1 | parallel --line-buffer 'python metaseq/scripts/reshard_fsdp.py --input "/data/checkpoints/opt-125m/raw/checkpoint_last-model_part-{}-shard*.pt" --output "/data/checkpoints/opt-125m/reshard-no-os/reshard-model_part-{}.pt" --skip-optimizer-state True --unflatten-weights True --output-dtype fp16'
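
For environments without GNU parallel, the same invocations could be assembled from Python. This is only a sketch: the helper `build_reshard_command` is hypothetical, while the script path, file paths, and flags are copied from the command above. It prints the commands rather than running them.

```python
import shlex
from typing import List

# Paths and flags copied from the test plan command; "{}" is the model part index.
SCRIPT = "metaseq/scripts/reshard_fsdp.py"
INPUT_TEMPLATE = "/data/checkpoints/opt-125m/raw/checkpoint_last-model_part-{}-shard*.pt"
OUTPUT_TEMPLATE = "/data/checkpoints/opt-125m/reshard-no-os/reshard-model_part-{}.pt"


def build_reshard_command(part: int, output_dtype: str) -> List[str]:
    """Build the argument list for one model part (hypothetical helper)."""
    return [
        "python", SCRIPT,
        "--input", INPUT_TEMPLATE.format(part),
        "--output", OUTPUT_TEMPLATE.format(part),
        "--skip-optimizer-state", "True",
        "--unflatten-weights", "True",
        "--output-dtype", output_dtype,
    ]


if __name__ == "__main__":
    for part in (0, 1):  # same parts as `seq 0 1`
        # shlex.join quotes the glob so the script, not the shell, expands it.
        print(shlex.join(build_reshard_command(part, "fp16")))
```

Passing each list to `subprocess.run` would execute the parts one after another instead of in parallel; the glob reaches the script unexpanded, as it does with the quoted pattern in the original command.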
Source branch: reshard-fsdp