metaseq !331: [dist] Force threads=1 when spawning.

Merged: Administrator requested to merge nospawn into main on Sep 06, 2022.
Created by: stephenroller

Patch Description

Some of our consolidation logic is extremely slow. The concatenations happen on CPU, and by default PyTorch parallelizes them across OpenMP threads. However, we have already spawned multiple processes (or Slurm did), so all of those threads end up competing and thrashing for the same CPUs.
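To see why the threads fight, count how many runnable threads end up on each core. The back-of-the-envelope sketch below assumes a hypothetical 96-core node running 8 ranks, where each rank's OpenMP pool defaults to the full core count. The numbers are illustrative, not measurements from this change.

```python
def oversubscription(nprocs: int, threads_per_proc: int, cores: int) -> float:
    """Runnable CPU threads per core across all processes on one node.

    A value above 1.0 means threads must time-share cores (thrashing).
    """
    return nprocs * threads_per_proc / cores


# Hypothetical node: 96 cores, 8 ranks (e.g. one per GPU).
default = oversubscription(8, 96, 96)  # every rank sizes its pool to all cores
forced = oversubscription(8, 1, 96)    # threads=1 per rank
print(default, forced)
```

With the default pools each core is asked to serve 8 threads, while capping every rank at one thread leaves most cores idle rather than contended.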

We could handle this individually in every script if we wanted to optimize carefully for CPU-only cases, but since we almost exclusively run on GPU, we can bake this assumption directly into our distributed code.
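A minimal sketch of baking the cap into a launcher, assuming the limit is applied through the `OMP_NUM_THREADS` and `MKL_NUM_THREADS` environment variables that each child reads at start-up. The helpers `single_thread_env` and `launch_workers` and the `RANK` variable are hypothetical illustrations, not metaseq's actual API. Inside a worker that is already running, PyTorch's `torch.set_num_threads(1)` caps intra-op threads in a similar way.

```python
import os
import subprocess
import sys

# Variables read by the OpenMP / MKL runtimes when a process starts.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")


def single_thread_env(base=None):
    """Return a copy of `base` (default: os.environ) that limits a child
    process to one intra-op CPU thread."""
    env = dict(os.environ if base is None else base)
    for var in THREAD_ENV_VARS:
        env[var] = "1"
    return env


def launch_workers(nprocs, code):
    """Hypothetical launcher: start `nprocs` Python workers running `code`,
    each capped at a single thread, and return their stripped stdout
    in rank order."""
    env = single_thread_env()
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", code],
            env={**env, "RANK": str(rank)},
            stdout=subprocess.PIPE,
            text=True,
        )
        for rank in range(nprocs)
    ]
    return [p.communicate()[0].strip() for p in procs]


if __name__ == "__main__":
    probe = "import os; print(os.environ['RANK'], os.environ['OMP_NUM_THREADS'])"
    print(launch_workers(2, probe))
```

Setting the variables before each child starts matters: the OpenMP runtime typically sizes its thread pool when it initializes, so editing `os.environ` inside a worker that has already loaded PyTorch may have no effect.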

Testing steps

Before this change, loading checkpoints pegged all CPUs at 100%. After the change, the CPUs are no longer pegged and loading runs faster.
