Merge request !450

fixed issue 437 and removed retired profiler

Merged Administrator requested to merge new_profiler into main Oct 25, 2022

Created by: igormolybogFB

Patch Description

The --profile flag in metaseq train.py was not accessible through opt_baseline.py or sweep_baseline.py, which are the scripts we normally use. This is because the --profile flag in those scripts was mapped to the --new_profiler flag in train.py rather than to its --profile flag.

Here is how that happens (how the profiler option gets into the slurm job):

  1. [sweep_baseline.py] --profile is read from the CLI
  2. --new-profile is added to the grid (--profile is not)
  3. [slurm.py] config is produced from the grid
  4. train_cmd is extended from config
  5. srun_cmd is extended from train_cmd (and so is srun_cmd_str)
  6. srun_cmd_str -> wrapped_cmd -> sbatch_cmd, then run_batch(sbatch_cmd) is called
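
The steps above can be sketched roughly as follows. This is only an illustration of the flag flow, not the actual metaseq code: the function names (parse_sweep_args, build_grid, build_sbatch_cmd) and the fixed parameter are hypothetical, the config step is collapsed into a plain list, and the mismapped flag is spelled as in the patch description.

```python
import argparse
import shlex


def parse_sweep_args(argv):
    """Step 1: the sweep script reads --profile from the CLI."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--profile", action="store_true")
    return parser.parse_args(argv)


def build_grid(args, fixed):
    """Step 2: turn sweep-level options into train.py flags.

    fixed=False mimics the behaviour described above, where --profile
    is rewritten to a different train.py flag. fixed=True forwards the
    flag unchanged. The `fixed` switch is hypothetical and exists only
    to compare the two behaviours side by side.
    """
    grid = []
    if args.profile:
        grid.append("--profile" if fixed else "--new_profiler")
    return grid


def build_sbatch_cmd(grid):
    """Steps 3-6, simplified: grid -> train_cmd -> srun_cmd -> sbatch_cmd."""
    train_cmd = ["python", "train.py"] + grid
    srun_cmd = ["srun"] + train_cmd
    # The srun command is flattened to a string and wrapped by sbatch.
    srun_cmd_str = " ".join(shlex.quote(token) for token in srun_cmd)
    return ["sbatch", "--wrap", srun_cmd_str]
```

For example, build_sbatch_cmd(build_grid(parse_sweep_args(["--profile"]), fixed=False)) yields a wrapped command that passes --new_profiler to train.py, so train.py never sees --profile. With fixed=True the flag reaches train.py under its own name.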

Moreover, the --profile flag corresponds to the outdated profiler (torch.autograd.profiler) and not the new one (torch.profiler). As per @ngoyal2707's request, the outdated profiler is removed from our code (in metaseq only), and the --profile flag now behaves the same way in both metaseq-internal and metaseq.

In addition, issue 437 is fixed by implementing the suggestion.

Testing steps

Run:

python -m metaseq_internal.projects.zucchini.sweep_baseline -g 2 -n 1 --azure --model-size 125m --data /data/gpt-z/zucchini/consolidated/v1.0 --tokenizer noregex --partition zetta --prefix profile_run --profile

Source branch: new_profiler