metaseq !51: Finetune only on target labels

Merged: Administrator requested to merge sviyer/labelloss into main on May 06, 2022 (8 commits, 5 changed files).

Created by: sriniiyer

Description: This diff makes it possible to backpropagate the loss only on designated target tokens instead of on all input tokens. It introduces a new JSONL data format in which each record specifies "src" and "tgt" keys. To use it, select the streaming_finetune_language_modeling task. The task concatenates src and tgt and trains the model to generate the concatenated sequence autoregressively, but only the losses on the tgt tokens contribute to the training objective. This is particularly useful during fine-tuning, where the model should learn to produce the target given the source rather than to model the source text itself.
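The target-only loss described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this diff: the example record, the helper names (read_src_tgt, build_example, masked_nll) and the IGNORE_INDEX = -100 convention are hypothetical, and tokenization is replaced by pre-assigned integer ids. Because the first target token is predicted from the last source token, the labels switch from "ignored" to real token ids exactly at the src/tgt boundary.

```python
import json
from typing import Dict, List, Sequence, Tuple

# Hypothetical record in the src/tgt JSONL format (one JSON object per line).
EXAMPLE_JSONL = '{"src": "Question: What is 2 + 2? Answer:", "tgt": " 4"}\n'

# Assumption: a label value meaning "take no loss at this position".
IGNORE_INDEX = -100


def read_src_tgt(jsonl_text: str) -> List[Dict[str, str]]:
    """Parse JSONL text into a list of {"src": ..., "tgt": ...} records."""
    records = []
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines
            rec = json.loads(line)
            records.append({"src": rec["src"], "tgt": rec["tgt"]})
    return records


def build_example(src_ids: Sequence[int], tgt_ids: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Concatenate src and tgt and return (inputs, labels) for next-token prediction.

    labels[i] is the token the model should predict after reading inputs[: i + 1].
    Positions whose next token still belongs to src get IGNORE_INDEX, so only
    tgt tokens contribute to the loss.
    """
    tokens = list(src_ids) + list(tgt_ids)
    inputs = tokens[:-1]
    labels = [
        tokens[pos] if pos >= len(src_ids) else IGNORE_INDEX
        for pos in range(1, len(tokens))
    ]
    return inputs, labels


def masked_nll(log_probs: List[List[float]], labels: List[int]) -> float:
    """Mean negative log-likelihood over the positions whose label is not ignored.

    log_probs[i][v] is the model's log-probability of vocabulary id v at position i.
    """
    total, count = 0.0, 0
    for row, label in zip(log_probs, labels):
        if label != IGNORE_INDEX:
            total -= row[label]
            count += 1
    return total / max(count, 1)  # avoid division by zero when nothing is a target
```

For src ids [5, 6, 7] and tgt ids [8, 9], build_example returns inputs [5, 6, 7, 8] and labels [-100, -100, 8, 9]: the model still reads the whole source, but only its predictions of 8 and 9 are scored. Averaging over the unmasked positions only keeps a long prompt from diluting the loss on a short target; summing is an equally valid choice as long as it is applied consistently.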

Test plan

  1. Does not change existing pre-training: ran the existing pre-training setup on the 125M model, and the perplexity over the first 5 updates was exactly the same as before the change.

  2. Runs on data in the src/tgt format without crashing and performs competitively on COPA from SuperGLUE.
