metaseq · Issue #285 · Closed
Issue created Aug 02, 2022 by Administrator@rootOwner

_future_mask is created on CPU

Created by: KUNAL1612

🐛 Bug

Using the profiler to look for CPU-intensive operations led me to observe that _future_mask on L725 of transformer.py is being instantiated on the CPU.

While it is eventually moved to the GPU in L747, the operations between L728 and L745 run on the CPU, since the tensor itself is still on the CPU at that point.

https://github.com/facebookresearch/metaseq/blob/c9c817d2a230519c2865264bafdf45931afa02e6/metaseq/models/transformer.py#L725-L747
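For illustration, here is a minimal sketch of the kind of change under discussion (this is not the actual metaseq implementation; the method name, attribute handling, and shapes are assumptions): allocating the mask with `device=tensor.device` so that the fill/`triu` work happens on the GPU instead of being done on the CPU and copied over afterwards.

```python
import torch

def buffered_future_mask(self, tensor):
    """Sketch only: build the causal mask directly on tensor.device."""
    dim = tensor.size(0)
    if (
        getattr(self, "_future_mask", None) is None
        or self._future_mask.size(0) < dim
        or self._future_mask.device != tensor.device
    ):
        # Allocating with device=tensor.device keeps the fill and triu
        # operations on the GPU, avoiding a CPU-side build followed by
        # a .to(tensor) copy at the end.
        self._future_mask = torch.triu(
            torch.full((dim, dim), float("-inf"),
                       dtype=tensor.dtype, device=tensor.device),
            diagonal=1,
        )
    return self._future_mask[:dim, :dim]
```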

However, the code only follows that logic during document attention, i.e. when the doc separator is not -1. So in effect, creating self._future_mask on the GPU in L725 would only offer a marginal improvement in a few specific scenarios.

Need advice on whether this change is worth making.
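One way to judge whether it is worth it would be to profile a forward pass with and without the change and compare CPU-side time around the mask construction. A rough sketch with torch.profiler (the `model` and `input_ids` names here are placeholders, not part of the metaseq API):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# `model` and `input_ids` stand in for an actual metaseq model and batch;
# run the same snippet before and after the change and diff the tables.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        model(input_ids)

# Sort by CPU time to see whether the fill/triu ops still show up on the CPU.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=15))
```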
