Meta / buck / Merge requests / !2660

make allocators and sanitizers work for processes created with multiprocessing's spawn method in dev mode

Open Yifu Wang requested to merge github/fork/yifuwang/export-D31106794-to-dev into dev Sep 27, 2021

Summary: The first attempt (D30802446) overlooked the fact that the interpreter wrapper is not an executable (for execv) on Mac, and the refactoring introduced some bugs. This second attempt addresses those issues and limits the effect of the change to processes created by multiprocessing's spawn method on Linux.

Problem

Currently, the entrypoint for in-place Python binaries (i.e. built with dev mode) executes the following steps to load system native dependencies (e.g. sanitizers and allocators):

  • Backup LD_PRELOAD set by the caller
  • Append system native dependencies to LD_PRELOAD
  • Inject a prologue in user code which restores LD_PRELOAD set by the caller
  • execv Python interpreter
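
The four steps above can be sketched roughly as follows. This is an illustration only, not the actual entrypoint: the backup variable name `ORIGINAL_LD_PRELOAD`, the function names, and the way the prologue is invoked are all assumptions made for the example.

```python
import os

# Hypothetical name for the variable that carries the caller's original value.
BACKUP_VAR = "ORIGINAL_LD_PRELOAD"


def build_launch_env(env, native_deps):
    """Return a copy of env with native_deps appended to LD_PRELOAD."""
    new_env = dict(env)
    original = env.get("LD_PRELOAD")
    # Step 1: back up what the caller set (empty string stands for "unset").
    new_env[BACKUP_VAR] = original or ""
    # Step 2: append the system native dependencies.
    parts = ([original] if original else []) + list(native_deps)
    new_env["LD_PRELOAD"] = ":".join(parts)
    return new_env


def restore_preload(env):
    """Step 3: what an injected prologue would do once the interpreter is up."""
    original = env.pop(BACKUP_VAR, "")
    if original:
        env["LD_PRELOAD"] = original
    else:
        env.pop("LD_PRELOAD", None)


def launch(interpreter, main_module, native_deps):
    """Step 4: replace the current process with the Python interpreter."""
    env = build_launch_env(os.environ, native_deps)
    os.execve(interpreter, [interpreter, "-m", main_module], env)
```

Because the prologue restores LD_PRELOAD before user code runs, anything the program launches afterwards sees only the caller's original value, which is the root of the problem described next.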

These steps work as intended for single-process Python programs. However, when a Python program spawns child processes, the children do not load the system native dependencies, because they simply execv the vanilla Python interpreter. A few examples of why this is problematic:

  • The ASAN runtime library is a system native dependency. Without loading it, a child process that loads user native dependencies compiled with ASAN will crash during static initialization because it can't find __asan_init.
  • jemalloc is also a system native dependency.
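
To see which interpreter a spawned child would be launched with, one can ask the standard library directly. The helper name below is made up for the example; `multiprocessing.spawn.get_executable()` is the CPython function the spawn start method consults, and it defaults to the value of sys.executable.

```python
import multiprocessing.spawn as mp_spawn


def spawn_interpreter():
    """Return the executable that multiprocessing's spawn method will launch."""
    return mp_spawn.get_executable()


if __name__ == "__main__":
    # In a dev-mode binary this is the vanilla interpreter, so a child started
    # from it gets none of the LD_PRELOAD entries the entrypoint had added.
    print(spawn_interpreter())
```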

Many, if not most, ML use cases "ban" dev mode because of these problems, which is unfortunate considering the developer efficiency dev mode provides. In addition, a large number of unit tests have to run in a more expensive build mode for the same reason.

For an earlier discussion, see this post.

Solution

Move the logic that loads system native dependencies out of the Python binary entrypoint and into an interpreter wrapper, and set the interpreter wrapper as sys.executable in the injected prologue:

  • The Python binary entrypoint now uses the interpreter wrapper, which has the same command line interface as the Python interpreter, to run the main module.
  • multiprocessing's spawn method now uses the interpreter wrapper to create child processes, ensuring system native dependencies get loaded correctly.
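
A minimal sketch of the idea, under stated assumptions: the paths, the `ORIGINAL_LD_PRELOAD` variable, and the function names are hypothetical, and the real wrapper is generated by the build rather than written like this. The wrapper forwards its arguments unchanged so that it keeps the interpreter's command line interface, and the prologue points sys.executable at the wrapper so that spawned children go through it too.

```python
import os
import sys

# Hypothetical values; in practice these would be filled in by the build.
NATIVE_DEPS = ["/usr/lib/libjemalloc.so"]
REAL_INTERPRETER = "/usr/bin/python3"
BACKUP_VAR = "ORIGINAL_LD_PRELOAD"


def wrapper_exec_args(argv, env):
    """Compute the (path, argv, env) triple the wrapper would pass to os.execve."""
    new_env = dict(env)
    original = env.get("LD_PRELOAD", "")
    new_env[BACKUP_VAR] = original
    new_env["LD_PRELOAD"] = ":".join(filter(None, [original, *NATIVE_DEPS]))
    # Forward every argument untouched: same CLI as the interpreter itself.
    return REAL_INTERPRETER, [REAL_INTERPRETER, *argv[1:]], new_env


def prologue(wrapper_path, env=os.environ):
    """Restore the caller's LD_PRELOAD and make the wrapper sys.executable."""
    original = env.pop(BACKUP_VAR, "")
    if original:
        env["LD_PRELOAD"] = original
    else:
        env.pop("LD_PRELOAD", None)
    # multiprocessing.spawn reads sys.executable when it is first imported,
    # so this has to run before user code imports multiprocessing.
    sys.executable = wrapper_path
```

With this arrangement each spawned child re-enters the wrapper, which adds the native dependencies to LD_PRELOAD again before the real interpreter starts.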

Alternative Considered

One alternative considered is simply not to remove the system native dependencies from LD_PRELOAD, so that they remain present in the spawned processes. However, this can cause linking issues, which were perhaps the reason LD_PRELOAD was restored in the first place.

References

  • An old RFC for this change: D16210828
  • The counterpart for opt mode: D16350169

Reviewed By: fried

Source branch: github/fork/yifuwang/export-D31106794-to-dev