Yue Zhao / pyod / Merge requests / !197

Parallelize the fit and decision_function methods of FeatureBagging

Status: Open. Shihab Shahriar Khan requested to merge github/fork/Shihab-Shahriar/parallel_feat_bag into development on May 24, 2020.

This PR parallelizes the fit and decision_function methods of FeatureBagging. The earlier implementation only used n_jobs when the base_estimator parameter was None. Apart from fixing that, this PR parallelizes at the model level, which is a coarser granularity and noticeably improves performance.
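To illustrate what model-level (coarse-grained) parallelism means here, the following is a minimal sketch, not the code in this PR. Each parallel task fits or scores one whole base detector on its own feature subset. `CentroidDetector`, `fit_bagging`, `score_bagging`, and the subset-size rule are hypothetical names and choices made up for this example, and threads are used only to keep the sketch dependency-free; a real implementation would more likely use process-based workers, which is an assumption.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


class CentroidDetector:
    """Hypothetical toy base detector: score = distance to the training centroid."""

    def fit(self, X):
        self.center_ = [fmean(col) for col in zip(*X)]
        return self

    def decision_function(self, X):
        return [
            sum((v - c) ** 2 for v, c in zip(row, self.center_)) ** 0.5
            for row in X
        ]


def _take(X, features):
    """Select the given feature columns from every row."""
    return [[row[j] for j in features] for row in X]


def _fit_one(X, features):
    """One coarse-grained task: fit a whole base detector on a feature subset."""
    return CentroidDetector().fit(_take(X, features))


def fit_bagging(X, n_estimators=20, n_jobs=4, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    # Draw every subset up front so results do not depend on scheduling order.
    # The "half to all features" size rule is an assumption of this sketch.
    subsets = [
        sorted(rng.sample(range(n_features),
                          rng.randint(max(1, n_features // 2), n_features)))
        for _ in range(n_estimators)
    ]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        estimators = list(pool.map(lambda f: _fit_one(X, f), subsets))
    return estimators, subsets


def score_bagging(estimators, subsets, X, n_jobs=4):
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        per_model = list(pool.map(
            lambda pair: pair[0].decision_function(_take(X, pair[1])),
            zip(estimators, subsets),
        ))
    # Combine by averaging the scores of all base detectors for each sample.
    return [fmean(col) for col in zip(*per_model)]
```

Because every task covers a full fit or a full scoring pass of one detector, the scheduling overhead is paid once per base estimator rather than once per inner operation, which is the kind of granularity the description refers to.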

Benchmark results using n_estimators=20 and base_estimator=None, averaged over 3 runs. Values are fit times in seconds; the values in parentheses are decision_function times in seconds:

| Dataset (shape) | Orig (n_jobs=1) | Orig (n_jobs=4) | This PR (n_jobs=4) |
|---|---|---|---|
| pima (768, 8) | 0.19 (0.094) | 2.30 (2.155) | 0.64 (0.63) |
| vowels (1456, 12) | 0.71 (0.42) | 2.36 (2.17) | 0.66 (0.64) |
| pendigits (6870, 16) | 9.12 (5.02) | 5.87 (4.32) | 1.78 (1.42) |
| musk (3062, 166) | 18.92 (8.32) | 7.46 (5.88) | 3.90 (2.79) |
| shuttle (49097, 9) | 59.09 (38.67) | 46.10 (28.11) | 33.43 (18.01) |
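For anyone who wants to reproduce this kind of measurement, a timing helper along the following lines would average wall-clock time over several runs. This is a sketch under assumptions, not the script used for the numbers above; `average_seconds` is a hypothetical name.

```python
import time
from statistics import fmean


def average_seconds(fn, repeats=3):
    """Call fn() `repeats` times and return the mean wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return fmean(durations)
```

It could be used as `average_seconds(lambda: clf.fit(X))` and `average_seconds(lambda: clf.decision_function(X))`, where `clf` and `X` stand for a detector and a dataset that are not defined here.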

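On the smallest datasets the parallel version can be slower than the single-process one (for example, 0.64 s vs 0.19 s fit time on pima), presumably because of the overhead of dispatching work to parallel workers, but I think that is expected.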

Please let me know if further changes are needed. Thanks.
