Yue Zhao / pyod / Merge requests / !378

Cooks distance branch

Closed. KulikDM requested to merge github/fork/KulikDM/Cooks_Distance_Branch into development on Mar 22, 2022.

Cook's distance outlier detector

A supervised regression outlier detector

Cook's distance can be used to identify points that negatively affect a regression model. The measurement combines each observation's leverage and residual: higher leverage and larger residuals yield higher Cook's distances. Read more in :cite:`cook1977outlier` (https://www.jstor.org/stable/1268249).
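For reference, the relationship between leverage, residuals, and Cook's distance can be sketched in plain NumPy. This is only an illustration of the standard formula for an ordinary least squares fit with an intercept, not the code in cd.py; the function name `cooks_distance` and the toy data are assumptions made for this example.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance of each observation for an OLS fit with intercept.

    Hypothetical helper for illustration; not the API added in this MR.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    y = np.asarray(y, dtype=float).ravel()
    n = X.shape[0]

    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]

    # Hat matrix H = A (A^T A)^-1 A^T; its diagonal is the leverage.
    H = A @ np.linalg.pinv(A)
    leverage = np.diag(H)

    # Residuals and mean squared error of the fit.
    residuals = y - H @ y
    mse = residuals @ residuals / (n - p)

    # D_i = e_i^2 / (p * mse) * h_ii / (1 - h_ii)^2
    return residuals**2 / (p * mse) * leverage / (1.0 - leverage) ** 2


# Toy data: a clean linear trend with one corrupted target at the last point.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)
y[-1] += 15.0

d = cooks_distance(x, y)
most_influential = int(np.argmax(d))
```

The corrupted point sits at the edge of the x range, so it has both high leverage and a large residual, and it receives the largest distance.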

The script cd.py has been added to pyod/models/ and contains the Cook's distance outlier detector. The code is mostly based on what has been implemented in the Yellowbrick repo, but I thought it would be nice to be able to call it alongside all the other outlier detectors in PyOD.

An example and a test script have now also been added, and the original Cook's distance outlier detector script has been simplified. However, because of the way Cook's distance is calculated, the target variable y is necessary for both the train and test data. The decision function has been rewritten to take X as an appended array of [X, y] (see the example script). Still, because of this, fit_predict and fit_predict_score in the test script will not run without issues. I see that both of these functions are deprecated anyway, so I hope this is not a deal breaker, since the results from this outlier detector are relatively good. If you think that both fit and decision_function should take only X as input, I can rewrite them that way, but the user will then have to append the X and y data before making either call.
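As a rough illustration of the appended [X, y] convention described above, the snippet below builds such an array and splits it back apart. It assumes y is stored as the last column; the helper name `split_appended` is hypothetical and is not part of PyOD or of this merge request.

```python
import numpy as np


def split_appended(Xy):
    """Split an appended [X, y] array into features and target.

    Assumes the target is the last column (an assumption for this sketch).
    """
    Xy = np.asarray(Xy, dtype=float)
    return Xy[:, :-1], Xy[:, -1]


X_test = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.5]])
y_test = np.array([1.0, 2.0, 3.0])

# Append the target as an extra column so a single array carries both.
Xy_test = np.column_stack([X_test, y_test])

X_back, y_back = split_appended(Xy_test)
```

Passing one array keeps the call signature in line with detectors that only accept X, at the cost of the caller having to remember the column layout.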

Hopefully this will become a useful addition to the already great repo and Python package that PyOD is.

Source branch: github/fork/KulikDM/Cooks_Distance_Branch