Yue Zhao / pyod / Merge request !467

Implementing ECDF Estimator and deleting Statsmodels dependency

Merged: Lucew requested to merge github/fork/Lucew/development into development (Dec 21, 2022)

Hey everyone,

As discussed in #466 and #453, the computation of the empirical cumulative distribution function (ECDF) can be sped up compared to the statsmodels ECDF functionality.

This also makes the statsmodels dependency obsolete, so this pull request removes it.

In this pull request the following things are done:

  1. Implementing a standalone ECDF estimator in pyod/utils/stat_models.py
  2. Writing a test that compares the new implementation to the statsmodels implementation on several random matrices (so statsmodels remains a requirement in requirements_ci.txt)
  3. Removing and replacing the statsmodels-based functionality in ECOD and COPOD (the only places where this dependency was used)
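
To make items 1 and 2 more concrete, here is a minimal NumPy sketch of a column-wise ECDF that is only evaluated at the values it was estimated from, followed by a check on random matrices. It is an illustration under stated assumptions, not the code merged in this PR: the names column_ecdf and brute_force_ecdf are hypothetical, NaN handling is ignored, and the statsmodels comparison used in the actual test is swapped for a brute-force reference that counts the values <= x directly, so the sketch needs nothing beyond NumPy.

```python
import numpy as np


def column_ecdf(matrix: np.ndarray) -> np.ndarray:
    """ECDF of every column, evaluated at that column's own values.

    Hypothetical helper for illustration (not necessarily the PR's code):
    entry [i, j] is the fraction of values in column j that are <= matrix[i, j].
    Assumes a 2-D array without NaNs.
    """
    matrix = np.asarray(matrix, dtype=float)
    if matrix.ndim != 2:
        raise ValueError("expected a 2-D array")
    n = matrix.shape[0]

    # Sort each column once; sort_idx[k, j] is the row holding the k-th smallest value.
    sort_idx = np.argsort(matrix, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(matrix, sort_idx, axis=0)

    # Mark the last element of every run of tied values in each sorted column.
    is_last = np.ones_like(sorted_vals, dtype=bool)
    is_last[:-1] = sorted_vals[:-1] != sorted_vals[1:]

    # For each sorted position k, find the end of its tie group: the smallest
    # "last" position >= k, computed with a reversed running minimum.
    positions = np.arange(n)[:, None]
    last_idx = np.where(is_last, positions, n - 1)
    last_idx = np.minimum.accumulate(last_idx[::-1], axis=0)[::-1]

    # (end of tie group + 1) values are <= this value, so dividing by n gives the ECDF.
    sorted_probs = (last_idx + 1) / n

    # Scatter the probabilities back into the original row order.
    probs = np.empty_like(sorted_probs)
    np.put_along_axis(probs, sort_idx, sorted_probs, axis=0)
    return probs


def brute_force_ecdf(matrix: np.ndarray) -> np.ndarray:
    """Hypothetical O(n^2 * m) reference: count values <= x directly for every entry."""
    matrix = np.asarray(matrix, dtype=float)
    # Element [i, l, j] is True when matrix[l, j] <= matrix[i, j]; average over l.
    return (matrix[None, :, :] <= matrix[:, None, :]).mean(axis=1)


# Compare both on random matrices: continuous data and integer data with many ties.
rng = np.random.default_rng(0)
for X in (rng.normal(size=(500, 6)), rng.integers(0, 5, size=(200, 4))):
    assert np.allclose(column_ecdf(X), brute_force_ecdf(X))
```

For example, the column [3, 1, 3, 2] maps to [1.0, 0.25, 1.0, 0.5]: both 3s count all four values, because tied values share the largest sorted position in their group.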

The new implementation is 30-60% faster, because the ECDF is only evaluated on the same data it is estimated from. Please get back to me if you would like a more detailed explanation of why; I will gladly elaborate.
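
One possible way to see where the saving comes from (this is a reading of the claim above, not an explanation given in this MR): for column j of a data matrix X with n rows, the ECDF is defined by the first line below. If it is only ever evaluated at the column's own values, then after sorting them as x_(1) <= ... <= x_(n), the number of values <= x_(k) is simply the last sorted position holding the same value as x_(k). A single sort per column is then enough, and no separate lookup for arbitrary query points is needed:

```math
\begin{aligned}
\hat{F}_j(x) &= \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_{ij} \le x\}, \\
\hat{F}_j\big(x_{(k)}\big) &= \frac{1}{n}\max\{\, l : x_{(l)} = x_{(k)} \,\}.
\end{aligned}
```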

Since not everyone will want to dive fully into the topic, I kept statsmodels as a test dependency and compare this implementation to the statsmodels function on several random matrices. This can serve as evidence that the implementation is correct.

Thanks in advance! :-)


All Submissions Basics:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you checked all Issues to tie the PR to a specific one?

All Submissions Cores:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?
  • Does your submission pass tests, including CircleCI, Travis CI, and AppVeyor?
  • Does your submission have appropriate code coverage? The cutoff threshold is 95% as measured by Coveralls.