Adding hub to the awesome list

Open. Administrator requested to merge github/fork/sparkingdark/adding-hubpackage into master on Apr 01, 2021.

Created by: sparkingdark

What is this Python project?

Hub, by activeloop.ai, describes itself as the fastest unstructured dataset management tool for TensorFlow and PyTorch. It streams and version-controls data, and it converts large datasets into a single NumPy-like array in the cloud that is accessible from any machine.

Describe features.

  • Store and retrieve large datasets with version-control
  • Collaborate as in Google Docs: multiple data scientists can work on the same data in sync, without interrupting each other
  • Access from multiple machines simultaneously
  • Deploy anywhere: locally, on Google Cloud, S3, Azure, or Activeloop (the default, and free)
  • Integrate with ML tools such as NumPy, Dask, Ray, PyTorch, and TensorFlow
  • Create arrays as big as you want. You can store images as large as 100k by 100k.
  • Keep the shape of each sample dynamic, so that small and large arrays can be stored together as one array
  • Visualize any slice of the data in a matter of seconds without redundant manipulations
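
To make the "dynamic shape" point above more concrete, here is a minimal sketch of the general idea: samples of different shapes packed into one flat buffer, with a small index of shapes and offsets. This is not Hub's API or implementation; the `RaggedStore` class and all of its methods are hypothetical names used only for illustration, and the sketch uses the Python standard library rather than any of the tools listed above.

```python
from array import array


class RaggedStore:
    """Toy container (hypothetical, not part of Hub): packs 2-D samples
    of different shapes into one contiguous flat buffer."""

    def __init__(self, typecode="d"):
        self.data = array(typecode)  # one flat buffer holding every sample
        self.shapes = []             # (rows, cols) of each stored sample
        self.offsets = [0]           # offsets[i] is where sample i starts

    def append(self, rows):
        """Store one rectangular sample given as a list of equal-length rows."""
        n_cols = len(rows[0]) if rows else 0
        if any(len(r) != n_cols for r in rows):
            raise ValueError("each sample must be rectangular")
        for r in rows:
            self.data.extend(r)
        self.shapes.append((len(rows), n_cols))
        self.offsets.append(len(self.data))

    def __len__(self):
        return len(self.shapes)

    def __getitem__(self, i):
        """Rebuild sample i as a list of rows from its slice of the buffer."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        n_rows, n_cols = self.shapes[i]
        start = self.offsets[i]
        return [
            list(self.data[start + r * n_cols : start + (r + 1) * n_cols])
            for r in range(n_rows)
        ]


store = RaggedStore()
store.append([[1.0, 2.0], [3.0, 4.0]])  # a 2x2 sample
store.append([[5.0, 6.0, 7.0]])         # a 1x3 sample in the same buffer
print(len(store), store.shapes)
print(store[1])
```

Reading a sample back only touches its own slice of the buffer, which is why a layout like this can mix small and large samples without padding them to a common shape.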

What's the difference between this Python project and similar ones?

Enumerate comparisons.

Hub is more oriented toward deep learning and machine learning than similar projects, and it makes handling data easier.

Anyone who agrees with this pull request could submit an Approve review to it.
