Add MLflow.

Closed. Administrator requested to merge github/fork/jmrr/master into master on Oct 06, 2019.

Created by: jmrr

What is this Python project?

MLflow is an open source platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models.

MLflow is the most comprehensive, platform-agnostic project, aiming to encompass three main components of the ML lifecycle on a single platform:

  • MLflow Tracking: An API to log parameters, code, and results in machine learning experiments and compare them using an interactive UI (see the sketch after this list).

  • MLflow Projects: A code packaging format for reproducible runs using Conda and Docker, so you can share your ML code with others.

  • MLflow Models: A model packaging format and tools that let you easily deploy the same model (from any ML library) to batch and real-time scoring on platforms such as Docker, Apache Spark, Azure ML and AWS SageMaker.
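
To make the Tracking and Models components concrete, here is a minimal sketch of the Python API, assuming mlflow and scikit-learn are installed; the run name, model choice, and metric below are illustrative, not part of this MR:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Toy data and model; any ML library would do here.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

with mlflow.start_run(run_name="demo"):  # one tracked experiment run
    alpha = 0.5
    model = Ridge(alpha=alpha).fit(X, y)

    # MLflow Tracking: log parameters and metrics for later comparison.
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("mse", mean_squared_error(y, model.predict(X)))

    # MLflow Models: persist the model in MLflow's packaging format.
    mlflow.sklearn.log_model(model, "model")
```

Runs logged this way can then be browsed and compared in the interactive UI (started locally with mlflow ui). A packaged project (MLflow Projects) is launched similarly, via mlflow run <uri> or mlflow.projects.run(), which resolves the declared Conda or Docker environment before executing.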

What's the difference between this Python project and similar ones?

  • MLOps is still a domain in its early stages, but some tools based on the Kubernetes containerised ecosystem already exist:
    • Kubeflow
    • Pachyderm
    • Polyaxon

The fact that they're based on Kubernetes appears to be somewhat of a barrier for small-scale Data Science teams, whilst with MLflow an individual contributor can easily set up a single tracking server for their own experiments. They also tend to be more Deep Learning oriented. An advantage of Pachyderm is that it provides data reproducibility (on top of the code + model reproducibility provided by MLflow).
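
As a rough sketch of how lightweight that single-contributor setup can be (the SQLite backend, port, and experiment name here are assumptions for illustration, not prescribed by MLflow):

```python
# Start a self-hosted tracking server first, e.g.:
#   mlflow server --backend-store-uri sqlite:///mlflow.db \
#                 --default-artifact-root ./artifacts --port 5000
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # point the client at the server
mlflow.set_experiment("my-experiments")           # hypothetical experiment name

with mlflow.start_run():
    mlflow.log_metric("accuracy", 0.93)  # placeholder metric value
```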

  • Sacred provides experimentation logging, but doesn't provide model packaging and sharing, or the possibility of creating reproducible projects with your ML code for other people to use. Also, you'd need a frontend (see next entry) to visualise and track your experiments, which MLflow's tracking server already provides.

  • Omniboard would only provide the frontend.

Some other nice tools exist, but they're library-specific, e.g. TensorBoard for tracking a specific framework's runs, and TFX for TensorFlow in the domain of model deployment.


Anyone who agrees with this pull request can vote for it by adding a 👍, and the maintainer will usually merge it when votes reach 20.
