MLflow

MLflow is an open source platform for managing the end-to-end machine learning lifecycle.

It was originally developed by Databricks and therefore fits naturally into the Databricks workspace, but it can also be used in other environments. It provides simple APIs for logging

  • metrics (for example, model loss),

  • parameters (for example, learning rate), and

  • fitted models,

making it easy to analyze training results or deploy models later on.

See also: Introduction to MLflow, from “Databricks Data + AI World Tour” by Clemens Mewald.

Components

With MLflow Tracking you can log parameters, code versions, metrics, and output files when running your machine learning code. Adding a few logging calls to your code is enough to get started. MLflow also provides autologging APIs for many ML frameworks: call the autologging API before your training code runs, and model-specific metrics, parameters, and model artifacts are recorded automatically.

An MLflow Project is a format for packaging data science code in a reusable and reproducible way. It is just a convention for organizing and describing your code to let other data scientists (or automated tools) run it. You can get more control over an MLflow Project by adding an MLproject file, which is a text file in YAML syntax, to the project’s root directory.
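To make the MLproject file more concrete, the sketch below writes a minimal one into a temporary directory that stands in for a project root. The project name, the `alpha` parameter, and the `train.py` and `python_env.yaml` files it refers to are hypothetical and are not created here.

```python
import tempfile
from pathlib import Path

# A minimal MLproject definition: a name, an environment file,
# and one entry point with a typed parameter.
MLPROJECT_TEXT = """\
name: example_project

python_env: python_env.yaml

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
"""

# Assumption: a temporary directory plays the role of the project root.
project_root = Path(tempfile.mkdtemp())
mlproject_path = project_root / "MLproject"
mlproject_path.write_text(MLPROJECT_TEXT, encoding="utf-8")

print(mlproject_path.read_text(encoding="utf-8"))
```

With the referenced files in place, such a project could be started with `mlflow run <project directory> -P alpha=0.4`, where `-P` overrides the default parameter value.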

An MLflow Model is a standard format for packaging machine learning models so that they can be used in a variety of downstream tools, for example real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools.

MLflow Model Registry combines a centralized model repository, a UI, and a set of APIs. It provides chronological model lineage (which MLflow experiment and run produced the model at a given time), model versioning, and stage transitions (for example, from staging to production or archived). You can also create and view model descriptions and leave comments.

Finally, MLflow Model Serving, which is available only inside Databricks, lets you host machine learning models from Model Registry as REST endpoints. These endpoints are updated automatically based on the availability of model versions and their stages.

Questions

  • What are the main components of MLflow?

Tutorial

This tutorial showcases how you can use MLflow end-to-end to:

  • Train a linear regression model

  • Package the code that trains the model in a reusable and reproducible model format

  • Deploy the model into a simple HTTP server that will enable you to score predictions

Source: Tutorial from MLflow
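For the deployment step, a scoring request might look like the sketch below. It only builds the HTTP request and does not send it, so no server is needed to run it. It assumes MLflow 2.0 or later, where the scoring server accepts a `dataframe_split` JSON body on `/invocations`. The helper `build_scoring_request`, the column names, and the input row are hypothetical.

```python
import json
import urllib.request


def build_scoring_request(url, columns, rows):
    """Build (but do not send) a POST request for an MLflow scoring server."""
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumption: a model is served locally, e.g. via `mlflow models serve`,
# which listens on 127.0.0.1:5000 unless told otherwise.
request = build_scoring_request(
    "http://127.0.0.1:5000/invocations",
    columns=["alcohol", "pH"],  # hypothetical feature names
    rows=[[12.8, 3.3]],         # hypothetical input row
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# With a server running, the request could be sent like this:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```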