Google's open-source software TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines. When you're ready to move your models to production, use TFX to create and manage a production pipeline.
A TFX pipeline is a sequence of components that implement an ML pipeline specifically designed for scalable, high-performance machine learning tasks. Components are built with TFX libraries, which can also be used individually.
TFX pipeline templates make it easy to get started with pipeline development by providing a prebuilt set of pipeline definitions that you can customize for your use case.
In this tutorial, we build a TFX pipeline by mainly following the instructions in the excellent TensorFlow tutorial "Building a TFX Pipeline Locally".
To complete this tutorial, you need the following environment:
To copy the template, use the tfx template copy command with the following parameters:
model: The name of the template to copy (we use the taxi template).
pipeline_name: The name of the pipeline to create (we call it pipeline-taxi).
destination_path: The path to copy the template into (you need to provide this yourself).
On my machine, the command is:
tfx template copy --model=taxi --pipeline_name=pipeline-taxi \
  --destination_path=/Users/jankirenz/tfx-taxi
A copy of the pipeline template has been created at the path you specified.
Explore the directories and files that were copied to your pipeline's project directory tfx-taxi:
A pipeline directory with
pipeline.py - defines the pipeline, and lists which components are being used.
configs.py - holds configuration details, such as where the data comes from or which orchestrator is used.
A data directory
This typically contains a data.csv file, which is the default data source for the TFX component ExampleGen. You can change the data source in configs.py.
A models directory with preprocessing code and model implementations.
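If you want to confirm that the copy worked before editing anything, a short Python check can help. This is a sketch that assumes the layout described above (pipeline/pipeline.py, pipeline/configs.py, data/data.csv, a models directory) plus local_runner.py in the project root; the helper name missing_template_files is hypothetical, and the exact file list may differ between TFX versions.

```python
from pathlib import Path

# Paths relative to the project directory, taken from the layout described above.
# ASSUMPTION: your TFX version may add or rename files; adjust this list as needed.
EXPECTED_PATHS = [
    "local_runner.py",
    "pipeline/pipeline.py",
    "pipeline/configs.py",
    "data/data.csv",
    "models",
]


def missing_template_files(project_dir: str) -> list[str]:
    """Return the expected paths that do not exist under project_dir (hypothetical helper)."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


# Usage (adjust to your destination_path):
# print(missing_template_files("/Users/jankirenz/tfx-taxi") or "All expected files found.")
```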
The template also copies directed acyclic graph (DAG) runners, which run the components one by one in the DAG's topological order, for the local environment and for Kubeflow. The runner for the local environment is in the file local_runner.py.
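To make "topological order" concrete, here is a small sketch using Python's standard-library graphlib.TopologicalSorter (Python 3.9+). The dependency graph is an assumed, simplified version of a taxi-style pipeline; it only illustrates the ordering idea and is not how the TFX runner is implemented internally.

```python
from graphlib import TopologicalSorter

# ASSUMPTION: a simplified dependency graph for a taxi-style pipeline.
# Each key is a component; its value is the set of components whose outputs it consumes.
PIPELINE_DEPS = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "Evaluator"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid sequential order in which every component follows its inputs."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, component in enumerate(run_order(PIPELINE_DEPS), start=1):
        print(step, component)
```

Several orders can be valid (ExampleValidator and Transform do not depend on each other, for example), so a sequential runner only has to respect the edges of the graph, not one fixed sequence.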
Before we can create our pipeline, we first need to change some code in the file local_runner.py. This script creates a pipeline run and specifies the run's parameters, such as the DATA_PATH and OUTPUT_DIR.
Note that you don't necessarily have to change the definition of DATA_PATH, since the given expression already returns the full path name in a multiplatform-safe way.
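The exact expression in your copy of local_runner.py may differ between TFX versions, but a multiplatform-safe data path is typically built like the sketch below: it anchors the data directory to the location of the script itself and lets os.path.join insert the right separator for the operating system. The helper name default_data_path is hypothetical.

```python
import os


def default_data_path(script_file: str) -> str:
    """Return <directory containing script_file>/data, independent of the working directory.

    ASSUMPTION: mirrors the common pattern
    os.path.join(os.path.dirname(os.path.abspath(__file__)), "data");
    check your local_runner.py for the exact expression.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    # os.path.join uses the separator of the current platform ('/' or '\\').
    return os.path.join(script_dir, "data")


# Example: default_data_path("/Users/jankirenz/tfx-taxi/local_runner.py")
# returns "/Users/jankirenz/tfx-taxi/data" on macOS or Linux.
# Inside local_runner.py itself you would pass __file__.
```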
Open the file with your code editor and define the variables OUTPUT_DIR (in line 32) and DATA_PATH:
OUTPUT_DIR = 'your-path-to-tfx-taxi/output'
On my machine, the variable would be defined as: OUTPUT_DIR = '/Users/jankirenz/tfx-taxi/output'
DATA_PATH = 'your-path-to-tfx-taxi/data/'
On my machine, the variable would be defined as: DATA_PATH = '/Users/jankirenz/tfx-taxi/data/'
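Since ExampleGen reads its CSV input from DATA_PATH and TFX writes its artifacts under OUTPUT_DIR, a quick check of both values before running the pipeline can catch a misspelled directory. This is a hypothetical helper, not part of the template; it only inspects the file system and changes nothing.

```python
import glob
import os


def check_runner_paths(output_dir: str, data_path: str) -> list[str]:
    """Return a list of problems found with OUTPUT_DIR and DATA_PATH (empty if none)."""
    problems = []
    if not os.path.isdir(data_path):
        problems.append(f"DATA_PATH is not a directory: {data_path}")
    elif not glob.glob(os.path.join(data_path, "*.csv")):
        problems.append(f"no CSV files found in DATA_PATH: {data_path}")
    # Sanity check only: the folder that should contain OUTPUT_DIR must exist.
    parent = os.path.dirname(os.path.normpath(output_dir))
    if not os.path.isdir(parent):
        problems.append(f"parent of OUTPUT_DIR does not exist: {parent}")
    return problems


# Usage with the values from local_runner.py:
# print(check_runner_paths("/Users/jankirenz/tfx-taxi/output",
#                          "/Users/jankirenz/tfx-taxi/data/") or "paths look fine")
```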
Save your changes and close the file.
In your terminal, change directory (cd) into the project directory of tfx-taxi:
cd your-path-to-tfx-taxi
On my machine: cd /Users/jankirenz/tfx-taxi/
Update the pipeline by running tfx pipeline update from your project directory, with --pipeline_path pointing to local_runner.py.
In my case: tfx pipeline update --pipeline_path=/Users/jankirenz/tfx-taxi/local_runner.py
The last line of your output should display: Pipeline "pipeline-taxi" updated successfully.
Run the pipeline:
tfx run create --pipeline_name=pipeline-taxi
In your output, the last line reads INFO:absl:Component Transform is finished.
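If you rerun these two steps often, you could script them. The sketch below uses only the commands and flags shown above (tfx pipeline update --pipeline_path and tfx run create --pipeline_name) and runs them from the project directory with Python's subprocess module. The helper names are hypothetical, and it assumes the tfx CLI is on your PATH.

```python
import os
import subprocess


def tfx_commands(project_dir: str, pipeline_name: str) -> list[list[str]]:
    """Build the two CLI calls used in this tutorial (hypothetical helper)."""
    runner = os.path.join(project_dir, "local_runner.py")
    return [
        ["tfx", "pipeline", "update", f"--pipeline_path={runner}"],
        ["tfx", "run", "create", f"--pipeline_name={pipeline_name}"],
    ]


def run_all(project_dir: str, pipeline_name: str) -> None:
    """Run each command in project_dir; raises CalledProcessError on the first failure."""
    for cmd in tfx_commands(project_dir, pipeline_name):
        print("$", " ".join(cmd))
        subprocess.run(cmd, cwd=project_dir, check=True)


# Usage (requires TFX to be installed):
# run_all("/Users/jankirenz/tfx-taxi", "pipeline-taxi")
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if your project path contains spaces.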
Take a look at your files in the output folder. TFX stored multiple artifacts for every component in the pipeline.
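To get an overview of what was written, you can walk the output folder and count files per directory. This is a generic sketch that assumes nothing about TFX's internal directory layout beyond everything living under OUTPUT_DIR; the function name summarize_output is hypothetical.

```python
import os
from collections import Counter


def summarize_output(output_dir: str, depth: int = 2) -> Counter:
    """Count files under output_dir, grouped by their first `depth` path components."""
    counts: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(output_dir):
        rel = os.path.relpath(dirpath, output_dir)
        parts = [] if rel == os.curdir else rel.split(os.sep)
        # Files deeper than `depth` levels are attributed to their ancestor folder.
        key = os.path.join(*parts[:depth]) if parts else "."
        counts[key] += len(filenames)
    return counts


# Usage (adjust to your OUTPUT_DIR):
# for folder, n in sorted(summarize_output("/Users/jankirenz/tfx-taxi/output").items()):
#     print(f"{n:5d}  {folder}")
```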
Congratulations! You have completed the tutorial and learned how to:
✅ Install a TFX pipeline template
✅ Create a local TFX pipeline run
✅ Add pipeline components
Thank you for participating in this tutorial. If you found any issues along the way, I'd appreciate it if you'd raise them by clicking the "Report a mistake" button at the bottom left of this site.