Google's open-source software TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines. When you're ready to move your models to production, use TFX to create and manage a production pipeline.
A TFX pipeline is a sequence of components that implements an ML pipeline specifically designed for scalable, high-performance machine learning tasks. Components are built using TFX libraries, which can also be used individually.
TFX pipeline templates make it easy to get started with pipeline development by providing a prebuilt set of pipeline definitions that you can customize for your use case.
In our example, we mainly follow the TensorFlow tutorial "Building a TFX Pipeline Locally" to build a pipeline from a prebuilt template.
To complete this tutorial, you need the following environment:
To copy a template, the tfx template copy command takes three flags:
model: The name of the template you want to copy (we use the penguin template).
pipeline_name: The name of the pipeline to create (we call it pipeline-tutorial).
destination_path: The path to copy the template into (you need to provide this information).
On my machine, the command is:
tfx template copy --model=penguin --pipeline_name=pipeline-tutorial \
  --destination_path=/Users/jankirenz/tfx-files
A copy of the pipeline template has been created at the path you specified.
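If you prefer to script this step, the copy command can be assembled in Python. This is only a minimal sketch: the helper name template_copy_command is my own, and it assumes nothing beyond the three flags described above. It builds the argument list without running anything.

```python
def template_copy_command(model, pipeline_name, destination_path):
    """Return the `tfx template copy` call as an argument list (hypothetical helper)."""
    return [
        "tfx", "template", "copy",
        f"--model={model}",
        f"--pipeline_name={pipeline_name}",
        f"--destination_path={destination_path}",
    ]


command = template_copy_command(
    "penguin", "pipeline-tutorial", "/Users/jankirenz/tfx-files"
)
print(" ".join(command))
# With TFX installed, the list can be executed via subprocess.run(command, check=True).
```

Keeping the command as a list rather than one string avoids shell quoting problems if your destination path contains spaces.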
Explore the directories and files that were copied to your pipeline's project directory tfx-files:
A pipeline directory with
pipeline.py - defines the pipeline, and lists which components are being used
configs.py - holds configuration details such as where the data is coming from or which orchestrator is being used
A data directory
This typically contains a data.csv file, which is the default source for the TFX component ExampleGen. You can change the data source in configs.py.
A models directory with preprocessing code and model implementations
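To confirm that the copy produced the layout listed above, a short check like the following may help. It is a sketch under my own assumptions: the helper name missing_entries is hypothetical, and it only looks for the entries named in this tutorial, not for every file the template ships.

```python
from pathlib import Path

# Entries named in the list above; anything else the template copies is not checked.
EXPECTED_ENTRIES = ["pipeline/pipeline.py", "pipeline/configs.py", "data", "models"]


def missing_entries(project_dir):
    """Return the expected entries that are absent from project_dir."""
    root = Path(project_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling missing_entries with the path to your tfx-files directory should return an empty list if the template was copied completely.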
The template also copies directed acyclic graph (DAG) runners for the local environment and for Kubeflow. A DAG runner runs the components one by one in the DAG's topological order. The file for the local runner is called local_runner.py.
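To make "topological order" concrete, here is a small illustration using Python's standard library (graphlib, available from Python 3.9). It is not how TFX implements its runners; the dependency graph is an illustrative assumption of mine that borrows the names of a few TFX components.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of components it depends on (illustrative graph).
dependencies = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "Trainer": {"ExampleGen", "SchemaGen"},
}

# static_order() yields every node only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

Because every component in this graph depends on the one before it, there is only one valid order: ExampleGen, StatisticsGen, SchemaGen, Trainer.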
Before we can create our pipeline, we first need to change some code in the file local_runner.py. This script creates a pipeline run and specifies the run's parameters, such as the DATA_PATH and OUTPUT_DIR.
Note that you don't necessarily have to change the variable definition of DATA_PATH, since the given expression returns the full path name in a multiplatform-safe way.
Open the file with your code editor and define the variables OUTPUT_DIR (in line 32) and DATA_PATH:
OUTPUT_DIR = 'your-path-to-tfx-files/output'
On my machine, the variable would be defined as: OUTPUT_DIR = '/Users/jankirenz/tfx-files/output'
DATA_PATH = 'your-path-to-tfx-files/data/'
On my machine, the variable would be defined as: DATA_PATH = '/Users/jankirenz/tfx-files/data/'
We can save all changes and close the file.
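The note about DATA_PATH being multiplatform-safe refers to building the path with os.path instead of hard-coding slashes. The sketch below shows that idea. I am assuming the template's expression is of this kind, so compare it with your own copy of local_runner.py; the function name data_dir_next_to is hypothetical.

```python
import os


def data_dir_next_to(script_file):
    """Return the absolute path of a 'data' folder located beside script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    # os.path.join inserts the correct separator for the current operating system.
    return os.path.join(script_dir, "data")


# Inside local_runner.py you would pass __file__; a literal file name is used here.
print(data_dir_next_to("local_runner.py"))
```

Since the result is derived from the script's own location, it stays valid when the project folder is moved, which is why a hard-coded DATA_PATH is optional.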
In your terminal, change directory (cd) into the project directory of tfx-files:
cd your-path-to-tfx-files
On my machine: cd /Users/jankirenz/tfx-files/
Run the pipeline create command in your pipeline directory and point it at local_runner.py. In my case this would be:
tfx pipeline create --pipeline_path=/Users/jankirenz/tfx-files/local_runner.py
After you run the command, the last output line should display Pipeline "pipeline-tutorial" created successfully.
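If you run the create step from a script, you can check for that success message programmatically. The following is a sketch only: pipeline_created is a hypothetical helper, and the sample text is a placeholder rather than real CLI output.

```python
def pipeline_created(cli_output, pipeline_name):
    """Return True if the last non-empty output line reports a successful creation."""
    lines = [line.strip() for line in cli_output.splitlines() if line.strip()]
    expected = f'Pipeline "{pipeline_name}" created successfully'
    return bool(lines) and expected in lines[-1]


# Placeholder text standing in for whatever the CLI printed.
sample_output = 'earlier output\nPipeline "pipeline-tutorial" created successfully.\n'
print(pipeline_created(sample_output, "pipeline-tutorial"))
```

The check uses a substring match on the final line, so it does not depend on trailing punctuation or whitespace.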
Finally, run this command:
tfx run create --pipeline_name=pipeline-tutorial
The command creates a pipeline run using LocalDagRunner, which adds the following directories to your pipeline (in your output sub-folder):
A tfx_metadata directory which contains the ML Metadata store used locally.
A tfx_pipeline_output directory which contains the pipeline's file outputs.
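Both directories live under OUTPUT_DIR, so their locations can be derived from it. The sketch below shows one plausible layout. The directory names tfx_metadata and tfx_pipeline_output come from this tutorial, while the per-pipeline sub-folder and the metadata.db file name are assumptions you should verify against your own output folder.

```python
import os


def output_locations(output_dir, pipeline_name):
    """Return the assumed locations of the metadata store and the pipeline outputs."""
    return {
        # Assumed: a SQLite file named metadata.db inside a per-pipeline folder.
        "metadata": os.path.join(output_dir, "tfx_metadata", pipeline_name, "metadata.db"),
        # Assumed: component outputs grouped under a per-pipeline folder.
        "pipeline_root": os.path.join(output_dir, "tfx_pipeline_output", pipeline_name),
    }


locations = output_locations("/Users/jankirenz/tfx-files/output", "pipeline-tutorial")
for name, path in locations.items():
    print(name, "->", path)
```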
Open your pipeline's pipeline/configs.py file and review the contents:
This script defines the configuration options used by the pipeline and the component functions.
This is where you would specify things like the location of the datasource or the number of training steps in a run.
Open your pipeline's pipeline/pipeline.py file and review the contents:
This script creates the TFX pipeline.
Initially, the pipeline contains only an ExampleGen component.
You may follow the instructions in the TODO comments in pipeline.py to add more steps to the pipeline.
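For orientation, a pipeline that contains only an ExampleGen component could look roughly like the sketch below. It uses the public tfx.v1 API and is not a copy of the template's pipeline.py, which may be structured differently. I am assuming the CSV variant of ExampleGen because the default data source is a data.csv file; the function names create_pipeline and run_locally are my own.

```python
def create_pipeline(pipeline_name, pipeline_root, data_path, metadata_path):
    """Build a one-component pipeline; TFX is imported lazily so this file loads without it."""
    from tfx import v1 as tfx  # requires TFX to be installed

    # CsvExampleGen ingests the CSV files found in data_path (assumed ExampleGen variant).
    example_gen = tfx.components.CsvExampleGen(input_base=data_path)

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen],
        metadata_connection_config=(
            tfx.orchestration.metadata.sqlite_metadata_connection_config(metadata_path)
        ),
    )


def run_locally(pipeline):
    """Execute the pipeline on the local machine with LocalDagRunner."""
    from tfx import v1 as tfx

    tfx.orchestration.LocalDagRunner().run(pipeline)
```

Adding a step then amounts to creating another component, wiring it to the outputs of an earlier one, and appending it to the components list.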
Congratulations! You have completed the tutorial and learned how to:
✅ Install a TFX pipeline template
✅ Create a local TFX pipeline run
✅ Review the pipeline components
Thank you for participating in this tutorial. If you found any issues along the way I'd appreciate it if you'd raise them by clicking the "Report a mistake" button at the bottom left of this site.