What we cover

TFX

In this tutorial, we build a TFX pipeline by mainly following the instructions of the excellent TensorFlow tutorial Building a TFX Pipeline Locally.

To complete this tutorial, you need the following environment:

Furthermore, you should be aware of the differences in file paths between macOS, Windows and Linux:
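The main difference is the separator. Windows uses backslashes (\), while macOS and Linux use forward slashes (/). Here is a minimal illustration with Python's standard pathlib module, using the folder names from this tutorial:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The same relative location, written for each platform family.
windows_style = PureWindowsPath("tfx-taxi", "output")
posix_style = PurePosixPath("tfx-taxi", "output")  # macOS and Linux

print(windows_style)  # tfx-taxi\output
print(posix_style)    # tfx-taxi/output

# Path() uses the separator of the operating system it runs on,
# so code built with it works unchanged on all three systems.
native = Path("tfx-taxi") / "output"
print(native)

# as_posix() always produces forward slashes, which Python on Windows accepts too.
print(windows_style.as_posix())  # tfx-taxi/output
```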

We first need to set up our environment and create new folders:

  1. On Windows, open the Start menu and open an Anaconda Command Prompt. On macOS or Linux, open a terminal window.
  2. Activate the virtual Anaconda environment (in my case "tf"):
conda activate tf
  3. Create a project folder tfx-taxi with mkdir (make directory):
mkdir tfx-taxi
  4. Create a sub-folder inside tfx-taxi called output. To do so, we first need to change directory (cd) into the tfx-taxi directory:
cd your-path-to-tfx-taxi
  5. Create the new sub-folder output:
mkdir output
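If you prefer to script this setup, you can reproduce the two mkdir steps with Python's pathlib. This is a sketch of our own: create_project_folders is a hypothetical helper name and is not part of TFX.

```python
from pathlib import Path


def create_project_folders(base_dir="."):
    """Create tfx-taxi/ and tfx-taxi/output/ below base_dir.

    Mirrors the `mkdir tfx-taxi` and `mkdir output` steps and returns
    the path of the output folder.
    """
    output_dir = Path(base_dir) / "tfx-taxi" / "output"
    # parents=True also creates tfx-taxi; exist_ok=True makes reruns harmless.
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir


if __name__ == "__main__":
    print(create_project_folders().resolve())
```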

Create a copy of the pipeline template

In the next steps, we use the TFX command-line interface (CLI), which is part of the TFX package. All commands start with tfx.

First, we use tfx template, a group of commands for listing and copying TFX pipeline templates.

  1. List the currently available TFX pipeline templates:
tfx template list
  2. Copy the taxi template to your local machine (adapt the following command to your setup):
tfx template copy --model=taxi --pipeline_name=pipeline-taxi \
--destination_path=your-path-to-tfx-taxi

Only change the entry for destination_path. On Windows, the Anaconda Command Prompt does not understand the \ line continuation, so enter the command on a single line.

  3. A copy of the pipeline template has been created at the path you specified.

Explore the directories and files that were copied to your pipeline's project directory tfx-taxi.
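To explore the copy from Python and check that the files this tutorial edits later are present, you can use a small standard-library sketch like the one below. The helper names are hypothetical. EXPECTED_ENTRIES lists only the entries mentioned in this tutorial, and the real template contains more.

```python
from pathlib import Path

# Entries this tutorial refers to later; the real template contains more files.
EXPECTED_ENTRIES = [
    "local_runner.py",
    "pipeline/configs.py",
    "pipeline/pipeline.py",
    "data",
]


def missing_template_files(project_dir):
    """Return the expected entries that do not exist below project_dir."""
    root = Path(project_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


def print_tree(project_dir, max_depth=2):
    """Print files and folders below project_dir, indented by depth."""
    root = Path(project_dir)
    for path in sorted(root.rglob("*")):
        depth = len(path.relative_to(root).parts)
        if depth <= max_depth:
            print("  " * (depth - 1) + path.name + ("/" if path.is_dir() else ""))


if __name__ == "__main__":
    project = Path("your-path-to-tfx-taxi")  # placeholder path
    if project.is_dir():
        print_tree(project)
        print("Missing:", missing_template_files(project) or "nothing")
```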

Before we can create our pipeline, we first need to change some code in the file local_runner.py. This script creates a pipeline run and specifies the run's parameters, such as DATA_PATH and OUTPUT_DIR.

Note that you don't necessarily have to change the variable definition of DATA_PATH, since the given expression returns the full path name in a multiplatform-safe way.
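We don't reproduce the template's exact expression here. The sketch below assumes it follows the common pattern of resolving a folder relative to the script's own location with os.path, which works on every platform. Deriving OUTPUT_DIR the same way is our own suggestion and not part of the template. project_paths is a hypothetical helper.

```python
import os


def project_paths(script_file):
    """Return (DATA_PATH, OUTPUT_DIR) next to the folder containing script_file.

    os.path.realpath makes the path absolute and resolves symlinks;
    os.path.join inserts the separator of the current platform.
    """
    project_dir = os.path.dirname(os.path.realpath(script_file))
    data_path = os.path.join(project_dir, "data")
    output_dir = os.path.join(project_dir, "output")
    return data_path, output_dir


if __name__ == "__main__":
    # Inside local_runner.py you would pass __file__ (the runner script itself).
    example_script = os.path.join("tfx-taxi", "local_runner.py")  # illustrative path
    data_path, output_dir = project_paths(example_script)
    print(data_path)
    print(output_dir)
```

If you hard-code the paths on Windows instead, write them with forward slashes or as raw strings (r'...'). In a normal Python string, the \t at the start of \tfx-taxi is read as a tab character.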

  1. Open the file with your code editor and define the variables OUTPUT_DIR (in line 32) and DATA_PATH:
OUTPUT_DIR = 'your-path-to-tfx-taxi/output'
DATA_PATH = 'your-path-to-tfx-taxi/data/'
  2. Save all changes and close the file.
  3. In your terminal, change directory (cd) into the project directory tfx-taxi:
cd your-path-to-tfx-taxi
  4. Run the following command in your pipeline directory:
tfx pipeline create --pipeline_path=your-path-to-tfx-taxi/local_runner.py

In my case, this would be tfx pipeline create --pipeline_path=/Users/jankirenz/tfx-taxi/local_runner.py

When you run the command, the last line of the output should read Pipeline "pipeline-taxi" created successfully.

  5. Finally, use this command to actually run the pipeline:
tfx run create --pipeline_name=pipeline-taxi
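If you repeat these steps often, a small Python script can issue the same CLI calls. This is a sketch of our own and not part of TFX: tfx_commands and run_all are hypothetical helper names. By default, the script only prints the commands (dry_run=True). Passing update=True switches to the tfx pipeline update command used later in this tutorial.

```python
import os
import subprocess

PIPELINE_NAME = "pipeline-taxi"


def tfx_commands(project_dir, pipeline_name=PIPELINE_NAME, update=False):
    """Return the CLI calls from this tutorial as argument lists."""
    runner = os.path.join(project_dir, "local_runner.py")
    action = "update" if update else "create"  # 'update' after editing pipeline.py
    return [
        ["tfx", "pipeline", action, f"--pipeline_path={runner}"],
        ["tfx", "run", "create", f"--pipeline_name={pipeline_name}"],
    ]


def run_all(project_dir, update=False, dry_run=True):
    """Print each command and, unless dry_run is True, execute it in project_dir."""
    for cmd in tfx_commands(project_dir, update=update):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing command.
            subprocess.run(cmd, cwd=project_dir, check=True)


if __name__ == "__main__":
    run_all("your-path-to-tfx-taxi")  # placeholder path; prints only (dry run)
```

Passing argument lists instead of a single string avoids shell quoting issues, which differ between the Windows prompt and macOS/Linux shells.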

Open your pipeline's pipeline/configs.py file and review its contents.

Open your pipeline's pipeline/pipeline.py file to add some TFX components to the pipeline.
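The template's pipeline.py builds the component list for you, and its exact contents depend on the TFX version. For rough orientation, the following standalone sketch shows how the first data components are typically wired together with the public tfx.v1 API, assuming TFX 1.x. It is not the template's code. The function names and their parameters are our own, and TFX is imported inside the functions so the module loads even where TFX is not installed. Each component consumes an output channel of the previous one, and this is how TFX infers the execution order.

```python
def create_pipeline(pipeline_name, pipeline_root, data_path, metadata_path):
    """Build a small TFX pipeline: ingest CSV data, compute statistics,
    infer a schema and validate the examples against it."""
    from tfx import v1 as tfx

    example_gen = tfx.components.CsvExampleGen(input_base=data_path)
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs["statistics"], infer_feature_shape=True)
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"])

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen, example_validator],
        # ML Metadata is stored in a local SQLite file.
        metadata_connection_config=(
            tfx.orchestration.metadata.sqlite_metadata_connection_config(
                metadata_path)),
    )


def run_locally(**pipeline_args):
    """Run the pipeline in-process with TFX's LocalDagRunner."""
    from tfx import v1 as tfx

    tfx.orchestration.LocalDagRunner().run(create_pipeline(**pipeline_args))
```

You would call run_locally(pipeline_name="pipeline-taxi", pipeline_root=..., data_path=..., metadata_path=...) with paths of your choice. In the template, local_runner.py plays this role, which is why we edited OUTPUT_DIR and DATA_PATH there.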

  1. Update the pipeline by running the following command in your pipeline directory:
tfx pipeline update --pipeline_path=your-path-to-tfx-taxi/local_runner.py
  2. Run the pipeline:
tfx run create --pipeline_name=pipeline-taxi
  3. Take a look at the files in your output folder. TFX has stored multiple artifacts for every component in the pipeline.
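To get an overview of those artifacts without clicking through every folder, you can count the files per subfolder. Here is a small standard-library sketch; files_per_folder is a hypothetical helper. Which depth gives one entry per component depends on the folder layout your TFX version writes, so try a few values.

```python
from collections import Counter
from pathlib import Path


def files_per_folder(root_dir, depth=1):
    """Count files below root_dir, grouped by their first `depth` folder levels.

    Files with fewer folder levels are grouped under their shorter prefix;
    files directly in root_dir are counted under ".".
    """
    root = Path(root_dir)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            folders = path.relative_to(root).parts[:-1]  # drop the file name
            counts["/".join(folders[:depth]) or "."] += 1
    return counts


if __name__ == "__main__":
    output = Path("your-path-to-tfx-taxi/output")  # placeholder path
    if output.is_dir():
        for folder, n in sorted(files_per_folder(output, depth=3).items()):
            print(f"{n:5d}  {folder}")
```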

Congratulations! You have completed the tutorial and learned how to:

✅ Copy a TFX pipeline template
✅ Create a local TFX pipeline run
✅ Add pipeline components

Jan Kirenz

Thank you for participating in this tutorial.

Copyright: Jan Kirenz (2021) | kirenz.com | CC BY-NC 2.0 License