Merge branch 'tensorflow:master' into master
nikelite authored Nov 15, 2024
2 parents f28d6a4 + be1477f commit 479b089
Showing 248 changed files with 2,283 additions and 33,576 deletions.
12 changes: 10 additions & 2 deletions .github/workflows/cd-docs.yml
@@ -3,7 +3,8 @@ on:
workflow_dispatch:
push:
branches:
- master
- 'master'
pull_request:
permissions:
contents: write
jobs:
@@ -17,6 +18,7 @@ jobs:
run: |
git config user.name github-actions[bot]
git config user.email 41898282+github-actions[bot]@users.noreply.github.com
if: (github.event_name != 'pull_request')

- name: Set up Python 3.9
uses: actions/setup-python@v5
@@ -26,6 +28,7 @@ jobs:
cache-dependency-path: |
setup.py
tfx/dependencies.py
requirements-docs.txt
- name: Save time for cache for mkdocs
run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
@@ -39,7 +42,12 @@ jobs:
mkdocs-material-
- name: Install Dependencies
run: pip install mkdocs mkdocs-material mkdocstrings[python] griffe-inherited-docstrings mkdocs-autorefs mkdocs-jupyter mkdocs-caption markdown-grid-tables
run: pip install -r requirements-docs.txt

- name: Deploy to GitHub Pages
run: mkdocs gh-deploy --force
if: (github.event_name != 'pull_request')

- name: Build docs to check for errors
run: mkdocs build
if: (github.event_name == 'pull_request')
6 changes: 5 additions & 1 deletion .github/workflows/ci-test.yml
@@ -2,6 +2,7 @@

name: tfx-unit-tests
on:
push:
pull_request:
branches: [ master ]
paths-ignore:
@@ -52,7 +53,10 @@ jobs:
python -m pip install --upgrade pip wheel
# TODO(b/232490018): Cython need to be installed separately to build pycocotools.
python -m pip install Cython -c ./test_constraints.txt
pip install -c ./test_constraints.txt --extra-index-url https://pypi-nightly.tensorflow.org/simple --pre .[all]
pip install \
-c ./${{ matrix.dependency-selector == 'NIGHTLY' && 'nightly_test_constraints.txt' || 'test_constraints.txt' }} \
--extra-index-url https://pypi-nightly.tensorflow.org/simple --pre .[all]
env:
TFX_DEPENDENCY_SELECTOR: ${{ matrix.dependency-selector }}

54 changes: 54 additions & 0 deletions CONTRIBUTING.md
@@ -254,3 +254,57 @@ reviewer.
For public PRs which do not have a preassigned reviewer, a TFX engineer will
monitor them and perform initial triage within 5 business days. But such
contributions should be trivial (i.e., documentation fixes).

## Continuous Integration

This project makes use of CI for

- Building the `tfx` python package when releases are made
- Running tests
- Linting pull requests
- Building documentation

These four _workflows_ trigger automatically when certain _events_ happen.

### Pull Requests

When a PR is made:

- Wheels and an sdist are built using the code in the PR branch. Multiple wheels
are built for a [variety of architectures and python
versions](https://github.com/tensorflow/tfx/blob/master/.github/workflows/wheels.yml).
If the PR causes any of the wheels to fail to build, the failure will be
reported in the checks for the PR.

- Tests are run via [`pytest`](https://github.com/tensorflow/tfx/blob/master/.github/workflows/ci-test.yml). If a test fails, the workflow failure will be
reported in the checks for the PR.

- Lint checks are run on the changed files. This workflow makes use of the
[`.pre-commit-config.yaml`](https://github.com/tensorflow/tfx/blob/master/.pre-commit-config.yaml), and if any lint violations are found the workflow
reports a failure on the list of checks for the PR.

If the author of the PR makes a new commit to the PR branch, these checks are
run again on the new commit.
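
To reproduce these checks locally before opening or updating a PR, a rough
equivalent is sketched below. This is only a sketch: the install commands and
constraint file mirror the ones used in `ci-test.yml` above, the test path is a
placeholder, and `pre-commit` is assumed to be available from PyPI.

```sh
# Install TFX the same way the ci-test.yml workflow does (release constraints).
python -m pip install --upgrade pip wheel
python -m pip install Cython -c ./test_constraints.txt
pip install -c ./test_constraints.txt --extra-index-url https://pypi-nightly.tensorflow.org/simple --pre .[all]

# Run the tests you touched (placeholder path).
pytest tfx/path/to/your_test.py

# Run the lint hooks defined in .pre-commit-config.yaml on all files.
pip install pre-commit
pre-commit run --all-files
```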

### Releases

When a release is made on GitHub the workflow that builds wheels runs, just as
it does for pull requests, but with one difference: it automatically uploads the
wheels and sdist that are built in the workflow to the Python Package Index
(PyPI) using [trusted
publishing](https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/#configuring-trusted-publishing)
without any additional action required on the part of the release captain. After
the workflow finishes, users are able to use `pip install tfx` to install the
newly published version.

### Commits to `master`

When a new commit is made to the `master` branch, the documentation is built and
automatically uploaded to GitHub Pages.

If you want to see how your documentation changes render, run `mkdocs
serve` to build the documentation and serve it locally. Alternatively, if you
merge your own changes to your own fork's `master` branch, this workflow will
publish the documentation at `https://<your-github-username>.github.io/tfx`. This
provides a convenient way for developers to check deployments before they merge
a PR to the upstream `tfx` repository.
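
A local preview roughly mirrors what `cd-docs.yml` does. This is a sketch only,
assuming the docs dependencies pinned in `requirements-docs.txt` install cleanly
in your environment:

```sh
# Install the documentation toolchain pinned in requirements-docs.txt.
pip install -r requirements-docs.txt

# Live-reloading preview (mkdocs serves on http://127.0.0.1:8000 by default).
mkdocs serve

# Or reproduce the pull-request check, which only builds the site.
mkdocs build
```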
7 changes: 7 additions & 0 deletions MANIFEST.in
@@ -12,3 +12,10 @@ include tfx/proto/*.proto
# TODO(b/172611374): Consider adding all testdata in the wheel to make test
# fixture more portable.
recursive-include tfx/orchestration/kubeflow/v2/testdata *

recursive-include tfx/components/testdata *
recursive-include tfx/orchestration/kubeflow/v2/testdata *

include tfx/examples/imdb/data/*
include tfx/orchestration/beam/testdata/*
include tfx/orchestration/kubeflow/v2/container/testdata/*
5 changes: 3 additions & 2 deletions RELEASE.md
@@ -9,6 +9,7 @@
most likely you discovered a bug and should not use an f-string in the first
place. If it is truly your intention to print the placeholder (not its
resolved value) for debugging purposes, use `repr()` or `!r` instead.
* Drop support for the Estimator API.

### For Pipeline Authors

@@ -224,7 +225,7 @@

## Bug Fixes and Other Changes

* Support to task type "workerpool1" of CLUSTER_SPEC in Vertex AI training's
* Support to task type "workerpool1" of CLUSTER_SPEC in Vertex AI training's
service according to the changes of task type in Tuner component.
* Propagates unexpected import failures in the public v1 module.

@@ -2887,4 +2888,4 @@ the 1.1.x release for TFX library.

### For component authors

* N/A
* N/A
2 changes: 0 additions & 2 deletions build/BUILD
@@ -24,8 +24,6 @@ sh_binary(
"//tfx/examples/custom_components/presto_example_gen/proto:presto_config_pb2.py",
"//tfx/extensions/experimental/kfp_compatibility/proto:kfp_component_spec_pb2.py",
"//tfx/extensions/google_cloud_big_query/experimental/elwc_example_gen/proto:elwc_config_pb2.py",
"//tfx/orchestration/experimental/core:component_generated_alert_pb2.py",
"//tfx/orchestration/kubeflow/proto:kubeflow_pb2.py",
"//tfx/proto:bulk_inferrer_pb2.py",
"//tfx/proto:distribution_validator_pb2.py",
"//tfx/proto:evaluator_pb2.py",
4 changes: 1 addition & 3 deletions docs/guide/evaluator.md
@@ -66,9 +66,7 @@ import tensorflow_model_analysis as tfma
eval_config = tfma.EvalConfig(
model_specs=[
# This assumes a serving model with signature 'serving_default'. If
# using estimator based EvalSavedModel, add signature_name='eval' and
# remove the label_key. Note, if using a TFLite model, then you must set
# model_type='tf_lite'.
# using a TFLite model, then you must set model_type='tf_lite'.
tfma.ModelSpec(label_key='<label_key>')
],
metrics_specs=[
10 changes: 0 additions & 10 deletions docs/guide/fairness_indicators.md
@@ -43,16 +43,6 @@ an evaluation set that does, or considering proxy features within your feature
set that may highlight outcome disparities. For additional guidance, see
[here](https://tensorflow.org/responsible_ai/fairness_indicators/guide/guidance).

### Model

You can use the Tensorflow Estimator class to build your model. Support for
Keras models is coming soon to TFMA. If you would like to run TFMA on a Keras
model, please see the “Model-Agnostic TFMA” section below.

After your Estimator is trained, you will need to export a saved model for
evaluation purposes. To learn more, see the
[TFMA guide](https://www.tensorflow.org/tfx/model_analysis/get_started).

### Configuring Slices

Next, define the slices you would like to evaluate on:
17 changes: 0 additions & 17 deletions docs/guide/index.md
@@ -438,23 +438,6 @@ using the exact same code during both training and inference. Using the
modeling code, including the SavedModel from the Transform component, you can
consume your training and evaluation data and train your model.

When working with Estimator based models, the last section of your modeling
code should save your model as both a SavedModel and an EvalSavedModel. Saving
as an EvalSavedModel ensures the metrics used at training time are also
available during evaluation (note that this is not required for keras based
models). Saving an EvalSavedModel requires that you import the
[TensorFlow Model Analysis (TFMA)](tfma.md) library in your Trainer component.

```python
import tensorflow_model_analysis as tfma
...

tfma.export.export_eval_savedmodel(
estimator=estimator,
export_dir_base=eval_model_dir,
eval_input_receiver_fn=receiver_fn)
```

An optional [Tuner](tuner.md) component can be added before Trainer to tune the
hyperparameters (e.g., number of layers) for the model. With the given model and
hyperparameters' search space, tuning algorithm will find the best
62 changes: 4 additions & 58 deletions docs/guide/keras.md
@@ -38,54 +38,10 @@ they become available in TF 2.x, you can follow the

## Estimator

The Estimator API has been retained in TensorFlow 2.x, but is not the focus of
new features and development. Code written in TensorFlow 1.x or 2.x using
Estimators will continue to work as expected in TFX.
The Estimator API has been fully dropped since TensorFlow 2.16, so we have
decided to discontinue support for it.

Here is an end-to-end TFX example using pure Estimator:
[Taxi example (Estimator)](https://github.com/tensorflow/tfx/blob/r0.21/tfx/examples/chicago_taxi_pipeline/taxi_utils.py)

## Keras with `model_to_estimator`

Keras models can be wrapped with the `tf.keras.estimator.model_to_estimator`
function, which allows them to work as if they were Estimators. To use this:

1. Build a Keras model.
2. Pass the compiled model into `model_to_estimator`.
3. Use the result of `model_to_estimator` in Trainer, the way you would
typically use an Estimator.

```py
# Build a Keras model.
def _keras_model_builder():
"""Creates a Keras model."""
...

model = tf.keras.Model(inputs=inputs, outputs=output)
model.compile()

return model


# Write a typical trainer function
def trainer_fn(trainer_fn_args, schema):
"""Build the estimator, using model_to_estimator."""
...

# Model to estimator
estimator = tf.keras.estimator.model_to_estimator(
keras_model=_keras_model_builder(), config=run_config)

return {
'estimator': estimator,
...
}
```

Other than the user module file of Trainer, the rest of the pipeline remains
unchanged.

## Native Keras (i.e. Keras without `model_to_estimator`)
## Native Keras (i.e. Keras without Estimator)

!!! Note
Full support for all features in Keras is in progress, in most cases,
@@ -101,7 +57,7 @@ Here are several examples with native Keras:
'Hello world' end-to-end example.
* [MNIST](https://github.com/tensorflow/tfx/blob/master/tfx/examples/mnist/mnist_pipeline_native_keras.py)
([module file](https://github.com/tensorflow/tfx/blob/master/tfx/examples/mnist/mnist_utils_native_keras.py)):
Image and TFLite end-to-end example.
Image end-to-end example.
* [Taxi](https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_native_keras.py)
([module file](https://github.com/tensorflow/tfx/blob/master/tfx/examples/chicago_taxi_pipeline/taxi_utils_native_keras.py)):
end-to-end example with advanced Transform usage.
@@ -132,11 +88,6 @@ will be discussed in the following Trainer and Evaluator sections.

#### Trainer

To configure native Keras, the `GenericExecutor` needs to be set for Trainer
component to replace the default Estimator based executor. For details, please
check
[here](trainer.md#configuring-the-trainer-component).

##### Keras Module file with Transform

The training module file must contain a `run_fn` which will be called by the
@@ -296,9 +247,4 @@ validate the current model compared with previous models. With this change, the
Pusher component now consumes a blessing result from Evaluator instead of
ModelValidator.

The new Evaluator supports Keras models as well as Estimator models. The
`_eval_input_receiver_fn` and eval saved model which were required previously
will no longer be needed with Keras, since Evaluator is now based on the same
`SavedModel` that is used for serving.

[See Evaluator for more information](evaluator.md).
4 changes: 1 addition & 3 deletions docs/guide/modelval.md
@@ -33,9 +33,7 @@ import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
model_specs=[
# This assumes a serving model with signature 'serving_default'. If
# using estimator based EvalSavedModel, add signature_name: 'eval' and
# remove the label_key.
# This assumes a serving model with signature 'serving_default'.
tfma.ModelSpec(label_key='<label_key>')
],
metrics_specs=[
56 changes: 0 additions & 56 deletions docs/guide/train.md
@@ -22,59 +22,3 @@ a [Transform](transform.md) component, and the layers of the Transform model should
be included with your model so that when you export your SavedModel and
EvalSavedModel they will include the transformations that were created by the
[Transform](transform.md) component.

A typical TensorFlow model design for TFX looks like this:

```python
def _build_estimator(tf_transform_dir,
config,
hidden_units=None,
warm_start_from=None):
"""Build an estimator for predicting the tipping behavior of taxi riders.
Args:
tf_transform_dir: directory in which the tf-transform model was written
during the preprocessing step.
config: tf.contrib.learn.RunConfig defining the runtime environment for the
estimator (including model_dir).
hidden_units: [int], the layer sizes of the DNN (input layer first)
warm_start_from: Optional directory to warm start from.
Returns:
Resulting DNNLinearCombinedClassifier.
"""
metadata_dir = os.path.join(tf_transform_dir,
transform_fn_io.TRANSFORMED_METADATA_DIR)
transformed_metadata = metadata_io.read_metadata(metadata_dir)
transformed_feature_spec = transformed_metadata.schema.as_feature_spec()

transformed_feature_spec.pop(_transformed_name(_LABEL_KEY))

real_valued_columns = [
tf.feature_column.numeric_column(key, shape=())
for key in _transformed_names(_DENSE_FLOAT_FEATURE_KEYS)
]
categorical_columns = [
tf.feature_column.categorical_column_with_identity(
key, num_buckets=_VOCAB_SIZE + _OOV_SIZE, default_value=0)
for key in _transformed_names(_VOCAB_FEATURE_KEYS)
]
categorical_columns += [
tf.feature_column.categorical_column_with_identity(
key, num_buckets=_FEATURE_BUCKET_COUNT, default_value=0)
for key in _transformed_names(_BUCKET_FEATURE_KEYS)
]
categorical_columns += [
tf.feature_column.categorical_column_with_identity(
key, num_buckets=num_buckets, default_value=0)
for key, num_buckets in zip(
_transformed_names(_CATEGORICAL_FEATURE_KEYS), #
_MAX_CATEGORICAL_FEATURE_VALUES)
]
return tf.estimator.DNNLinearCombinedClassifier(
config=config,
linear_feature_columns=categorical_columns,
dnn_feature_columns=real_valued_columns,
dnn_hidden_units=hidden_units or [100, 70, 50, 25],
warm_start_from=warm_start_from)
```
