TFX Tuner on Vertex AI fails with DEADLINE\_EXCEEDED, no custom job launched


**System information**

* Have I specified the code to reproduce the issue: **Yes (pipeline with TFX Tuner on Vertex AI)**
* Environment in which the code is executed: **Google Cloud Vertex AI Pipelines**
* TensorFlow version: **2.16**
* TFX Version: **1.16.0**
* Python version: **3.10**

---

**Describe the current behavior**

When running a pipeline with the **TFX Tuner component** on **Vertex AI**, the tuning job gets stuck.
No Vertex AI custom job or study for hyperparameter tuning is launched.

It gets stuck right after returning from:

```python
return TunerFnResult(tuner=tuner, fit_kwargs=fit_kwargs)
```

I have tested with **AI Vizier**, **Cloud Tuner**, **RandomSearch**, and all options listed in the [[TFX Tuner documentation](https://www.tensorflow.org/tfx/guide/tuner)](https://www.tensorflow.org/tfx/guide/tuner).
The result is always the same: the job eventually fails with a `grpc._channel._InactiveRpcError` caused by a `Deadline Exceeded` error.

As a workaround, I had to force Keras tuning to run **locally** (so results are written to `/tmp`) and then manually copy results to **GCS** afterwards.

---

**Describe the expected behavior**

The Tuner component should successfully launch a Vertex AI hyperparameter tuning job (CustomJob/Study) and report the results back to the pipeline, instead of failing with a timeout.

---

**Standalone code to reproduce the issue**

A minimal pipeline containing a `Tuner` component configured to run on Vertex AI.
(tuner\_fn similar to the official TFX [[Tuner example](https://www.tensorflow.org/tfx/guide/tuner)](https://www.tensorflow.org/tfx/guide/tuner)).

```python
tuner = Tuner(
    module_file=module_file,
    examples=example_gen.outputs['examples'],
    train_args=trainer_pb2.TrainArgs(num_steps=1000),
    eval_args=trainer_pb2.EvalArgs(num_steps=200),
)
```

The component runs fine locally, but fails when orchestrated on **Vertex AI Pipelines**.

---

**Other info / logs**

Full traceback from Vertex logs:

```
Traceback (most recent call last):
  ...
  File "/opt/conda/lib/python3.10/site-packages/keras_tuner/src/distribute/oracle_client.py", line 81, in create_trial
    response = self.stub.CreateTrial(
  ...
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
  status = StatusCode.DEADLINE_EXCEEDED
  details = "Deadline Exceeded"
  debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"Deadline Exceeded", grpc_status:4, created_time:"2025-09-12T08:10:49.033761848+00:00"}"
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

TFX Tuner on Vertex AI fails with DEADLINE\_EXCEEDED, no custom job launched #7775

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TFX Tuner on Vertex AI fails with DEADLINE\_EXCEEDED, no custom job launched #7775

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions