
[Bug] [Jdbc] JdbcSource bounded job never finishes after TaskManager failover #10138

@Adamyuanyuan

Description


Search before asking

  • I had searched in the issues and found no similar issues.

What happened

The batch synchronization task from MySQL to XXX automatically retries after the first TaskManager disconnects, but the second attempt stays in the RUNNING state and never finishes on its own; it can only be stopped manually.

JdbcSource bounded job never finishes after TaskManager failover (NoMoreSplitsEvent missing for recovered readers)

SeaTunnel Version

2.3.12

SeaTunnel Config

{
    "env" : {
        "execution.parallelism" : 1,
        "job.mode" : "BATCH"
    },
    "source" : [
        {
            "url" : "jdbc:mysql://10.xxxx:6301/xxxx?useUnicode=true&characterEncoding=UTF-8&useSSL=false",
            "driver" : "com.mysql.cj.jdbc.Driver",
            "user" : "readonly",
            "password" : "******",
            "query" : "select xxxx FROM xxxWHERE 1=1",
            "result_table_name" : "result_table",
            "plugin_name" : "Jdbc"
        }
    ],
    "sink" : [
        {
            "table_name" : "xxxx.xxxx",
            "metastore_uri" : "thrift://xxxx:9083,thrift:/xxxx:9083",
            "hdfs_site_path" : "xxxx/hdfs-site.xml",
            "hive_site_path" : "xxxx/hive-site.xml",
            "krb5_path" : "xxxx/krb5.conf",
            "source_table_name" : "source_table",
            "compress_codec" : "SNAPPY",
            "plugin_name" : "Hive"
        }
    ],
    "transform" : [
        {
            "source_table_name" : "result_table",
            "result_table_name" : "source_table",
            "query" : "select * from result_table",
            "plugin_name" : "Sql"
        }
    ]
}

Running Command

Run via the Flink API.

Error Exception

1. The job started normally on the TaskManager container `container_e13_..._000002` for the first time:
   - `JdbcSourceSplitEnumerator` split the table into 1 split:
     - `Splitting table xxx.xxxx.`
     - `Split table xxx.xxxx into 1 splits.`
   - Then sent `NoMoreSplitsEvent`:
     - `No more splits to assign. Sending NoMoreSplitsEvent to reader [0].`
2. After running for about 7 minutes, the heartbeat of this TaskManager timed out:
   - `Heartbeat of TaskManager ... timed out.`
   - The corresponding operator changed from `RUNNING` to `FAILED`, and the job changed from `RUNNING` to `RESTARTING`.
3. Flink automatically recovered the job on the new container `container_e13_..._000003`:
   - `Recovering subtask 0 to checkpoint -1 for source Source: Jdbc-Source to checkpoint.`
   - `Adding splits back from subtask: 0, splits count: 1`
   - `Add back splits 1 to JdbcSourceSplitEnumerator.`
   - The new reader registered and obtained the split:
     - `Register reader 0 to JdbcSourceSplitEnumerator.`
     - `Assigning 1 splits to subtask: 0`
4. From the second attempt until the job was manually canceled, the following **never appeared** in the logs (this gap is modeled in the sketch after this list):
   - `No more splits to assign. Sending NoMoreSplitsEvent ...`
   - There were no exception stacks for any operators either.
5. It was finally ended by an external cancel:
   - Job status: `RUNNING` → `CANCELLING` → `CANCELED`.
   - Final YARN status: `KILLED`, with the diagnosis being `null`.
   - SeaTunnel reported an error: `Reason:Flink job executed failed` (because the Flink Job was CANCELED).
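
Read together, the log trail suggests the enumerator treats the end-of-input signal as one-shot: it is sent once during the first run and never re-evaluated after `addSplitsBack` and reader re-registration. Below is a minimal, runnable Java model of that suspected sequence; every name in it (`EnumeratorModel`, `noMoreSplitsSignaled`, `split-0`) is hypothetical and only mirrors the cited log lines, not SeaTunnel's actual `JdbcSourceSplitEnumerator`.

```java
// Hypothetical model of the suspected enumerator behavior; names are
// illustrative only and mirror the cited log lines, not SeaTunnel's code.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class EnumeratorModel {
    private final Deque<String> pendingSplits = new ArrayDeque<>();
    private boolean noMoreSplitsSignaled = false;

    void discoverSplits() {
        // "Split table xxx.xxxx into 1 splits."
        pendingSplits.add("split-0");
    }

    void assignPendingSplits(int reader) {
        while (!pendingSplits.isEmpty()) {
            System.out.println("Assigning " + pendingSplits.poll() + " to subtask: " + reader);
        }
        if (!noMoreSplitsSignaled) {
            // "No more splits to assign. Sending NoMoreSplitsEvent to reader [0]."
            System.out.println("Sending NoMoreSplitsEvent to reader [" + reader + "]");
            noMoreSplitsSignaled = true; // one-shot flag: never reset on failover
        }
    }

    void addSplitsBack(List<String> splits) {
        // "Adding splits back from subtask: 0, splits count: 1"
        pendingSplits.addAll(splits);
    }

    void registerReader(int reader) {
        // "Register reader 0 to JdbcSourceSplitEnumerator."
        // The split is re-assigned, but the flag above suppresses a second
        // NoMoreSplitsEvent, so the bounded reader waits forever.
        assignPendingSplits(reader);
    }

    public static void main(String[] args) {
        EnumeratorModel e = new EnumeratorModel();
        e.discoverSplits();
        e.assignPendingSplits(0);            // first run: split + NoMoreSplitsEvent
        e.addSplitsBack(List.of("split-0")); // TaskManager failover returns the split
        e.registerReader(0);                 // second run: split only, no event
    }
}
```

Running the model prints the first-run assignment plus one `NoMoreSplitsEvent`, then the post-failover reassignment with no second event, which is exactly the gap observed in step 4.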

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
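
As a starting point for such a PR, here is a hedged sketch of one possible fix direction, expressed against the hypothetical model above: make the end-of-input signal idempotent, re-sending it whenever the pending split set drains. Whether this maps cleanly onto SeaTunnel's real enumerator (for example, calling `SourceSplitEnumerator.Context#signalNoMoreSplits` again for a re-registered reader) is an assumption, not a confirmed patch.

```java
// Hedged fix sketch, self-contained like the model above: signal end-of-input
// every time the pending set drains, not just once, so a recovered reader
// also receives it. The mapping to SeaTunnel's real enumerator is assumed.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class EnumeratorFixSketch {
    private final Deque<String> pendingSplits = new ArrayDeque<>();

    void discoverSplits() {
        pendingSplits.add("split-0");
    }

    void assignPendingSplits(int reader) {
        while (!pendingSplits.isEmpty()) {
            System.out.println("Assigning " + pendingSplits.poll() + " to subtask: " + reader);
        }
        // No one-shot flag: whenever the pending set is drained, tell the
        // reader there is nothing left, so the bounded source can finish
        // even after a failover re-assignment.
        System.out.println("Sending NoMoreSplitsEvent to reader [" + reader + "]");
    }

    void addSplitsBack(List<String> splits) {
        pendingSplits.addAll(splits);
    }

    void registerReader(int reader) {
        assignPendingSplits(reader);
    }

    public static void main(String[] args) {
        EnumeratorFixSketch e = new EnumeratorFixSketch();
        e.discoverSplits();
        e.assignPendingSplits(0);            // first run finishes normally
        e.addSplitsBack(List.of("split-0")); // failover returns the split
        e.registerReader(0);                 // recovered reader gets the event
                                             // again and the bounded job ends
    }
}
```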

Code of Conduct

  • I agree to follow this project's Code of Conduct
