Search before asking
- I had searched in the issues and found no similar issues.
What happened
The MySQL-to-XXX batch synchronization job automatically retries after the first TaskManager disconnection, but the second attempt stays in the RUNNING state and never finishes on its own; it can only be stopped manually.
In short: a JdbcSource bounded job never finishes after TaskManager failover (the NoMoreSplitsEvent is missing for recovered readers).
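Judging from the log timeline under "Error Exception" below, the end-of-input signal appears to be sent only from the enumerator's initial run, so a reader recovered after failover never receives it. A minimal, self-contained sketch of that suspected pattern (all class and method names here are hypothetical illustrations, not taken from the SeaTunnel source):

```java
// Hypothetical sketch of the suspected failure pattern; names are
// illustrative and NOT from the SeaTunnel code base.
import java.util.ArrayList;
import java.util.List;

public class EnumeratorSketch {
    private final List<String> pendingSplits = new ArrayList<>();

    EnumeratorSketch(List<String> splits) {
        pendingSplits.addAll(splits);
    }

    // First attempt: assign every split, then signal end-of-input exactly once.
    void run() {
        assignPendingSplits();
        signalNoMoreSplits(); // only ever sent here, before any failover
    }

    // Failover path: the framework hands unfinished splits back...
    void addSplitsBack(List<String> splits) {
        pendingSplits.addAll(splits);
    }

    // ...and the recovered reader re-registers and gets them again, but
    // nothing repeats the end-of-input signal, so a bounded job stays
    // RUNNING forever.
    void registerReader() {
        assignPendingSplits();
        // missing: a second signalNoMoreSplits() for the recovered reader
    }

    private void assignPendingSplits() {
        pendingSplits.forEach(s -> System.out.println("Assigning split: " + s));
        pendingSplits.clear();
    }

    private void signalNoMoreSplits() {
        System.out.println("Sending NoMoreSplitsEvent to reader [0].");
    }

    public static void main(String[] args) {
        EnumeratorSketch enumerator = new EnumeratorSketch(List.of("split_0"));
        enumerator.run();                             // attempt 1: split + end signal
        enumerator.addSplitsBack(List.of("split_0")); // TM failover returns the split
        enumerator.registerReader();                  // attempt 2: split re-assigned,
                                                      // but no NoMoreSplitsEvent => hang
    }
}
```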
SeaTunnel Version
2.3.12
SeaTunnel Config
```json
{
"env" : {
"execution.parallelism" : 1,
"job.mode" : "BATCH"
},
"source" : [
{
"url" : "jdbc:mysql://10.xxxx:6301/xxxx?useUnicode=true&characterEncoding=UTF-8&useSSL=false",
"driver" : "com.mysql.cj.jdbc.Driver",
"user" : "readonly",
"password" : "******",
"query" : "select xxxx FROM xxxWHERE 1=1",
"result_table_name" : "result_table",
"plugin_name" : "Jdbc"
}
],
"sink" : [
{
"table_name" : "xxxx.xxxx",
"metastore_uri" : "thrift://xxxx:9083,thrift:/xxxx:9083",
"hdfs_site_path" : "xxxx/hdfs-site.xml",
"hive_site_path" : "xxxx/hive-site.xml",
"krb5_path" : "xxxx/krb5.conf",
"source_table_name" : "source_table",
"compress_codec" : "SNAPPY",
"plugin_name" : "Hive"
}
],
"transform" : [
{
"source_table_name" : "result_table",
"result_table_name" : "source_table",
"query" : "select * from result_table",
"plugin_name" : "Sql"
}
]
}
```
Running Command
use Flink API
Error Exception
1. On the first attempt, the job started normally on TaskManager container `container_e13_..._000002`:
- `JdbcSourceSplitEnumerator` split the table into 1 split:
- `Splitting table xxx.xxxx.`
- `Split table xxx.xxxx into 1 splits.`
- Then sent `NoMoreSplitsEvent`:
- `No more splits to assign. Sending NoMoreSplitsEvent to reader [0].`
2. After running for about 7 minutes, the heartbeat of this TaskManager timed out:
- `Heartbeat of TaskManager ... timed out.`
- The corresponding operator changed from `RUNNING` to `FAILED`, and the job changed from `RUNNING` to `RESTARTING`.
3. Flink automatically recovered the job on the new container `container_e13_..._000003`:
- `Recovering subtask 0 to checkpoint -1 for source Source: Jdbc-Source to checkpoint.`
- `Adding splits back from subtask: 0, splits count: 1`
- `Add back splits 1 to JdbcSourceSplitEnumerator.`
- The new reader registered and obtained the split:
- `Register reader 0 to JdbcSourceSplitEnumerator.`
- `Assigning 1 splits to subtask: 0`
4. From the second start until it was manually canceled, the following **never appeared** in the logs:
- `No more splits to assign. Sending NoMoreSplitsEvent ...`
- There were no exception stacks from any operator either (see the fix sketch after this timeline).
5. It was finally ended by an external cancel:
- Job status: `RUNNING` → `CANCELLING` → `CANCELED`.
- Final YARN status: `KILLED`, with the diagnosis being `null`.
- SeaTunnel reported an error: `Reason:Flink job executed failed` (because the Flink Job was CANCELED).
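If the cause really is a one-shot end-of-input signal, one hedged direction for a fix (again with hypothetical names, modifying the sketch under "What happened") is to remember that splitting has finished and repeat the signal whenever a reader re-registers with nothing left to assign:

```java
// Assumption: splittingFinished is a new flag added to the sketch class above.
private boolean splittingFinished = false;

void run() {
    assignPendingSplits();
    signalNoMoreSplits();     // the first attempt still gets the signal here
    splittingFinished = true; // remember that enumeration is complete
}

void registerReader() {
    assignPendingSplits();
    if (splittingFinished && pendingSplits.isEmpty()) {
        signalNoMoreSplits(); // recovered readers now get the signal too
    }
}
```

In the real code base the equivalent change would presumably live in `JdbcSourceSplitEnumerator` (or the engine-side enumerator bridge) and be gated on the enumerator's own completion state.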
Zeta or Flink or Spark Version
No response
Java or Scala Version
No response
Screenshots
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct