
Conversation

@the-other-tim-brown (Contributor) commented Dec 29, 2025

Describe the issue this Pull Request addresses

Part 1 of #14280
This aims to remove direct usage of the Avro schema in the Flink client, with the exception of the Flink Record objects, which will be covered in #17689.

Summary and Changelog

  • Removes AvroSchemaConverter and routes all calls through HoodieSchemaConverter
  • Updates the RowDataToAvroConverters converter interface to use HoodieSchema
  • Updates RowDataAvroQueryContexts to use HoodieSchema and renames the class to RowDataQueryContexts
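The shape of this migration can be sketched with simplified stand-in types: converter entry points accept the engine-agnostic schema wrapper, and any remaining Avro-facing callers convert once at the boundary. Only the `HoodieSchema.fromAvroSchema` name mirrors the actual diff; the `AvroSchema` and `Converter` classes below are illustrative stand-ins, not the real Hudi types.

```java
/** Stand-in for org.apache.avro.Schema (illustrative only). */
class AvroSchema {
  final String name;
  AvroSchema(String name) { this.name = name; }
}

/** Stand-in for Hudi's engine-agnostic schema wrapper. */
class HoodieSchema {
  final String name;
  private HoodieSchema(String name) { this.name = name; }

  // Mirrors the fromAvroSchema call seen in the diff: wraps an Avro schema once.
  static HoodieSchema fromAvroSchema(AvroSchema avro) {
    return new HoodieSchema(avro.name);
  }
  String getName() { return name; }
}

class Converter {
  // After the change, converter interfaces take HoodieSchema, not Avro Schema.
  static String describe(HoodieSchema schema) {
    return "converting record of type " + schema.getName();
  }

  public static void main(String[] args) {
    // Avro-facing call sites convert at the boundary instead of passing Avro through.
    AvroSchema avro = new AvroSchema("trip_record");
    System.out.println(describe(HoodieSchema.fromAvroSchema(avro)));
    // prints: converting record of type trip_record
  }
}
```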

Impact

Updates the client to use our new schema system, allowing us to add new types that are not available in Avro.

Risk Level

Low

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Dec 29, 2025
}

@Test
void testUnionSchemaWithMultipleRecordTypes() {
Contributor Author:

These are copied from TestAvroSchemaConverter.

@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Dec 29, 2025
return rowDataQueryContext.getFieldQueryContext(column).getValAsJava(data, allowsNull);
}

private Object getColumnValue(Schema recordSchema, String column, Properties props) {
Collaborator:

Can we just change the param to take in HoodieSchema so that on line 204 we don't have to do fromAvroSchema, or does this result in large changes?

Contributor Author:

This is a private method so it is fine, but as mentioned in the description, the record classes are handled in a separate ticket.

return getColumnValueAsJava(recordSchema, column, props, true);
}

private Object getColumnValueAsJava(Schema recordSchema, String column, Properties props, boolean allowsNull) {

}

@Override
public Option<HoodieAvroIndexedRecord> toIndexedRecord(Schema recordSchema, Properties props) {

Contributor Author:

As mentioned in the PR description, the Record interface will not be updated as part of this.

@the-other-tim-brown the-other-tim-brown marked this pull request as ready for review December 29, 2025 19:16
@the-other-tim-brown (Contributor Author):

@danny0405 can you give this a review?

@the-other-tim-brown the-other-tim-brown force-pushed the flink-client-schema-migration branch from 02ba0c6 to 6bc7f5c Compare December 31, 2025 19:06
@hudi-bot (Collaborator) commented Jan 1, 2026

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

import java.util.Arrays;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;
Member:

Super Nit:

While some parts of our code use hamcrest for assertThat, I feel assertj is more widely used; e.g., Trino and flink-cdc chose it over hamcrest.

import static org.assertj.core.api.Assertions.assertThat;

We have both in our codebase now; I feel we should clean this up in a chore PR moving forward.

@Override
public Object getColumnValueAsJava(Schema recordSchema, String column, Properties props) {
-  return getColumnValueAsJava(recordSchema, column, props, true);
+  return getColumnValueAsJava(HoodieSchema.fromAvroSchema(recordSchema), column, props, true);
Contributor:

Do we have a plan to replace the param with HoodieSchema too, i.e. getColumnValueAsJava(HoodieSchema recordSchema, ...), to avoid the per-record schema creation overhead?

Contributor Author:

Yes, it is here: #17772
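The overhead being discussed comes from calling fromAvroSchema inside a method invoked once per record; moving the parameter type to HoodieSchema lets callers convert once per schema instead. A minimal sketch of that difference, using hypothetical stand-in classes (only the fromAvroSchema name mirrors the diff; the conversion counter is purely illustrative):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative stand-in for org.apache.avro.Schema. */
class AvroSchema {
  final String name;
  AvroSchema(String name) { this.name = name; }
}

/** Illustrative stand-in for Hudi's schema wrapper, counting conversions. */
class HoodieSchema {
  static final AtomicInteger CONVERSIONS = new AtomicInteger();
  final String name;
  private HoodieSchema(String name) { this.name = name; }

  static HoodieSchema fromAvroSchema(AvroSchema avro) {
    CONVERSIONS.incrementAndGet(); // the cost we want to pay once, not per record
    return new HoodieSchema(avro.name);
  }
}

class PerRecordOverhead {
  // Stand-in for the real lookup; only the signature shape matters here.
  static Object getColumnValueAsJava(HoodieSchema schema, String column, String record) {
    return record + ":" + column;
  }

  public static void main(String[] args) {
    AvroSchema avro = new AvroSchema("rec");
    List<String> records = List.of("r1", "r2", "r3");

    // Before: converting at each call site -> one conversion per record.
    for (String r : records) {
      getColumnValueAsJava(HoodieSchema.fromAvroSchema(avro), "c", r);
    }
    int perRecord = HoodieSchema.CONVERSIONS.getAndSet(0);

    // After (the direction of the follow-up): convert once, reuse the HoodieSchema.
    HoodieSchema schema = HoodieSchema.fromAvroSchema(avro);
    for (String r : records) {
      getColumnValueAsJava(schema, "c", r);
    }
    System.out.println(perRecord + " vs " + HoodieSchema.CONVERSIONS.get()); // prints: 3 vs 1
  }
}
```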

boolean utcTimezone = Boolean.parseBoolean(props.getProperty(
HoodieStorageConfig.WRITE_UTC_TIMEZONE.key(), HoodieStorageConfig.WRITE_UTC_TIMEZONE.defaultValue().toString()));
-  RowDataQueryContext rowDataQueryContext = RowDataAvroQueryContexts.fromAvroSchema(recordSchema, utcTimezone);
+  RowDataQueryContext rowDataQueryContext = RowDataQueryContexts.fromSchema(recordSchema, utcTimezone);
Contributor:

Confirmed that the cache should still work.
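The cache being confirmed here memoizes query contexts per schema, so the rename only changes the entry point, not the caching behavior. A sketch of that memoization pattern with ConcurrentHashMap.computeIfAbsent follows; the SchemaKey and QueryContexts names are illustrative stand-ins, not the real RowDataQueryContexts internals:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache key: schema identity plus the utcTimezone flag seen in the diff. */
class SchemaKey {
  final String fingerprint;
  final boolean utcTimezone;
  SchemaKey(String fingerprint, boolean utcTimezone) {
    this.fingerprint = fingerprint;
    this.utcTimezone = utcTimezone;
  }
  @Override public boolean equals(Object o) {
    if (!(o instanceof SchemaKey)) return false;
    SchemaKey k = (SchemaKey) o;
    return utcTimezone == k.utcTimezone && fingerprint.equals(k.fingerprint);
  }
  @Override public int hashCode() { return Objects.hash(fingerprint, utcTimezone); }
}

class QueryContexts {
  private static final Map<SchemaKey, Object> CACHE = new ConcurrentHashMap<>();

  /** Returns the cached context for a key, building it at most once per distinct key. */
  static Object fromSchema(SchemaKey key) {
    // Stand-in for building an expensive RowDataQueryContext.
    return CACHE.computeIfAbsent(key, k -> new Object());
  }

  public static void main(String[] args) {
    // Equal keys resolve to the same cached context instance.
    SchemaKey key = new SchemaKey("record-v1", true);
    System.out.println(fromSchema(key) == fromSchema(new SchemaKey("record-v1", true)));
    // prints: true
  }
}
```

Because the key is value-based, converting the entry point from fromAvroSchema to fromSchema leaves cache hits intact as long as equal schemas still produce equal keys.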


Labels

size:XL PR with lines of changes > 1000


5 participants