Conversation

@the-other-tim-brown
Contributor

@the-other-tim-brown the-other-tim-brown commented Dec 29, 2025

Describe the issue this Pull Request addresses

Updates the JSON and proto conversion utilities to use HoodieSchema instead of the Avro Schema where possible.

Summary and Changelog

  • Updates the converters for MercifulJsonConverter to use HoodieSchema instead of Avro Schema
  • Updates the ProtoConversionUtil to use HoodieSchema for conversions and directly generates the HoodieSchema instead of converting from Avro
  • Removes unused code

Impact

Removes reliance on Avro's schema class and replaces it with Hoodie's own schema system

Risk Level

Low

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Dec 29, 2025
@the-other-tim-brown the-other-tim-brown marked this pull request as ready for review December 30, 2025 03:09
@the-other-tim-brown the-other-tim-brown force-pushed the converters-schema-migration branch from 229119a to c70a3e1 December 30, 2025 18:33
Contributor

@balaji-varadarajan-ai balaji-varadarajan-ai left a comment

Overall, LGTM

- bigDecimal = new BigDecimal(obj.toString(), new MathContext(logicalType.getPrecision(), RoundingMode.UNNECESSARY)).setScale(logicalType.getScale(), RoundingMode.UNNECESSARY);
+ bigDecimal = new BigDecimal(obj.toString(), new MathContext(schema.getPrecision(), RoundingMode.UNNECESSARY)).setScale(schema.getScale(), RoundingMode.UNNECESSARY);
  }
  } catch (java.lang.NumberFormatException | ArithmeticException ignored) {
Member

Nit: Possible to import this?
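For context on the nit: NumberFormatException lives in java.lang, so the simple name resolves without an import statement. The sketch below is a standalone illustration of the strict-parsing pattern in the diff above, not the converter's actual code. The class name StrictDecimalParser and the Optional return type are assumptions made for the example.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Hudi.
final class StrictDecimalParser {

  // Parses raw into a decimal with the given precision and scale.
  // RoundingMode.UNNECESSARY makes BigDecimal throw ArithmeticException
  // whenever a digit would have to be dropped, so lossy input is rejected.
  // Note: precision is not re-checked after the scale is widened.
  static Optional<BigDecimal> parse(String raw, int precision, int scale) {
    try {
      BigDecimal value = new BigDecimal(raw, new MathContext(precision, RoundingMode.UNNECESSARY))
          .setScale(scale, RoundingMode.UNNECESSARY);
      return Optional.of(value);
    } catch (NumberFormatException | ArithmeticException e) {
      // java.lang types need no import and no package qualifier.
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(parse("12.34", 4, 2));   // fits precision 4, scale 2
    System.out.println(parse("123.456", 4, 2)); // six digits, would need rounding
    System.out.println(parse("abc", 4, 2));     // not a number
  }
}
```

Returning Optional.empty() mirrors the "ignored" catch in the diff: the caller decides what a failed conversion means.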

- protected static boolean isValidDecimalTypeConfig(Schema schema) {
- LogicalTypes.Decimal decimalType = (LogicalTypes.Decimal) schema.getLogicalType();
+ protected static boolean isValidDecimalTypeConfig(HoodieSchema schema) {
+ LogicalTypes.Decimal decimalType = (LogicalTypes.Decimal) schema.toAvroSchema().getLogicalType();
Member

Since this is still under the Avro package, I think converting everything to Avro makes sense.

During my testing, IIRC, creating a decimal backed by a fixed type with an improper fixed size via the HoodieSchema#createDecimal helper fails silently.

A HoodieSchema will still be created, but it will not be an instance of HoodieSchema.Decimal. This might be a validation that we need to add in that helper method. Just something orthogonal that I happened to think of when reading this part of the code.

Contributor Author

I'm not really following this comment. Is there anything needed in this PR?

Member

Nope nothing needed. Just adding a comment here for future reference.
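
For future reference, a minimal sketch of the kind of validation mentioned in this thread. It assumes the intended rule is that every value of the declared precision must fit in the two's-complement bytes of the backing fixed type. FixedDecimalCheck, maxPrecisionForFixed, and requireFits are hypothetical names; this is not the current behavior of HoodieSchema#createDecimal.

```java
import java.math.BigInteger;

// Hypothetical validation sketch; not an existing Hudi or Avro class.
final class FixedDecimalCheck {

  // Largest precision whose every value fits in sizeInBytes bytes of
  // two's-complement storage: floor(log10(2^(8n-1) - 1)).
  static int maxPrecisionForFixed(int sizeInBytes) {
    if (sizeInBytes <= 0) {
      throw new IllegalArgumentException("fixed size must be positive: " + sizeInBytes);
    }
    BigInteger maxUnscaled = BigInteger.ONE.shiftLeft(8 * sizeInBytes - 1).subtract(BigInteger.ONE);
    // A number with d digits guarantees room for all (d - 1)-digit values.
    return maxUnscaled.toString().length() - 1;
  }

  // Fails loudly instead of silently producing a non-decimal schema.
  static void requireFits(int precision, int fixedSize) {
    int max = maxPrecisionForFixed(fixedSize);
    if (precision > max) {
      throw new IllegalArgumentException(
          "precision " + precision + " exceeds max " + max + " for fixed(" + fixedSize + ")");
    }
  }

  public static void main(String[] args) {
    System.out.println(maxPrecisionForFixed(16));
    requireFits(9, 4);  // passes: 9 digits fit in 4 bytes
    requireFits(10, 4); // throws IllegalArgumentException
  }
}
```

Throwing at construction time is one option; returning a validation result would also work if the helper must stay exception-free.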

fieldTypeProcessors.put(HoodieSchemaType.ENUM, generateEnumTypeHandler());
fieldTypeProcessors.put(HoodieSchemaType.MAP, generateMapTypeHandler());
fieldTypeProcessors.put(HoodieSchemaType.BYTES, generateBytesTypeHandler());
fieldTypeProcessors.put(HoodieSchemaType.FIXED, generateFixedTypeHandler());
Member

@voonhous voonhous Jan 1, 2026

~~Is it possible to shift some of the KV pairs in getLogicalFieldTypeProcessors to this scope?

e.g. DECIMAL, TIME, DATE, etc.~~

Edit: On second thoughts, ignore my comments above. We are in the Avro scope. So, concepts should be Avro-first instead of HoodieSchema-first.

Contributor Author

Is this ok as it is then?

Member

Yeap, it's okay.

  // if incoming message does not contain the field, fieldDescriptor will be null
  // if the field schema is a union, it is nullable
- if (fieldSchema.getType() == Schema.Type.UNION && (fieldDescriptor == null || (!fieldDescriptor.isRepeated() && !messageValue.hasField(fieldDescriptor)))) {
+ if (fieldSchema.getType() == HoodieSchemaType.UNION && (fieldDescriptor == null || (!fieldDescriptor.isRepeated() && !messageValue.hasField(fieldDescriptor)))) {
Member

Nit: Check if fieldSchema#isNullable?
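
To illustrate why the two checks can differ, here is a toy model, not HoodieSchema itself. It assumes "nullable" means a union that contains a null branch; ToySchema and its members are invented for the example, and the real isNullable may be defined differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy stand-in for a schema type; every name here is hypothetical.
final class ToySchema {
  enum Kind { NULL, STRING, INT, UNION }

  final Kind kind;
  final List<ToySchema> branches;

  private ToySchema(Kind kind, List<ToySchema> branches) {
    this.kind = kind;
    this.branches = branches;
  }

  static ToySchema of(Kind kind) {
    return new ToySchema(kind, Collections.emptyList());
  }

  static ToySchema union(ToySchema... branches) {
    return new ToySchema(Kind.UNION, Arrays.asList(branches));
  }

  // Assumed definition: a union is nullable only if one branch is NULL.
  boolean isNullable() {
    return kind == Kind.UNION && branches.stream().anyMatch(b -> b.kind == Kind.NULL);
  }

  public static void main(String[] args) {
    ToySchema optionalString = union(of(Kind.NULL), of(Kind.STRING));
    ToySchema intOrString = union(of(Kind.INT), of(Kind.STRING));
    // Both are unions, but only the first can hold a missing value.
    System.out.println(optionalString.isNullable());
    System.out.println(intOrString.isNullable());
  }
}
```

Under this assumption, a type check against UNION treats intOrString as nullable while an isNullable check does not, which is the distinction the nit points at.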

@the-other-tim-brown the-other-tim-brown force-pushed the converters-schema-migration branch from 5a69440 to 5161f8f January 1, 2026 21:38
@hudi-bot
Collaborator

hudi-bot commented Jan 1, 2026

CI report:

Member

@voonhous voonhous left a comment

LGTM

@voonhous
Member

voonhous commented Jan 2, 2026

CI is green, merging this in.

@voonhous voonhous merged commit c177e2b into apache:master Jan 2, 2026
72 checks passed
5 participants