
[Java] create @ForyField annotation to provide extra meta for perf/space optimization #3000

@chaokunyang

Feature Request

Create a @ForyField annotation to provide extra metadata for performance and space optimization during Java/xlang serialization.

Is your feature request related to a problem? Please describe

Currently, Fory's Java/xlang serialization treats all object fields uniformly:

  1. Null checks are always performed: even for fields that are never null, Fory writes a null/ref flag (1 byte per field).
  2. Reference tracking is always applied when enabled globally: even for fields that are never shared or cyclic, objects are added to an IdentityMap, at the cost of a hash lookup.
  3. Field names use meta string encoding: in schema evolution mode, field names are compressed with meta string encoding, which still takes noticeable space for long names.

These defaults ensure correctness but introduce unnecessary overhead when the developer has more specific knowledge about their data model.

Describe the solution you'd like

Add a @ForyField annotation that allows developers to provide field-level hints to optimize serialization:

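import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Assumed meta-annotations: RUNTIME retention so reflection can see the annotation, fields only.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
public @interface ForyField {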
    /**
     * Field tag ID for schema evolution mode (REQUIRED).
     * - When >= 0: Uses this numeric ID instead of field name string for compact encoding
     * - When -1: Explicitly opt-out of tag ID, use field name with meta string encoding
     * Must be unique within the class (except -1) and stable across versions.
     */
    int id();

    /**
     * Whether this field can be null.
     * When set to false (default), Fory skips writing the null flag (saves 1 byte).
     * When set to true, Fory writes null flag for nullable fields.
     * Default: false (field is non-nullable, aligned with xlang protocol defaults)
     */
    boolean nullable() default false;

    /**
     * Whether to track references for this field.
     * When set to false (default):
     * - Avoids adding the object to IdentityMap (saves hash map overhead)
     * - Skips writing ref tracking flag (saves 1 byte when combined with nullable=false)
     * When set to true, enables reference tracking for shared/circular references.
     * Default: false (no reference tracking, aligned with xlang protocol defaults)
     */
    boolean ref() default false;
}

Note: The defaults (nullable = false, ref = false) align with the xlang protocol specification which states: "In xlang mode, for cross-language compatibility: All fields are treated as not-null by default; Reference tracking is disabled by default."
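
As a rough illustration of how a serializer could pick up these hints, the sketch below reads the annotation reflectively. It declares a local stand-in for the proposed ForyField (with RUNTIME retention, an assumption that reflection needs); ForyFieldReflectionSketch, FieldMeta, and describe are hypothetical names, not Fory APIs.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names are part of Fory.
class ForyFieldReflectionSketch {
  // Local stand-in for the proposed annotation (RUNTIME retention is assumed).
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface ForyField {
    int id();
    boolean nullable() default false;
    boolean ref() default false;
  }

  // Plain holder for the per-field hints a serializer would cache.
  static final class FieldMeta {
    final int id;
    final boolean nullable;
    final boolean ref;

    FieldMeta(int id, boolean nullable, boolean ref) {
      this.id = id;
      this.nullable = nullable;
      this.ref = ref;
    }

    @Override
    public String toString() {
      return "id=" + id + ", nullable=" + nullable + ", ref=" + ref;
    }
  }

  static class Foo {
    @ForyField(id = 0) String f1;
    @ForyField(id = 2, nullable = true) String f3;
    @ForyField(id = 3, ref = true) Object parent;
    int notAnnotated; // skipped by describe()
  }

  // Collects the hints of every annotated field, keyed by field name.
  static Map<String, FieldMeta> describe(Class<?> type) {
    Map<String, FieldMeta> metas = new LinkedHashMap<>();
    for (Field field : type.getDeclaredFields()) {
      ForyField ann = field.getAnnotation(ForyField.class);
      if (ann != null) {
        metas.put(field.getName(), new FieldMeta(ann.id(), ann.nullable(), ann.ref()));
      }
    }
    return metas;
  }

  public static void main(String[] args) {
    describe(Foo.class).forEach((name, meta) -> System.out.println(name + ": " + meta));
  }
}
```

Fields without the annotation are simply left out of the map here; how unannotated fields should behave is a separate design question.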

Design Decision: Required id Field

We chose to make id a required parameter rather than optional. Here's the design rationale:

Why id is Required

  1. Explicit control principle: If a developer uses @ForyField, they are opting into explicit field-level control. Requiring an ID ensures they take full ownership of the field's serialization behavior, similar to how protobuf requires field numbers.

  2. Proven pattern: Protocol Buffers has demonstrated that required field numbers work well for schema evolution. Field numbers provide stable identification that survives renames and reordering.

  3. Prevents subtle bugs: Mixing tagged fields (with ID) and untagged fields (using field name) in schema evolution mode could lead to:

    • Inconsistent encoding within the same class
    • Confusion about which fields are stable vs. name-dependent
    • Harder debugging when schema evolution issues arise

  4. Simpler mental model: "Use @ForyField = assign an ID" is easier to understand than "Use @ForyField, optionally assign an ID depending on whether you care about schema evolution."

  5. Space optimization guarantee: With required IDs, every annotated field benefits from compact tag ID encoding in schema evolution mode, rather than falling back to meta string encoding.

Opt-out with id = -1

For users who want to use @ForyField for nullable/ref settings but prefer field name encoding:

  • Set id = -1 to explicitly opt-out of tag ID encoding
  • Field will use meta string encoding for the field name
  • This is an explicit choice, not an accidental omission

// Explicit opt-out: use field name encoding, but still get nullable/ref optimization
@ForyField(id = -1, nullable = true)
String fieldUsingNameEncoding;

Alternatives Considered

| Option | Description | Why Not Chosen |
|---|---|---|
| Optional id with default -1 | Implicit fallback to field name | Accidental omission looks same as intentional opt-out |
| Context-dependent | Require id only in schema evolution mode | Complex rules, behavior depends on Fory config |
| Separate annotation | @ForyTag(id) for IDs only | More annotations to manage, less cohesive |

Usage Example

class Foo {
    // Field f1: non-nullable (default), no ref tracking (default)
    // Tag ID 0 provides compact encoding in schema evolution mode
    @ForyField(id = 0)
    String f1;

    // Field f2: non-nullable (default), no ref tracking (default)
    @ForyField(id = 1)
    Bar f2;

    // Field f3: nullable field that may contain null values
    @ForyField(id = 2, nullable = true)
    String f3;

    // Field parent: shared reference that needs tracking (e.g., for circular refs)
    @ForyField(id = 3, ref = true)
    Node parent;

    // Field with long name: tag ID provides significant space savings
    // Without @ForyField: well over 12 bytes for the meta string of this 39-character name
    // With @ForyField(id = 4): 1 byte for tag ID
    @ForyField(id = 4)
    String veryLongFieldNameThatWouldTakeManyBytes;

    // Explicit opt-out: use field name encoding but get nullable optimization
    @ForyField(id = -1, nullable = true)
    String optionalField;
}

Optimization Details

1. nullable = false (Default) Optimization

According to the Reference Meta section of the protocol spec:

| Flag | Byte Value | Description |
|---|---|---|
| NULL FLAG | -3 (0xFD) | Object is null |
| REF FLAG | -2 (0xFE) | Object was already serialized |
| NOT_NULL VALUE FLAG | -1 (0xFF) | Non-null, ref tracking disabled |
| REF VALUE FLAG | 0 (0x00) | Referencable, first occurrence |

When nullable = false (default):

  • Skip writing the null flag entirely (1 byte saved per field)
  • Directly serialize the field value
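
To make the one-byte difference concrete, here is a minimal sketch of writing a string field with and without the flag byte. It reuses the NULL FLAG and NOT_NULL VALUE FLAG values from the table above; the class name, method name, and the simplified one-byte length prefix are assumptions for illustration and do not reflect Fory's actual string format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the nullable = false fast path; not Fory's real writer.
class NullFlagSketch {
  static final byte NULL_FLAG = -3;           // 0xFD, from the Reference Meta table
  static final byte NOT_NULL_VALUE_FLAG = -1; // 0xFF, from the Reference Meta table

  static byte[] writeStringField(String value, boolean nullable) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (nullable) {
      if (value == null) {
        out.write(NULL_FLAG); // a null field is just the flag byte
        return out.toByteArray();
      }
      out.write(NOT_NULL_VALUE_FLAG); // 1 extra byte before the payload
    } else if (value == null) {
      // Fail fast: the field was declared non-nullable.
      throw new IllegalStateException("non-nullable field holds null");
    }
    byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
    if (utf8.length > 127) {
      throw new IllegalArgumentException("sketch only supports short strings");
    }
    out.write(utf8.length);          // simplified 1-byte length prefix (assumption)
    out.write(utf8, 0, utf8.length); // payload
    return out.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(writeStringField("abc", true).length);  // flag + length + 3 bytes
    System.out.println(writeStringField("abc", false).length); // length + 3 bytes
  }
}
```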

2. ref = false (Default) Optimization

When ref = false (default):

  • Skip IdentityMap lookup/insertion: Avoids the O(1) hash map operation overhead per field
  • Skip ref flag when combined with nullable = false: No ref/null header byte needed
  • Useful for:
    • Value types that are never shared (e.g., immutable DTOs)
    • Primitive wrapper fields
    • Fields known to never be part of circular references
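
The bypass can be sketched as follows. This is an illustrative model only: it uses a plain java.util.IdentityHashMap in place of Fory's internal IdentityMap, and RefTrackingSketch and refHeader are hypothetical names. It assumes nullable = false, so that a field with ref = false needs no header byte at all.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical model of per-field reference tracking; not Fory's RefResolver.
class RefTrackingSketch {
  static final byte REF_FLAG = -2;      // object was already serialized
  static final byte REF_VALUE_FLAG = 0; // referencable, first occurrence

  private final Map<Object, Integer> written = new IdentityHashMap<>();
  int mapOperations = 0; // counts how often the identity map is touched

  // Returns the flag byte to write, or null when no header byte is needed.
  Byte refHeader(Object value, boolean ref) {
    if (!ref) {
      return null; // ref = false: no map access, no flag byte
    }
    mapOperations++;
    Integer previousId = written.putIfAbsent(value, written.size());
    return previousId == null ? REF_VALUE_FLAG : REF_FLAG;
  }

  public static void main(String[] args) {
    RefTrackingSketch tracker = new RefTrackingSketch();
    Object shared = new Object();
    System.out.println(tracker.refHeader(shared, true));  // first occurrence
    System.out.println(tracker.refHeader(shared, true));  // back-reference
    System.out.println(tracker.refHeader(shared, false)); // bypass
  }
}
```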

3. id (Tag ID) Optimization

According to the Field Header section:

Field Header format:
2 bits field name encoding + 4 bits size + nullability flag + ref tracking flag

When field name encoding is TAG_ID (binary 11):

  • Field name is written as an unsigned varint tag ID instead of meta string
  • For a field named veryLongFieldName (17 chars), meta string encoding would take ~12 bytes
  • With tag ID (e.g., id = 5), it takes only 1 byte (varint for small numbers)

Space savings example:

| Field Name | Meta String (approx) | Tag ID |
|---|---|---|
| f1 | ~2 bytes | 1 byte |
| userName | ~6 bytes | 1 byte |
| transactionId | ~9 bytes | 1 byte |
| veryLongFieldName | ~12 bytes | 1 byte |
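
The "1 byte" column follows from how unsigned varints work: any value below 128 fits in a single byte. The sketch below uses the common 7-bits-per-byte (LEB128-style) scheme; whether Fory's varint matches it byte for byte is an assumption, and TagIdVarintSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of an unsigned varint writer (LEB128-style).
class TagIdVarintSketch {
  static byte[] writeVarUint32(int value) {
    if (value < 0) {
      throw new IllegalArgumentException("tag ID must be non-negative: " + value);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Emit 7 bits at a time; the high bit marks "more bytes follow".
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    for (int id : new int[] {0, 5, 127, 128, 16384}) {
      System.out.println("id " + id + " -> " + writeVarUint32(id).length + " byte(s)");
    }
  }
}
```

Under this scheme a class would need 128 or more tagged fields before any tag ID grows to two bytes.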

Protocol Reference

The field header format in schema evolution mode (spec link):

| 2 bits field name encoding | 4 bits size | nullability flag | ref tracking flag |
  • Field name encoding: UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL/TAG_ID
  • When TAG_ID encoding is used (encoding bits = 11), the 4-bit size field stores the tag ID for small values, or a varint follows for larger IDs
  • Nullability flag: When set to 1, this field can be null
  • Ref tracking flag: When set to 1, ref tracking is enabled for this field
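
A small bit-packing sketch of this header byte may help. It assumes the fields are laid out from most to least significant bit in the order listed (encoding in bits 7-6, size in bits 5-2, nullability in bit 1, ref tracking in bit 0); the spec should be consulted for the authoritative bit order, and FieldHeaderSketch is a hypothetical name.

```java
// Hypothetical sketch of packing and unpacking the one-byte field header.
class FieldHeaderSketch {
  static final int ENCODING_TAG_ID = 0b11; // TAG_ID encoding bits, per the text above

  static int pack(int encoding, int size, boolean nullable, boolean ref) {
    if (encoding < 0 || encoding > 0b11) {
      throw new IllegalArgumentException("encoding needs 2 bits: " + encoding);
    }
    if (size < 0 || size > 0b1111) {
      throw new IllegalArgumentException("size needs 4 bits: " + size);
    }
    // Assumed layout: eessssnr (e = encoding, s = size, n = nullable, r = ref)
    return (encoding << 6) | (size << 2) | (nullable ? 0b10 : 0) | (ref ? 0b01 : 0);
  }

  static int encoding(int header) { return (header >>> 6) & 0b11; }
  static int size(int header) { return (header >>> 2) & 0b1111; }
  static boolean nullable(int header) { return (header & 0b10) != 0; }
  static boolean ref(int header) { return (header & 0b01) != 0; }

  public static void main(String[] args) {
    int header = pack(ENCODING_TAG_ID, 5, true, false);
    System.out.println(Integer.toBinaryString(header)); // 11010110
  }
}
```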

Implementation Notes

  1. Annotation Processing:

    • Parse @ForyField annotations during class registration
    • Store field metadata in ClassInfo / field descriptors
    • Use metadata during serializer code generation

  2. Serializer Code Generation:

    • When nullable = false (default): Generate code that skips null check and flag writing
    • When ref = false (default): Generate code that bypasses RefResolver
    • When id >= 0: Use tag ID for field name encoding
    • When id = -1: Use meta string encoding for field name

  3. Cross-language Support:

    • Java: @ForyField(id = N) annotation
    • Python: field(id=N) descriptor or dataclass field metadata
    • Go: struct tags (e.g., fory:"id=0" or fory:"id=-1,nullable")
    • Rust: #[fory(id = 0)] or #[fory(id = -1, nullable)] attribute
    • C++: macro-based or template-based field metadata

  4. Validation:

    • Raise error if nullable = false but field value is null at runtime (fail-fast to catch bugs early)
    • Raise error if tag IDs (>= 0) are not unique within a class
    • Raise error if id < -1 (only -1 and non-negative values allowed)
    • Tag IDs should be stable across versions (like protobuf field numbers)
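
The ID-related checks could look roughly like this at class registration time. The sketch takes a plain map from field name to declared id rather than real class metadata; TagIdValidationSketch and validate are hypothetical names, and the exception type is an arbitrary choice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the tag ID validation rules listed above.
class TagIdValidationSketch {
  // fieldIds maps each annotated field name to its declared @ForyField id.
  static void validate(Map<String, Integer> fieldIds) {
    Map<Integer, String> ownerById = new HashMap<>();
    for (Map.Entry<String, Integer> entry : fieldIds.entrySet()) {
      String field = entry.getKey();
      int id = entry.getValue();
      if (id < -1) {
        throw new IllegalArgumentException(
            "Field '" + field + "': id must be -1 or non-negative, got " + id);
      }
      if (id == -1) {
        continue; // explicit opt-out; any number of fields may use -1
      }
      String previous = ownerById.putIfAbsent(id, field);
      if (previous != null) {
        throw new IllegalArgumentException(
            "Duplicate tag ID " + id + " on fields '" + previous + "' and '" + field + "'");
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> ids = new HashMap<>();
    ids.put("f1", 0);
    ids.put("f2", 1);
    ids.put("optionalField", -1);
    validate(ids); // passes silently
    System.out.println("valid");
  }
}
```

Stability across versions cannot be checked from a single class definition, so it stays a convention here, as with protobuf field numbers.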

Performance Impact

For a struct with 10 fields using default settings (nullable = false, ref = false):

  • Space savings: up to 10 bytes per object, since each field's null and ref state share a single flag byte (see the flag table above) that is no longer written
  • CPU savings: 10 fewer hash map operations per object serialization

For schema evolution mode with fields averaging 12-char names using tag IDs:

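  • Space savings: roughly 7 to 8 bytes per field in type meta, going by the estimates in the table above (~8 to 9 bytes for the name versus 1 byte for the tag ID)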

Describe alternatives you've considered

  1. Class-level annotation only (@ForyObject): Already exists but doesn't provide field-level granularity
  2. Custom serializers: Too verbose for simple optimizations
  3. Global configuration: Doesn't allow field-specific tuning
  4. Optional id: See "Design Decision" section above

Additional context

Protocol spec: https://fory.apache.org/docs/specification/fory_xlang_serialization_spec

From the spec regarding xlang defaults:

"In xlang mode, for cross-language compatibility: All fields are treated as not-null by default; Reference tracking is disabled by default."

Field Header format reference: see the Field Header section of the spec.

Related existing annotation @ForyObject:

@ForyObject(fieldsNullable = false, trackingRef = false)
class Foo {
  // class-level settings apply to all fields
}

The new @ForyField annotation would complement @ForyObject by providing field-level overrides, with defaults that match the xlang protocol's optimized behavior.
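
One possible precedence rule is sketched below: a field-level @ForyField wins when present, otherwise the class-level @ForyObject setting applies, otherwise the xlang default (non-nullable). Both annotations are declared locally as stand-ins, and the precedence itself is an assumption rather than something the proposal fixes; FieldOverrideSketch and isNullable are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical sketch of resolving field-level vs class-level nullability.
class FieldOverrideSketch {
  // Local stand-ins for the two annotations discussed above.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface ForyObject {
    boolean fieldsNullable() default false;
    boolean trackingRef() default false;
  }

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface ForyField {
    int id();
    boolean nullable() default false;
    boolean ref() default false;
  }

  @ForyObject(fieldsNullable = true)
  static class Foo {
    String inherited;                     // falls back to the class-level setting
    @ForyField(id = 0) String overridden; // field-level setting wins
  }

  static boolean isNullable(Class<?> type, String fieldName) {
    Field field;
    try {
      field = type.getDeclaredField(fieldName);
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException("No such field: " + fieldName, e);
    }
    ForyField fieldLevel = field.getAnnotation(ForyField.class);
    if (fieldLevel != null) {
      return fieldLevel.nullable();
    }
    ForyObject classLevel = type.getAnnotation(ForyObject.class);
    return classLevel != null && classLevel.fieldsNullable();
  }

  public static void main(String[] args) {
    System.out.println(isNullable(Foo.class, "inherited"));  // true
    System.out.println(isNullable(Foo.class, "overridden")); // false
  }
}
```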
