
Commit 9dd60f9

Fix ordering fields related documentation for 1.1.0
1 parent d5c17ce · commit 9dd60f9

19 files changed, with 324 additions and 337 deletions.


hudi-utils/src/main/java/org/apache/hudi/utils/HoodieSparkConfigs.java

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ public static String description(Object sparkConfigObject) {
 ".options(clientOpts) // any of the Hudi client opts can be passed in as well\n" +
 ".option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), \"_row_key\")\n" +
 ".option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), \"partition\")\n" +
-".option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), \"timestamp\")\n" +
+".option(HoodieTableConfig.ORDERING_FIELDS(), \"timestamp\")\n" +
 ".option(HoodieWriteConfig.TABLE_NAME, tableName)\n" +
 ".mode(SaveMode.Append)\n" +
 ".save(basePath);\n" +

website/docs/basic_configurations.md

Lines changed: 66 additions & 70 deletions
Large diffs are not rendered by default.

website/docs/configurations.md

Lines changed: 67 additions & 70 deletions
Large diffs are not rendered by default.

website/docs/quick-start-guide.md

Lines changed: 1 addition & 1 deletion
@@ -1240,7 +1240,7 @@ CREATE TABLE hudi_table (
 driver STRING,
 fare DOUBLE,
 city STRING
-) USING HUDI TBLPROPERTIES (preCombineField = 'ts')
+) USING HUDI TBLPROPERTIES (orderingFields = 'ts')
 PARTITIONED BY (city);
 ```
 </TabItem>
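
A minimal sketch (not part of this commit) of the full statement this hunk updates, showing the renamed table property in context; the leading columns are assumed, since the hunk only shows the tail of the DDL:

    CREATE TABLE hudi_table (
      ts BIGINT,       -- assumed column; only driver, fare, city appear in the hunk above
      uuid STRING,     -- assumed column
      driver STRING,
      fare DOUBLE,
      city STRING
    ) USING HUDI TBLPROPERTIES (orderingFields = 'ts')
    PARTITIONED BY (city);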

website/docs/record_merger.md

Lines changed: 5 additions & 5 deletions
@@ -128,11 +128,11 @@ For more details on the implementation, see [RFC 101](https://github.com/apache/

 The record merge mode and optional record merge strategy ID and custom merge implementation classes can be specified using the below configs.

-| Config Name | Default | Description |
-|---|---|---|
-| hoodie.write.record.merge.mode | EVENT_TIME_ORDERING (when ordering field is set)<br />COMMIT_TIME_ORDERING (when ordering field is not set) | Determines the logic of merging different records with the same record key. Valid values: (1) `COMMIT_TIME_ORDERING`: use commit time to merge records, i.e., the record from later commit overwrites the earlier record with the same key. (2) `EVENT_TIME_ORDERING`: use event time as the ordering to merge records, i.e., the record with the larger event time overwrites the record with the smaller event time on the same key, regardless of commit time. The event time or preCombine field needs to be specified by the user. This is the default when an ordering field is configured. (3) `CUSTOM`: use custom merging logic specified by the user.<br />`Config Param: RECORD_MERGE_MODE`<br />`Since Version: 1.0.0` |
-| hoodie.write.record.merge.strategy.id | N/A (Optional) | ID of record merge strategy. Hudi will pick `HoodieRecordMerger` implementations from `hoodie.write.record.merge.custom.implementation.classes` that have the same merge strategy ID. When using custom merge logic, you need to specify both this config and `hoodie.write.record.merge.custom.implementation.classes`.<br />`Config Param: RECORD_MERGE_STRATEGY_ID`<br />`Since Version: 0.13.0`<br />`Alternative: hoodie.datasource.write.record.merger.strategy` (deprecated) |
-| hoodie.write.record.merge.custom.implementation.classes | N/A (Optional) | List of `HoodieRecordMerger` implementations constituting Hudi's merging strategy based on the engine used. Hudi selects the first implementation from this list that matches the following criteria: (1) has the same merge strategy ID as specified in `hoodie.write.record.merge.strategy.id` (if provided), (2) is compatible with the execution engine (e.g., SPARK merger for Spark, FLINK merger for Flink, AVRO for Java/Hive). The order in the list matters - place your preferred implementation first. Engine-specific implementations (SPARK, FLINK) are more efficient as they avoid Avro serialization/deserialization overhead.<br />`Config Param: RECORD_MERGE_IMPL_CLASSES`<br />`Since Version: 0.13.0`<br />`Alternative: hoodie.datasource.write.record.merger.impls` (deprecated) |
+| Config Name | Default | Description |
+|---|---|---|
+| hoodie.write.record.merge.mode | EVENT_TIME_ORDERING (when ordering field is set)<br />COMMIT_TIME_ORDERING (when ordering field is not set) | Determines the logic of merging different records with the same record key. Valid values: (1) `COMMIT_TIME_ORDERING`: use commit time to merge records, i.e., the record from later commit overwrites the earlier record with the same key. (2) `EVENT_TIME_ORDERING`: use event time as the ordering to merge records, i.e., the record with the larger event time overwrites the record with the smaller event time on the same key, regardless of commit time. The event time or ordering fields need to be specified by the user. This is the default when an ordering field is configured. (3) `CUSTOM`: use custom merging logic specified by the user.<br />`Config Param: RECORD_MERGE_MODE`<br />`Since Version: 1.0.0` |
+| hoodie.write.record.merge.strategy.id | N/A (Optional) | ID of record merge strategy. Hudi will pick `HoodieRecordMerger` implementations from `hoodie.write.record.merge.custom.implementation.classes` that have the same merge strategy ID. When using custom merge logic, you need to specify both this config and `hoodie.write.record.merge.custom.implementation.classes`.<br />`Config Param: RECORD_MERGE_STRATEGY_ID`<br />`Since Version: 0.13.0`<br />`Alternative: hoodie.datasource.write.record.merger.strategy` (deprecated) |
+| hoodie.write.record.merge.custom.implementation.classes | N/A (Optional) | List of `HoodieRecordMerger` implementations constituting Hudi's merging strategy based on the engine used. Hudi selects the first implementation from this list that matches the following criteria: (1) has the same merge strategy ID as specified in `hoodie.write.record.merge.strategy.id` (if provided), (2) is compatible with the execution engine (e.g., SPARK merger for Spark, FLINK merger for Flink, AVRO for Java/Hive). The order in the list matters - place your preferred implementation first. Engine-specific implementations (SPARK, FLINK) are more efficient as they avoid Avro serialization/deserialization overhead.<br />`Config Param: RECORD_MERGE_IMPL_CLASSES`<br />`Since Version: 0.13.0`<br />`Alternative: hoodie.datasource.write.record.merger.impls` (deprecated) |

 ## Record Payloads (deprecated)
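
As a sketch of how the configs above compose in Spark SQL DDL (illustrative, not part of this commit): a table that opts into custom merging sets the renamed ordering property together with the merge strategy configs from the table above; the merger class and strategy UUID below are placeholders.

    -- Illustrative only: the merger class and strategy UUID are placeholders.
    CREATE TABLE hudi_table_custom_merge (id INT, name STRING, ts BIGINT)
    USING hudi
    TBLPROPERTIES (
      type = 'mor',
      primaryKey = 'id',
      orderingFields = 'ts',
      recordMergeMode = 'CUSTOM',
      'hoodie.write.record.merge.strategy.id' = '<unique-uuid>',
      'hoodie.write.record.merge.custom.implementation.classes' = 'com.example.CustomRecordMerger'
    );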

website/docs/sql_ddl.md

Lines changed: 11 additions & 11 deletions
@@ -77,7 +77,7 @@ should be specified as `PARTITIONED BY (dt, hh)`.

 As discussed [here](quick-start-guide.md#keys), tables track each record in the table using a record key. Hudi auto-generated a highly compressed
 key for each new record in the examples so far. If you want to use an existing field as the key, you can set the `primaryKey` option.
-Typically, this is also accompanied by configuring ordering fields (via `preCombineField` option) to deal with out-of-order data and potential
+Typically, this is also accompanied by configuring ordering fields (via `orderingFields` option) to deal with out-of-order data and potential
 duplicate records with the same key in the incoming writes.

 :::note
@@ -86,7 +86,7 @@ this materializes a composite key of the two fields, which can be useful for exp
 :::

 Here is an example of creating a table using both options. Typically, a field that denotes the time of the event or
-fact, e.g., order creation time, event generation time etc., is used as the ordering field (via `preCombineField`). Hudi resolves multiple versions
+fact, e.g., order creation time, event generation time etc., is used as the ordering field (via `orderingFields`). Hudi resolves multiple versions
 of the same record by ordering based on this field when queries are run on the table.

 ```sql
@@ -99,7 +99,7 @@ CREATE TABLE IF NOT EXISTS hudi_table_keyed (
 TBLPROPERTIES (
 type = 'cow',
 primaryKey = 'id',
-preCombineField = 'ts'
+orderingFields = 'ts'
 );
 ```

@@ -118,13 +118,13 @@ CREATE TABLE IF NOT EXISTS hudi_table_merge_mode (
 TBLPROPERTIES (
 type = 'mor',
 primaryKey = 'id',
-precombineField = 'ts',
+orderingFields = 'ts',
 recordMergeMode = 'EVENT_TIME_ORDERING'
 )
 LOCATION 'file:///tmp/hudi_table_merge_mode/';
 ```

-With `EVENT_TIME_ORDERING`, the record with the larger event time (specified via `precombineField` ordering field) overwrites the record with the
+With `EVENT_TIME_ORDERING`, the record with the larger event time (specified via `orderingFields`) overwrites the record with the
 smaller event time on the same key, regardless of transaction's commit time. Users can set `CUSTOM` mode to provide their own
 merge logic. With `CUSTOM` merge mode, you can provide a custom class that implements the merge logic. The interfaces
 to implement is explained in detail [here](record_merger.md#custom).
@@ -139,7 +139,7 @@ CREATE TABLE IF NOT EXISTS hudi_table_merge_mode_custom (
 TBLPROPERTIES (
 type = 'mor',
 primaryKey = 'id',
-precombineField = 'ts',
+orderingFields = 'ts',
 recordMergeMode = 'CUSTOM',
 'hoodie.record.merge.strategy.id' = '<unique-uuid>'
 )
@@ -177,7 +177,7 @@ CREATE TABLE hudi_table_ctas
 USING hudi
 TBLPROPERTIES (
 type = 'cow',
-preCombineField = 'ts'
+orderingFields = 'ts'
 )
 PARTITIONED BY (dt)
 AS SELECT * FROM parquet_table;
@@ -196,7 +196,7 @@ CREATE TABLE hudi_table_ctas
 USING hudi
 TBLPROPERTIES (
 type = 'cow',
-preCombineField = 'ts'
+orderingFields = 'ts'
 )
 AS SELECT * FROM parquet_table;
 ```
@@ -579,10 +579,10 @@ Users can set table properties while creating a table. The important table prope
 |---|---|---|
 | type | cow | The table type to create. `type = 'cow'` creates a COPY-ON-WRITE table, while `type = 'mor'` creates a MERGE-ON-READ table. Same as `hoodie.datasource.write.table.type`. More details can be found [here](table_types.md) |
 | primaryKey | uuid | The primary key field names of the table separated by commas. Same as `hoodie.datasource.write.recordkey.field`. If this config is ignored, hudi will auto-generate primary keys. If explicitly set, primary key generation will honor user configuration. |
-| preCombineField | | The ordering field(s) of the table. It is used for resolving the final version of the record among multiple versions. Generally, `event time` or another similar column will be used for ordering purposes. Hudi will be able to handle out-of-order data using the ordering field value. |
+| orderingFields | | The ordering field(s) of the table. It is used for resolving the final version of the record among multiple versions. Generally, `event time` or another similar column will be used for ordering purposes. Hudi will be able to handle out-of-order data using the ordering field value. |

 :::note
-`primaryKey`, `preCombineField`, and `type` and other properties are case-sensitive.
+`primaryKey`, `orderingFields`, and `type` and other properties are case-sensitive.
 :::

 #### Passing Lock Providers for Concurrent Writers
@@ -833,7 +833,7 @@ WITH (
 'connector' = 'hudi',
 'path' = 'file:///tmp/hudi_table',
 'table.type' = 'MERGE_ON_READ',
-'precombine.field' = 'ts'
+'ordering.fields' = 'ts'
 );
 ```
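
For the Flink snippet in the last hunk, a complete CREATE TABLE sketch with the renamed connector option could look as follows; the column definitions are assumed, since only the WITH clause appears in the diff.

    -- Flink SQL sketch; the column list is assumed.
    CREATE TABLE hudi_table (
      uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
      name VARCHAR(10),
      ts TIMESTAMP(3)
    ) WITH (
      'connector' = 'hudi',
      'path' = 'file:///tmp/hudi_table',
      'table.type' = 'MERGE_ON_READ',
      'ordering.fields' = 'ts'
    );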

website/docs/sql_dml.md

Lines changed: 5 additions & 5 deletions
@@ -51,7 +51,7 @@ INSERT INTO hudi_cow_pt_tbl PARTITION(dt, hh) SELECT 1 AS id, 'a1' AS name, 1000
 :::note Mapping to write operations
 Hudi offers flexibility in choosing the underlying [write operation](write_operations.md) of a `INSERT INTO` statement using
 the `hoodie.spark.sql.insert.into.operation` configuration. Possible options include *"bulk_insert"* (large inserts), *"insert"* (with small file management),
-and *"upsert"* (with deduplication/merging). If ordering fields are not set, *"insert"* is chosen as the default. For a table with ordering fields set (via `preCombineField`),
+and *"upsert"* (with deduplication/merging). If ordering fields are not set, *"insert"* is chosen as the default. For a table with ordering fields set (via `orderingFields`),
 *"upsert"* is chosen as the default operation.
 :::

@@ -101,7 +101,7 @@ update hudi_cow_pt_tbl set ts = 1001 where name = 'a1';
 ```

 :::info
-The `UPDATE` operation requires the specification of ordering fields (via `preCombineField`).
+The `UPDATE` operation requires the specification of ordering fields (via `orderingFields`).
 :::

 ### Merge Into
@@ -138,7 +138,7 @@ For a Hudi table with user configured primary keys, the join condition and the `

 For a table where Hudi auto generates primary keys, the join condition in `MERGE INTO` can be on any arbitrary data columns.

-if the `hoodie.record.merge.mode` is set to `EVENT_TIME_ORDERING`, ordering fields (via `preCombineField`) are required to be set with value in the `UPDATE`/`INSERT` clause.
+if the `hoodie.record.merge.mode` is set to `EVENT_TIME_ORDERING`, ordering fields (via `orderingFields`) are required to be set with value in the `UPDATE`/`INSERT` clause.

 It is enforced that if the target table has primary key and partition key column, the source table counterparts must enforce the same data type accordingly. Plus, if the target table is configured with `hoodie.record.merge.mode` = `EVENT_TIME_ORDERING` where target table is expected to have valid ordering fields configuration, the source table counterpart must also have the same data type.
 :::
@@ -148,7 +148,7 @@ Examples below
 ```sql
 -- source table using hudi for testing merging into non-partitioned table
 create table merge_source (id int, name string, price double, ts bigint) using hudi
-tblproperties (primaryKey = 'id', preCombineField = 'ts');
+tblproperties (primaryKey = 'id', orderingFields = 'ts');
 insert into merge_source values (1, "old_a1", 22.22, 900), (2, "new_a2", 33.33, 2000), (3, "new_a3", 44.44, 2000);

 merge into hudi_mor_tbl as target
@@ -199,7 +199,7 @@ CREATE TABLE tableName (
 TBLPROPERTIES (
 type = 'mor',
 primaryKey = 'id',
-preCombineField = '_ts'
+orderingFields = '_ts'
 )
 LOCATION '/location/to/basePath';
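
The merge_source example in the fourth hunk stops at the opening MERGE INTO line; a plausible completion (illustrative, not taken from this diff) is:

    -- Illustrative completion; the join and action clauses are not shown in the hunk above.
    MERGE INTO hudi_mor_tbl AS target
    USING merge_source AS source
    ON target.id = source.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *;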

website/docs/sql_queries.md

Lines changed: 3 additions & 3 deletions
@@ -210,7 +210,7 @@ CREATE TABLE IF NOT EXISTS hudi_table_merge_mode (
 TBLPROPERTIES (
 type = 'mor',
 primaryKey = 'id',
-precombineField = 'ts',
+orderingFields = 'ts',
 recordMergeMode = 'EVENT_TIME_ORDERING'
 )
 LOCATION 'file:///tmp/hudi_table_merge_mode/';
@@ -225,7 +225,7 @@ INSERT INTO hudi_table_merge_mode VALUES (1, 'a1', 900, 20.0);
 SELECT id, name, ts, price FROM hudi_table_merge_mode;
 ```

-With `EVENT_TIME_ORDERING`, the record with the larger event time (specified via `precombineField` ordering field) overwrites the record with the
+With `EVENT_TIME_ORDERING`, the record with the larger event time (specified via `orderingFields`) overwrites the record with the
 smaller event time on the same key, regardless of transaction time.

 ### Snapshot Query with Custom Merge Mode
@@ -244,7 +244,7 @@ CREATE TABLE IF NOT EXISTS hudi_table_merge_mode_custom (
 TBLPROPERTIES (
 type = 'mor',
 primaryKey = 'id',
-precombineField = 'ts',
+orderingFields = 'ts',
 recordMergeMode = 'CUSTOM',
 'hoodie.datasource.write.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload'
 )
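
To illustrate the EVENT_TIME_ORDERING behavior documented above (not part of this commit): with `orderingFields = 'ts'`, a later write carrying a smaller ts for the same key should not replace the current version. The column order (id, name, ts, price) is inferred from the SELECT in the second hunk.

    -- Illustrative only; column order (id, name, ts, price) is assumed.
    INSERT INTO hudi_table_merge_mode VALUES (1, 'a1_stale', 800, 25.0);
    -- The earlier row with ts = 900 is still returned, since 900 > 800.
    SELECT id, name, ts, price FROM hudi_table_merge_mode;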
