
Proposal: Support Zarr Spatial and Geo-Projection Conventions in rioxarray #882

@emmanuelmathot


Summary

Add support for the new Zarr spatial: convention and Zarr proj: convention to enable rioxarray to read and write CRS and georeferencing metadata using these emerging cloud-native standards. The implementation prioritizes Zarr conventions over CF conventions for performance (direct attribute access vs coordinate variable access), while maintaining full backward compatibility with existing CF conventions.

Motivation

Why These Conventions Matter

The Zarr ecosystem has recently defined two complementary conventions for geospatial data:

  1. proj: convention - Defines Coordinate Reference Systems using:

    • proj:code (e.g., "EPSG:4326")
    • proj:wkt2 (WKT2/ISO 19162 format)
    • proj:projjson (PROJJSON object format)
  2. spatial: convention - Defines spatial coordinate transformations using:

    • spatial:transform - Affine transformation as a numeric array [a, b, c, d, e, f]
    • spatial:dimensions - Maps array dimensions to spatial axes (e.g., ["y", "x"])
    • spatial:bbox - Bounding box [xmin, ymin, xmax, ymax]
    • spatial:shape - Shape of the spatial dimensions [height, width]
    • spatial:registration - Grid registration type ("pixel" or "node")

Key Benefits

Modular Design: Following the rasterio principle that georeferencing has two parts, (1) defining the coordinate reference system and (2) mapping pixel coordinates into it, these conventions cleanly separate the CRS (proj:) from the transformation (spatial:).

Cloud-Native: Simple attribute-based metadata that works naturally with Zarr's distributed storage model, without requiring CF grid_mapping coordinates.

Broad Applicability: The spatial: convention is useful beyond geospatial data (microscopy, medical imaging), while proj: focuses specifically on CRS definitions.

Current rioxarray Implementation

CRS Detection (Priority Chain)

Located in rioxarray/rioxarray.py:305-338:

  1. WKT attributes (spatial_ref, crs_wkt) on grid_mapping coordinate
  2. CF grid_mapping attributes via pyproj.CRS.from_cf()
  3. Dataset-level crs attribute

Transform Detection

Located in rioxarray/rioxarray.py:613-632:

  1. Grid_mapping GeoTransform attribute (GDAL space-separated string)
  2. Dataset-level transform attribute (Affine tuple)

Key Difference from Zarr Conventions

| Aspect | Current (CF) | New (Zarr) |
|---|---|---|
| CRS location | grid_mapping coordinate attributes | Direct array/group attributes |
| CRS format | WKT + CF attributes | proj:code, proj:wkt2, proj:projjson |
| Transform format | GDAL string in GDAL coefficient order: "c a b f d e" | Numeric array: [a, b, c, d, e, f] |
| Transform location | GeoTransform attribute on the grid_mapping coordinate | spatial:transform array attribute |
| Inheritance | None (per-array grid_mapping) | Group-level proj: inherits to child arrays |
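Because both encodings carry the same six coefficients, converting between them is a reordering rather than a reinterpretation: GDAL's GeoTransform lists them origin first as c a b f d e, while spatial:transform uses the Affine order [a, b, c, d, e, f]. A minimal pure-Python sketch, assuming plain lists/strings in place of affine.Affine; both helper names are hypothetical and not part of rioxarray:

```python
from typing import List, Sequence


def gdal_geotransform_to_spatial(geotransform: str) -> List[float]:
    """Convert a GDAL GeoTransform string ("c a b f d e") to spatial:transform order."""
    values = [float(v) for v in geotransform.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 GeoTransform values, got {len(values)}")
    c, a, b, f, d, e = values
    return [a, b, c, d, e, f]


def spatial_to_gdal_geotransform(spatial_transform: Sequence[float]) -> str:
    """Convert spatial:transform [a, b, c, d, e, f] back to a GDAL GeoTransform string."""
    if len(spatial_transform) != 6:
        raise ValueError("spatial:transform must have exactly 6 elements")
    a, b, c, d, e, f = (float(v) for v in spatial_transform)
    return " ".join(str(v) for v in (c, a, b, f, d, e))


# A 0.5-degree global grid whose upper-left corner is at (-180, 90).
gdal_string = "-180.0 0.5 0.0 90.0 0.0 -0.5"
spatial = gdal_geotransform_to_spatial(gdal_string)
print(spatial)  # [0.5, 0.0, -180.0, 0.0, -0.5, 90.0]
print(spatial_to_gdal_geotransform(spatial) == gdal_string)  # True
```

The affine package's Affine.from_gdal and Affine.to_gdal perform the same reordering, so a real implementation would likely lean on those instead.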

Proposed Implementation

Design Principles

  1. Performance First: Prioritize lightweight Zarr conventions over CF grid_mapping lookups, which require accessing a separate coordinate variable
  2. Backward Compatibility: All existing CF convention support remains functional as fallback
  3. Additive Changes Only: New functionality added without breaking existing APIs
  4. Convention Coexistence: Both CF and Zarr conventions can exist simultaneously
  5. Smart Priority Order: Check cheap attribute-based Zarr conventions before CF grid_mapping
  6. Explicit Writing: Provide dedicated methods for writing Zarr conventions
  7. Convention Validation: Only interpret convention attributes when they are explicitly declared in the zarr_conventions array

Convention Declaration Requirement

According to the Zarr conventions specification, conventions must be explicitly declared in the zarr_conventions array before their attributes are interpreted. This prevents attribute name collisions and ensures clear intent.

Convention Identifiers:

The proj: convention:

```json
{
  "schema_url": "https://raw.githubusercontent.com/zarr-experimental/geo-proj/refs/tags/v1/schema.json",
  "spec_url": "https://github.com/zarr-experimental/geo-proj/blob/v1/README.md",
  "uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f",
  "name": "proj:",
  "description": "Coordinate reference system information for geospatial data"
}
```

The spatial: convention:

```json
{
  "schema_url": "https://raw.githubusercontent.com/zarr-conventions/spatial/refs/tags/v1/schema.json",
  "spec_url": "https://github.com/zarr-conventions/spatial/blob/v1/README.md",
  "uuid": "689b58e2-cf7b-45e0-9fff-9cfc0883d6b4",
  "name": "spatial:",
  "description": "Spatial coordinate information"
}
```

Implementation:

  • Reading: Check if convention is declared in zarr_conventions before interpreting attributes
  • Writing: Automatically add convention declaration when writing convention attributes
  • Validation: Use has_convention_declared() utility function to check declarations
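A minimal sketch of the declaration helpers operating on a plain attribute dict. The function names and identifier constants come from this proposal, but the bodies are assumptions: in particular, matching declarations on the "name" field (rather than on "uuid") is a simplification chosen here, and _KNOWN is a hypothetical lookup table.

```python
import copy

# Convention identifiers copied from the declarations above.
PROJ_CONVENTION = {
    "schema_url": "https://raw.githubusercontent.com/zarr-experimental/geo-proj/refs/tags/v1/schema.json",
    "spec_url": "https://github.com/zarr-experimental/geo-proj/blob/v1/README.md",
    "uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f",
    "name": "proj:",
    "description": "Coordinate reference system information for geospatial data",
}
SPATIAL_CONVENTION = {
    "schema_url": "https://raw.githubusercontent.com/zarr-conventions/spatial/refs/tags/v1/schema.json",
    "spec_url": "https://github.com/zarr-conventions/spatial/blob/v1/README.md",
    "uuid": "689b58e2-cf7b-45e0-9fff-9cfc0883d6b4",
    "name": "spatial:",
    "description": "Spatial coordinate information",
}
# Hypothetical lookup from convention name to its full identifier.
_KNOWN = {c["name"]: c for c in (PROJ_CONVENTION, SPATIAL_CONVENTION)}


def get_declared_conventions(attrs: dict) -> set:
    """Return the names of all conventions listed in attrs["zarr_conventions"]."""
    return {
        entry.get("name")
        for entry in attrs.get("zarr_conventions", [])
        if isinstance(entry, dict)
    }


def has_convention_declared(attrs: dict, convention_name: str) -> bool:
    """True if convention_name (e.g. "proj:") is declared; if not, its attributes are ignored."""
    return convention_name in get_declared_conventions(attrs)


def add_convention_declaration(attrs: dict, convention_name: str, inplace: bool = False) -> dict:
    """Append the identifier for convention_name to zarr_conventions unless already present."""
    target = attrs if inplace else copy.deepcopy(attrs)
    if not has_convention_declared(target, convention_name):
        target.setdefault("zarr_conventions", []).append(dict(_KNOWN[convention_name]))
    return target


attrs = {"proj:code": "EPSG:4326"}
print(has_convention_declared(attrs, "proj:"))  # False: proj:code would be ignored
attrs = add_convention_declaration(attrs, "proj:")
print(has_convention_declared(attrs, "proj:"))  # True
```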

Change 1: Reading Support

1.1 Enhance CRS Detection

File: rioxarray/rioxarray.py
Method: crs property

Reorder the priority chain to check lightweight Zarr conventions before the costlier CF grid_mapping lookup:

```
# New priority chain (optimized for performance):
1. Zarr proj:wkt2 attribute (array level)                  [NEW]
2. Zarr proj:code attribute (array level)                  [NEW]
3. Zarr proj:projjson attribute (array level)              [NEW]
4. Zarr proj:* attributes at group level (for Datasets)    [NEW]
5. WKT attributes (spatial_ref, crs_wkt) on grid_mapping  [UNCHANGED]
6. CF grid_mapping attributes                              [UNCHANGED]
7. Dataset crs attribute                                   [UNCHANGED]
```

Rationale:

  • Performance: Zarr convention attributes live directly on the array or group, so they are already available once the array's metadata has been read
  • CF Cost: CF grid_mapping requires accessing a separate coordinate variable (self._obj.coords[self.grid_mapping]), which can mean an additional data request
  • Backward Compatibility: Existing datasets with only CF conventions continue to work via fallback
  • Best Practice: Prefer the simpler, more performant convention when both are present
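The Zarr half of the chain (steps 1 to 4) only decides which attribute supplies the CRS; parsing it into a CRS object would happen afterwards. A minimal pure-Python sketch of that lookup, with the hypothetical name find_zarr_crs_source, assuming each level must declare proj: in its own zarr_conventions list (whether a group declaration also covers its children is not settled here). A return value of None means "fall back to the existing CF lookup".

```python
from typing import Any, Optional, Tuple

# Order taken from the proposed priority chain: wkt2, then code, then projjson.
PROJ_KEYS = ("proj:wkt2", "proj:code", "proj:projjson")


def _declares_proj(attrs: dict) -> bool:
    # Assumption: each level declares "proj:" in its own zarr_conventions list.
    return any(
        isinstance(entry, dict) and entry.get("name") == "proj:"
        for entry in attrs.get("zarr_conventions", [])
    )


def find_zarr_crs_source(
    array_attrs: dict, group_attrs: Optional[dict] = None
) -> Optional[Tuple[str, str, Any]]:
    """Return (level, key, value) for the first proj:* attribute found, or None."""
    levels = [("array", array_attrs)]
    if group_attrs is not None:
        levels.append(("group", group_attrs))  # one level only, no deeper inheritance
    for level, attrs in levels:
        if not _declares_proj(attrs):
            continue  # undeclared proj:* attributes are not interpreted
        for key in PROJ_KEYS:
            if key in attrs:
                return level, key, attrs[key]
    return None


decl = [{"name": "proj:"}]  # abbreviated declaration entry
group = {"zarr_conventions": decl, "proj:code": "EPSG:4326"}
array = {"zarr_conventions": decl, "proj:code": "EPSG:32633"}
print(find_zarr_crs_source(array, group))  # ('array', 'proj:code', 'EPSG:32633')
print(find_zarr_crs_source({}, group))     # ('group', 'proj:code', 'EPSG:4326')
print(find_zarr_crs_source({"proj:code": "EPSG:4326"}))  # None
```

The array level is checked first so that arrays can override a group-level CRS.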

1.2 Enhance Transform Detection

File: rioxarray/rioxarray.py
Method: _cached_transform()

Reorder to check lightweight Zarr spatial:transform before grid_mapping access:

```
# New priority chain (optimized for performance):
1. Array-level spatial:transform attribute                 [NEW]
2. Group-level spatial:transform attribute (Datasets)      [NEW]
3. Grid_mapping GeoTransform attribute                     [UNCHANGED]
4. Dataset transform attribute                             [UNCHANGED]
```

Rationale:

  • Performance: spatial:transform is a direct attribute access (self._obj.attrs["spatial:transform"])
  • CF Cost: GeoTransform requires grid_mapping coordinate access (self._obj.coords[self.grid_mapping].attrs["GeoTransform"])
  • Key conversion: Parse spatial:transform as [a, b, c, d, e, f] → Affine(a, b, c, d, e, f)
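A minimal sketch of steps 1 to 3 of this chain, showing that the grid_mapping attributes are only loaded when no spatial:transform is present. The function names are hypothetical, plain 6-tuples stand in for affine.Affine, the convention-declaration check is omitted for brevity, and the loader callable is an assumption standing in for the coordinate access rioxarray would perform.

```python
from typing import Callable, Optional, Sequence, Tuple

Coeffs = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)


def parse_spatial_transform(value: Sequence[float]) -> Coeffs:
    """Validate a spatial:transform value; it is already in Affine (a, ..., f) order."""
    if len(value) != 6:
        raise ValueError(f"spatial:transform needs 6 numbers, got {len(value)}")
    a, b, c, d, e, f = (float(v) for v in value)
    return a, b, c, d, e, f


def resolve_transform(
    array_attrs: dict,
    group_attrs: Optional[dict] = None,
    load_grid_mapping_attrs: Optional[Callable[[], dict]] = None,
) -> Optional[Coeffs]:
    """Array spatial:transform, then group spatial:transform, then CF GeoTransform."""
    for attrs in (array_attrs, group_attrs or {}):
        if "spatial:transform" in attrs:
            return parse_spatial_transform(attrs["spatial:transform"])
    if load_grid_mapping_attrs is not None:
        gm_attrs = load_grid_mapping_attrs()  # stands in for the costlier coordinate access
        if "GeoTransform" in gm_attrs:
            # GDAL order is "c a b f d e"; reorder into (a, b, c, d, e, f).
            c, a, b, f, d, e = (float(v) for v in gm_attrs["GeoTransform"].split())
            return a, b, c, d, e, f
    return None


calls = []


def load_gm() -> dict:
    calls.append(1)
    return {"GeoTransform": "-180.0 0.5 0.0 90.0 0.0 -0.5"}


print(resolve_transform({"spatial:transform": [0.5, 0, -180, 0, -0.5, 90]}, None, load_gm))
print(resolve_transform({}, None, load_gm))  # same coefficients, via the CF fallback
print(len(calls))  # 1: the grid_mapping loader ran only for the second call
```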

1.3 Enhance Spatial Dimensions Detection

File: rioxarray/rioxarray.py
Method: __init__()

Reorder to check Zarr spatial:dimensions before standard dimension name patterns:

```
# New priority chain (optimized for performance):
1. Zarr spatial:dimensions attribute                       [NEW]
2. Standard dimension names (x/y, longitude/latitude)      [UNCHANGED]
3. CF coordinate attributes (axis, standard_name)          [UNCHANGED]
```
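A minimal sketch of steps 1 and 2 of the dimension lookup, returning (x_dim, y_dim) in the order rioxarray tracks them. The function name is hypothetical, the declaration check is omitted, and it assumes the last two entries of spatial:dimensions are the row (y) and column (x) axes, as in the ["y", "x"] example above.

```python
from typing import Optional, Sequence, Tuple

# (x_dim, y_dim) name pairs checked when spatial:dimensions is absent.
STANDARD_NAMES = (("x", "y"), ("longitude", "latitude"))


def resolve_spatial_dims(dims: Sequence[str], attrs: dict) -> Optional[Tuple[str, str]]:
    """Return (x_dim, y_dim), preferring spatial:dimensions over name matching."""
    declared = attrs.get("spatial:dimensions")
    if declared is not None:
        if len(declared) < 2 or not set(declared) <= set(dims):
            raise ValueError(
                f"spatial:dimensions {declared!r} does not match array dims {tuple(dims)!r}"
            )
        # Assumption: the last two entries are the row (y) and column (x) axes.
        y_dim, x_dim = declared[-2], declared[-1]
        return x_dim, y_dim
    for x_name, y_name in STANDARD_NAMES:
        if x_name in dims and y_name in dims:
            return x_name, y_name
    return None  # the existing CF axis / standard_name checks would run next


print(resolve_spatial_dims(("band", "row", "col"), {"spatial:dimensions": ["row", "col"]}))
# ('col', 'row')
print(resolve_spatial_dims(("time", "y", "x"), {}))  # ('x', 'y')
```

Validating the declared names against the array's actual dimensions catches stale metadata early instead of failing later during indexing.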

Change 2: Writing Support

Create new methods for writing Zarr conventions while maintaining existing CF writing methods.

2.1 New Method: write_zarr_crs()

File: rioxarray/rioxarray.py
Location: After write_crs() method

```python
def write_zarr_crs(
    self,
    input_crs: Optional[Any] = None,
    format: Literal["code", "wkt2", "projjson", "all"] = "code",
    inplace: bool = False,
) -> Union[xarray.Dataset, xarray.DataArray]:
    """
    Write CRS using Zarr proj: convention.

    Parameters
    ----------
    input_crs : Any
        Anything accepted by rasterio.crs.CRS.from_user_input
    format : {"code", "wkt2", "projjson", "all"}
        Which proj: format(s) to write
    inplace : bool
        If True, write to existing dataset
    """
```

2.2 New Method: write_zarr_transform()

File: rioxarray/rioxarray.py
Location: After write_transform() method

```python
def write_zarr_transform(
    self,
    transform: Optional[Affine] = None,
    inplace: bool = False,
) -> Union[xarray.Dataset, xarray.DataArray]:
    """
    Write transform using Zarr spatial:transform convention.

    Converts Affine to numeric array [a, b, c, d, e, f].
    """
```

2.3 New Method: write_zarr_spatial_metadata()

File: rioxarray/rioxarray.py

```python
def write_zarr_spatial_metadata(
    self,
    inplace: bool = False,
    include_bbox: bool = True,
    include_registration: bool = True,
) -> Union[xarray.Dataset, xarray.DataArray]:
    """
    Write complete Zarr spatial: metadata.

    Writes spatial:dimensions, spatial:shape, and optionally
    spatial:bbox and spatial:registration.
    """
```

2.4 New Method: write_zarr_conventions() (Convenience)

All-in-one method for writing both proj: and spatial: conventions together.
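The all-in-one method essentially merges the outputs of the individual writers and declares both conventions. A minimal sketch on plain attribute dicts; build_zarr_convention_attrs is a hypothetical name, the CRS is passed as an already-formatted proj:code string (no CRS parsing), and the declaration entries are abbreviated to name and uuid.

```python
import copy
from typing import Optional, Sequence, Tuple

# Abbreviated identifiers; the full objects also carry schema_url, spec_url and description.
DECLARATIONS = {
    "proj:": {"name": "proj:", "uuid": "f17cb550-5864-4468-aeb7-f3180cfb622f"},
    "spatial:": {"name": "spatial:", "uuid": "689b58e2-cf7b-45e0-9fff-9cfc0883d6b4"},
}


def build_zarr_convention_attrs(
    attrs: dict,
    transform: Sequence[float],
    dims: Tuple[str, str],
    shape: Tuple[int, int],
    proj_code: Optional[str] = None,
) -> dict:
    """Return a copy of attrs carrying proj:/spatial: metadata and matching declarations."""
    new_attrs = copy.deepcopy(attrs)
    needed = ["spatial:"]
    if proj_code is not None:
        new_attrs["proj:code"] = proj_code
        needed.append("proj:")
    new_attrs["spatial:transform"] = [float(v) for v in transform]
    new_attrs["spatial:dimensions"] = list(dims)  # e.g. ["y", "x"]
    new_attrs["spatial:shape"] = list(shape)  # [height, width]
    conventions = new_attrs.setdefault("zarr_conventions", [])
    declared = {entry.get("name") for entry in conventions}
    for name in needed:
        if name not in declared:  # never declare the same convention twice
            conventions.append(dict(DECLARATIONS[name]))
    return new_attrs


attrs = build_zarr_convention_attrs(
    {"long_name": "elevation"},
    transform=[0.5, 0.0, -180.0, 0.0, -0.5, 90.0],
    dims=("y", "x"),
    shape=(360, 720),
    proj_code="EPSG:4326",
)
print(sorted(entry["name"] for entry in attrs["zarr_conventions"]))  # ['proj:', 'spatial:']
```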

Change 3: Utility Module

3.1 Create New Module: zarr_conventions.py

File: rioxarray/zarr_conventions.py (NEW)

Utility functions for parsing and formatting Zarr convention attributes:

```python
from typing import Optional

from affine import Affine
from rasterio.crs import CRS

# Transform conversion
def parse_spatial_transform(spatial_transform) -> Affine: ...
def format_spatial_transform(affine: Affine) -> list: ...

# CRS parsing
def parse_proj_code(proj_code: str) -> CRS: ...
def parse_proj_wkt2(proj_wkt2: str) -> CRS: ...
def parse_proj_projjson(proj_projjson: dict) -> CRS: ...

# CRS formatting
def format_proj_code(crs: CRS) -> Optional[str]: ...
def format_proj_wkt2(crs: CRS) -> str: ...
def format_proj_projjson(crs: CRS) -> dict: ...

# Utilities
def calculate_spatial_bbox(transform: Affine, shape: tuple) -> tuple: ...
def validate_spatial_registration(registration: str) -> None: ...

# Convention declaration and validation
PROJ_CONVENTION: dict  # Convention identifier constants
SPATIAL_CONVENTION: dict

def has_convention_declared(attrs: dict, convention_name: str) -> bool: ...
def get_declared_conventions(attrs: dict) -> set: ...
def add_convention_declaration(attrs: dict, convention_name: str, inplace: bool = False) -> dict: ...
```

3.2 Enhance CRS Utilities

File: rioxarray/crs.py
Method: crs_from_user_input()

Add support for PROJJSON dictionary input to handle proj:projjson format.

Change 4: Testing

4.1 New Test File

File: test/integration/test_integration_zarr_conventions.py (NEW)

Comprehensive test coverage including:

Reading Tests:

  • Read CRS from proj:code, proj:wkt2, proj:projjson
  • Read transform from spatial:transform
  • Read dimensions from spatial:dimensions
  • Verify Zarr conventions take priority over CF conventions (performance optimization)
  • Verify CF conventions work as fallback when Zarr conventions absent
  • Test group-level inheritance for Datasets
  • Performance comparison: Zarr vs CF convention access times

Writing Tests:

  • Write CRS in each proj: format
  • Write transform as spatial:transform
  • Write complete spatial metadata
  • Write all conventions together

Round-trip Tests:

  • Write Zarr conventions, read back, verify integrity
  • Ensure both CF and Zarr conventions can coexist

Edge Cases:

  • Invalid attribute formats
  • Missing required attributes
  • Conflicting convention values
  • Group vs array level precedence

4.2 Integration with Existing Tests

Verify that all existing tests pass unchanged, confirming backward compatibility.

Implementation Approach

Backward Compatibility Strategy

  1. Zero Breaking Changes: All existing code continues to work identically
  2. Smart Prioritization: Lightweight Zarr conventions checked before expensive CF grid_mapping access
  3. Fallback Support: CF conventions remain fully supported as fallback when Zarr conventions absent
  4. Opt-in Writing: Users must explicitly call write_zarr_*() methods to use new conventions
  5. Default Behavior: All existing write_crs() and write_transform() methods unchanged
  6. Coexistence: Both convention types can be present simultaneously

Performance Impact:

  • With Zarr conventions only: Faster (direct attribute access)
  • With CF conventions only: Same performance as before (fallback path)
  • With both conventions: Faster (Zarr conventions found first, CF skipped)
  • No conventions: Same as before (all checks fail gracefully)

Group-Level Inheritance

For xarray.Dataset objects, support the Zarr proj: inheritance model:

  • Check array-level proj:* attributes first
  • Fall back to parent Dataset-level proj:* attributes
  • Only applies to direct children (no multi-level inheritance)
  • Arrays can override group-level CRS definitions

Registration Handling

The spatial:registration attribute distinguishes between:

  • "pixel" (default): Cell boundaries at coordinates (GeoTIFF PixelIsArea)
  • "node": Cells centered on coordinates (GeoTIFF PixelIsPoint)

Initially, we'll read and preserve this attribute but maintain existing rioxarray behavior (pixel registration). Future enhancement could add registration-aware coordinate generation.
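A possible shape for that future enhancement, as a hedged sketch rather than a settled design. It assumes that under "pixel" the transform addresses cell edges (so centres sit half a cell further in, matching rioxarray's current pixel-centre coordinates), and that under "node" it maps integer indices directly onto sample locations, as with GeoTIFF PixelIsPoint. x_coordinates is a hypothetical name and only handles non-rotated grids.

```python
from typing import List, Sequence

VALID_REGISTRATIONS = ("pixel", "node")


def validate_spatial_registration(registration: str) -> None:
    """Raise if registration is not one of the two values the convention defines."""
    if registration not in VALID_REGISTRATIONS:
        raise ValueError(
            f"spatial:registration must be one of {VALID_REGISTRATIONS}, got {registration!r}"
        )


def x_coordinates(
    transform: Sequence[float], width: int, registration: str = "pixel"
) -> List[float]:
    """x coordinate of each column for a non-rotated grid (b == 0)."""
    validate_spatial_registration(registration)
    a, b, c = transform[0], transform[1], transform[2]
    if b != 0:
        raise ValueError("rotated grids are not handled in this sketch")
    # pixel: the transform points at cell edges, so shift by half a cell to reach centres.
    # node: assume the transform already points at the sample locations.
    offset = 0.5 if registration == "pixel" else 0.0
    return [c + a * (col + offset) for col in range(width)]


transform = [1.0, 0.0, 10.0, 0.0, -1.0, 50.0]
print(x_coordinates(transform, 3))          # [10.5, 11.5, 12.5]
print(x_coordinates(transform, 3, "node"))  # [10.0, 11.0, 12.0]
```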

References

cc @maxrjones, @vincentsarago, @d-v-b, @rabernat
