

siligam commented Aug 7, 2025

Time Bounds Feature Implementation

Overview

This PR adds time bounds handling for xarray Datasets. This is particularly useful for climate and weather data, where variables are often associated with time intervals rather than single time points.

Key Features

  • 🚀 Automatic detection and handling of time coordinates
  • 📅 Support for various time frequencies (daily, monthly, custom offsets)
  • ✅ Comprehensive input validation
  • 📊 Detailed logging with markdown-style formatting
  • 🧪 Extensive test coverage including edge cases

Changes

  1. Core Functionality:

    • Implemented time_bounds function in pymor/std_lib/time_bounds.py
    • Added support for both standard and offset time points
    • Included automatic bounds creation with proper dimension handling
  2. Testing:

    • Added unit tests for various scenarios
    • Included tests for monthly frequency with and without offsets
    • Added validation for error cases
  3. Documentation:

    • Created comprehensive documentation in doc/time_bounds.rst
    • Added usage examples and API reference
    • Documented logging format and error handling
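For daily and monthly data, bounds can be derived from the time points alone. The sketch below is not the PR's implementation: it uses only the standard library, and it assumes that an "offset" time point is a timestamp placed inside its period (for example mid-month) rather than at its start. The helpers `daily_bounds` and `monthly_bounds` are hypothetical names.

```python
from datetime import datetime, timedelta


def daily_bounds(t: datetime) -> tuple[datetime, datetime]:
    """Return [start, end) of the calendar day containing t (hypothetical helper)."""
    start = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


def monthly_bounds(t: datetime) -> tuple[datetime, datetime]:
    """Return [start, end) of the calendar month containing t (hypothetical helper).

    Start-of-month stamps and offset stamps (e.g. mid-month) give the same interval.
    """
    start = datetime(t.year, t.month, 1)
    if t.month == 12:
        end = datetime(t.year + 1, 1, 1)
    else:
        end = datetime(t.year, t.month + 1, 1)
    return start, end


# A start-of-month stamp and a mid-month stamp fall in the same February interval.
print(monthly_bounds(datetime(2000, 2, 1)))
print(monthly_bounds(datetime(2000, 2, 15, 12)))
print(daily_bounds(datetime(2000, 12, 31, 12)))
```

Because the interval is computed from the calendar rather than from the spacing between points, this approach also works for a single time point.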

Testing

All tests are passing:

  • Tested with different time frequencies
  • Verified edge cases (single time point, existing bounds)
  • Confirmed proper error handling

Logging

[Time bounds] dataset_name
  → is dataset:   ✅
  → time label  : time
  → bounds label: bnds
  → has time bounds: ❌
  → time values: 365 points from 2000-01-01 to 2000-12-31
  → time step: 1 days 00:00:00
  → set time bounds: time_bnds(365, 2)
  → bounds range: 2000-01-01 to 2001-01-01
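The fields in this log can be reproduced from the time axis alone. The following is a minimal sketch rather than the PR's code. It assumes that the step is the median spacing between consecutive points and that bounds take the form [t, t + step), so the last interval ends one step after the last time point. `summarize_bounds` is a hypothetical name.

```python
from datetime import datetime, timedelta
from statistics import median


def summarize_bounds(times: list[datetime]) -> dict:
    """Build [t, t + step) bounds and collect the fields reported in the log."""
    if len(times) < 2:
        # A single point has no spacing from which to infer a step.
        raise ValueError("need at least two time points to infer a step")
    diffs = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    step = timedelta(seconds=median(diffs))
    bounds = [(t, t + step) for t in times]
    return {
        "n_points": len(times),
        "first": times[0],
        "last": times[-1],
        "step": step,
        "bounds_shape": (len(bounds), 2),
        "bounds_range": (bounds[0][0], bounds[-1][1]),
    }


# Daily points for the year 2000 (a leap year, so 366 days).
start = datetime(2000, 1, 1)
times = [start + timedelta(days=i) for i in range(366)]
summary = summarize_bounds(times)
print(summary["n_points"], summary["step"], summary["bounds_shape"])
print(summary["bounds_range"])
```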

Next Steps

Related Issues

Closes #176

siligam requested review from mandresm and pgierz August 7, 2025 14:28
Code still needs refinement. This is a first approximation.
siligam marked this pull request as draft August 7, 2025 21:37

siligam commented Aug 7, 2025

It appears that the time method (e.g. mean, climatology, instantaneous) influences how time bounds are set. For instance, when the time method is climatology, time bounds should not be set, because they do not make sense in that case.

Also, the time-averaging step assigns new timestamps, placed at the first, middle, or last point of each time interval. Setting time bounds before or after this step may therefore produce different results.
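To see why the ordering matters, consider what happens when bounds are reconstructed after averaging. If the averaging step labels each interval with its first, middle, or last instant, the same timestamp corresponds to three different intervals. The sketch assumes a fixed step, so it does not apply directly to calendar months. `bounds_from_label` and its `label` parameter are hypothetical and are not part of pymor.

```python
from datetime import datetime, timedelta


def bounds_from_label(t: datetime, step: timedelta, label: str) -> tuple[datetime, datetime]:
    """Reconstruct the [start, end] of an averaging interval from its timestamp.

    `label` states where the averaging step placed the timestamp:
    "first" (interval start), "middle" (centre) or "last" (interval end).
    """
    if label == "first":
        return t, t + step
    if label == "middle":
        return t - step / 2, t + step / 2
    if label == "last":
        return t - step, t
    raise ValueError(f"unknown label {label!r}")


t = datetime(2000, 1, 2)
day = timedelta(days=1)
for label in ("first", "middle", "last"):
    print(label, bounds_from_label(t, day, label))
```

If the label convention is not known at the point where bounds are created, the result is ambiguous. That is an argument for creating bounds after averaging, as the linked issue title suggests.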

pgierz self-assigned this Oct 20, 2025
pgierz added the question label Nov 27, 2025
pgierz changed the base branch from main to prep-release November 27, 2025 08:31

pgierz commented Nov 27, 2025

@siligam this is still marked as a draft, is there anything to be done here, or can I review this and merge? Is it still required for prep release?

siligam marked this pull request as ready for review November 27, 2025 09:03

siligam commented Nov 27, 2025


It is ready for review, although the linting issues still need to be fixed.

- Remove unused loop variable 'i' in time_bounds.py
- Apply black formatting to time_bounds.py and test_time_bounds.py
- All linting checks (flake8, isort, black) now pass

pgierz left a comment


Some comments; I think we need to iterate on this PR one more time.


This file belongs in the documentation folder, alongside time_bounds.rst, and the two should be merged into a single document.

The output dataset with the time bounds information.
"""
# Get dataset name for logging
dataset_name = ds.attrs.get("name", "unnamed_dataset")

We have this in a couple of places. I think we should be consistent and think of a rule for it. I am ok with "unnamed_dataset", but we should keep it in mind for the future.
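One way to turn this into a rule is to use a single shared helper, so that the fallback name is defined in one place. This is a sketch only: `DEFAULT_DATASET_NAME` and `get_dataset_name` are hypothetical names. The helper assumes only that the object may expose an `attrs` mapping, as xarray objects do.

```python
from types import SimpleNamespace
from typing import Any, Mapping

# Hypothetical module-level constant shared by all pipeline steps.
DEFAULT_DATASET_NAME = "unnamed_dataset"


def get_dataset_name(obj: Any, default: str = DEFAULT_DATASET_NAME) -> str:
    """Return obj.attrs["name"] if present and non-empty, else the shared default."""
    attrs: Mapping = getattr(obj, "attrs", None) or {}
    name = attrs.get("name")
    return str(name) if name else default


# Stand-ins for datasets; a real xarray.Dataset exposes .attrs the same way.
print(get_dataset_name(SimpleNamespace(attrs={"name": "tas"})))
print(get_dataset_name(SimpleNamespace(attrs={})))
```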


# Log header with markdown style
logger.info(f"[Time bounds] {dataset_name}")
logger.info(f" → is dataset: {'✅' if isinstance(ds, xr.Dataset) else '❌'}")

We have this in a few places now, and I am not sure I like the emojis so much. What about @mandresm, any opinions?

Comment on lines +80 to +83
# Only create bounds for mean and instantaneous methods
if time_method == "climatology":
logger.info(" → skipping bounds creation for climatology data")
return ds

Can we double check this? Do climatologically averaged datasets really not include time bounds? That seems inconsistent.
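For reference, the CF conventions do not drop bounds for climatologies; they represent them differently. The time coordinate carries a `climatology` attribute instead of `bounds`. That attribute names a variable whose two values run from the start of the first sampled period to the end of the last one. The following stdlib sketch handles a monthly climatology and assumes that it was computed over whole calendar years. `monthly_climatology_bounds` and the variable name `climatology_bnds` are hypothetical.

```python
from datetime import datetime


def monthly_climatology_bounds(first_year: int, last_year: int) -> list[tuple[datetime, datetime]]:
    """Return CF-style climatology bounds for each calendar month.

    The bounds for month m run from the start of month m in first_year
    to the end of month m (i.e. the start of the next month) in last_year.
    """
    result = []
    for month in range(1, 13):
        start = datetime(first_year, month, 1)
        if month == 12:
            end = datetime(last_year + 1, 1, 1)
        else:
            end = datetime(last_year, month + 1, 1)
        result.append((start, end))
    return result


bounds = monthly_climatology_bounds(1961, 1990)
# In a dataset these values would go into e.g. a "climatology_bnds" variable,
# referenced by time.attrs["climatology"] rather than time.attrs["bounds"].
print(bounds[0])
print(bounds[11])
```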

Comment on lines 74 to 78
if time_method not in ["mean", "instantaneous", "climatology"]:
logger.warning(
f" ⚠️ Unknown time method '{time_method}', defaulting to 'mean'"
)
time_method = "mean"

Having a way to bail out here and raise a failure would be nice.
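Here is a minimal sketch of such a bail-out, assuming a hypothetical `strict` keyword. When `strict=True`, unknown methods raise `ValueError`. Otherwise the function keeps the current behaviour of logging a warning and falling back to `"mean"`. `validate_time_method` is not a function in the PR.

```python
import logging

logger = logging.getLogger(__name__)

VALID_TIME_METHODS = ("mean", "instantaneous", "climatology")


def validate_time_method(time_method: str, strict: bool = False) -> str:
    """Return a valid time method; raise on unknown methods when strict=True."""
    if time_method in VALID_TIME_METHODS:
        return time_method
    msg = f"Unknown time method {time_method!r}; expected one of {VALID_TIME_METHODS}"
    if strict:
        raise ValueError(msg)
    # Non-strict mode keeps the existing fallback.
    logger.warning("%s, defaulting to 'mean'", msg)
    return "mean"


print(validate_time_method("instantaneous"))
print(validate_time_method("bogus"))
```

Leaving `strict=False` as the default keeps current callers working, and pipelines that prefer to fail early can opt in.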

Comment on lines +103 to +104
# For instantaneous data, bounds are the same as time values
bounds_data = np.column_stack([time_values, time_values])

As a reminder of what np.column_stack does:

Suggested change
# For instantaneous data, bounds are the same as time values
bounds_data = np.column_stack([time_values, time_values])
# For instantaneous data, bounds are the same as time values
#
# .. note:: np.column_stack reminder
#
# >>> a = np.array([1, 2, 3])
# >>> b = np.array([4, 5, 6])
# >>> np.column_stack((a, b))
# array([[1, 4],
#        [2, 5],
#        [3, 6]])
#
bounds_data = np.column_stack([time_values, time_values])

raise ValueError(error_msg)

# Calculate data frequency in days
time_diff_seconds = np.median(

Why median?
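One plausible rationale, which the PR does not state, is robustness. The median of consecutive differences ignores a few irregular gaps, while the mean is pulled toward them. The stdlib sketch below uses daily data with one missing week.

```python
from datetime import datetime, timedelta
from statistics import mean, median

# Daily axis for January 2000 with 2000-01-10 .. 2000-01-16 missing.
start = datetime(2000, 1, 1)
times = [start + timedelta(days=i) for i in range(31) if not 9 <= i <= 15]

# Spacing between consecutive points, in days.
diffs_days = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
print("median step:", median(diffs_days))
print("mean step:  ", mean(diffs_days))
```

Here the median stays at 1 day, while the mean is about 1.3 days. For monthly data the spacing varies between 28 and 31 days, so any single step expressed in days is only an approximation, whichever statistic is used.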

coords={time_label: time_values, bounds_dim_label: [0, 1]},
attrs={
"long_name": f"time bounds for {time_label}",
"comment": f"Generated by pymorize: {time_method} time bounds",

wrong name!

pgierz added the enhancement label and removed the question label Nov 28, 2025

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

create time bounds after time averaging

3 participants