Releases: NVIDIA/TorchFort

v0.4.0

24 Nov 18:29
42a36c9

What's Changed

This release includes several new features and bug fixes and relicenses TorchFort under the Apache 2.0 license. It adds support for gradient clipping and user control of the built-in MLP model's flattening behavior, and extends the array dimensions supported by the Fortran supervised and RL training interfaces. Several inconsistencies and bugs in the RL algorithms have been addressed, along with proper handling of gradient accumulation. The tensor dimensions accepted by the built-in MLP model are now enforced more strictly, and the model no longer supports multiple inputs/outputs.
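For orientation, here is a minimal YAML sketch of where these new options might live in a model configuration file. The key names flatten_non_batch_dims and clip_gradient_norm are illustrative assumptions, not confirmed option names; consult the TorchFort configuration documentation for the exact schema.

    model:
      type: mlp
      parameters:
        layer_sizes: [128, 256, 1]
        # Hypothetical name for the new flattening control added in #78:
        flatten_non_batch_dims: true

    optimizer:
      type: adam
      general:
        # Hypothetical name for the new gradient clipping option added in #94:
        clip_gradient_norm: 1.0
      parameters:
        learning_rate: 0.001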

Breaking changes

#76 explicitly disables multiple input tensors for the built-in MLP model, making it no longer suitable for use as the critic network in RL training. Replace it with the new CriticMLP model or with your own model exported from PyTorch/TorchScript.
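For example, an RL system configuration that previously used the built-in MLP as its critic would switch to the new model along these lines. This is a sketch only: the block layout and the registered type name critic_mlp are assumptions; see the RL documentation and examples for the exact configuration.

    critic_model:
      type: critic_mlp        # hypothetical registered name for the new CriticMLP
      parameters:
        layer_sizes: [64, 64, 1]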

Deprecations

None.

Notable PRs included in this release

  • Fixing predefined RL models, adds new CriticMLP for RL (#77)
  • Explicitly disallow multiple input tensors from built-in MLP model. (#76)
  • Expand Fortran supervised training interface coverage to more input/label and input/output dimension combinations (#79)
  • Update to Apache 2.0 license. (#80)
  • Add distributed model/system creation routines using Fortran integer MPI_Comm format. Simplify corresponding Fortran interfaces. (#81)
  • Add parameter to control flattening behavior of built-in MLP model. (#78)
  • Add build configured torchfort_config.h header to indicate GPU support in build. (#83)
  • Fixes to built-in ActorCriticMLP. Adding 1D Fortran array interfaces for RL. (#82)
  • Use p_model_ instead of p_model_target_ in predict td3 (#87)
  • Fixing device placements in SAC RL training (#86)
  • Fixes to action squashing in RL training. (#88)
  • Making SAC RL training more consistent with spinningup reference (#90)
  • Fixing gradient accumulation in RL algorithms. (#93)
  • Add support for gradient clipping. (#94)

Full Changelog: v0.3.1...v0.4.0

v0.3.1

15 Sep 20:14
909a67c

What's Changed

This version of TorchFort contains a few minor updates and bug fixes. It addresses some deficiencies in the existing Fortran interfaces, adds 64-bit integer handling to the NCCL/MPI interfaces used for distributed training, and fixes a bug where the extra loss argument tensor for custom loss functions was not automatically moved to the model device. Other minor improvements include removing the HDF5 dependency from the Fortran examples to simplify builds and adding a missing compiler define that had caused the cart pole example to unconditionally run on the CPU.

Breaking changes

None.

Deprecations

None.

Notable PRs included in this release

  • Fortran interface fixes (#56)
  • adding kLong support to TorchFort MPI and NCCL wrappers (#58, #59)
  • Add missing ENABLE_GPU define to cart pole example build. (#62)
  • Add missing calls to move loss module and extra_loss_args to model device. (#63)
  • Removing HDF5 dependency from examples to simplify builds. (#66)

Full Changelog: v0.3.0...v0.3.1

v0.3.0: Multi Tensor, Multi Environment Support, Modernization of Dependencies

03 Jun 14:09
3ac715f

Summary Release Notes
Major Features and Enhancements

  1. Multi-Argument Model and Loss Support
    • Added full support for models and loss functions that require multiple input, label, and output tensors, as well as custom loss arguments. This is enabled via the new torchfort_train_multiarg and torchfort_inference_multiarg APIs, with corresponding Fortran and C documentation and usage examples (a sketch follows this list).
    • Introduced torchfort_tensor_list types and management functions (create, destroy, add_tensor) to facilitate passing multiple tensors to models and losses.
    • Expanded the documentation and provided a comprehensive Fortran example (examples/fortran/graph) demonstrating online training on unstructured meshes with a MeshGraphNet-like model and a custom PyTorch loss function exported via TorchScript.
  2. TorchScript Loss Functions
    • Added support for loading custom loss functions from exported TorchScript modules via a new torchscript loss type. This allows users to implement arbitrary loss logic in Python and integrate it into TorchFort workflows (a configuration sketch follows this list).
    • Updated the configuration schema and documentation to describe usage and options for TorchScript-based losses.
  3. Expanded Documentation and Examples
    • Significantly updated API and usage documentation to cover the new multi-argument interfaces, tensor list management, and custom loss workflows.
    • Added a detailed, reproducible example (examples/fortran/graph) including all necessary mesh data, configuration, model/loss generation scripts, and visualization tools.
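Here is a rough Fortran sketch of the new multi-argument flow, loosely modeled on the description above rather than copied from the examples/fortran/graph example. The function names come from this release's notes, but the module name, derived type usage, and exact argument lists are assumptions; treat this as orientation, not reference code.

    subroutine train_step(node_feats, edge_feats, targets, loss_val)
      use torchfort   ! module name assumed
      implicit none
      real(4), intent(in)  :: node_feats(:,:), edge_feats(:,:), targets(:,:)
      real(4), intent(out) :: loss_val
      type(torchfort_tensor_list) :: inputs, labels
      integer :: istat

      ! Collect multiple input and label tensors into tensor lists
      istat = torchfort_tensor_list_create(inputs)
      istat = torchfort_tensor_list_add_tensor(inputs, node_feats)
      istat = torchfort_tensor_list_add_tensor(inputs, edge_feats)
      istat = torchfort_tensor_list_create(labels)
      istat = torchfort_tensor_list_add_tensor(labels, targets)

      ! One training step with multiple inputs/labels (argument list assumed)
      istat = torchfort_train_multiarg("mymodel", inputs, labels, loss_val)

      istat = torchfort_tensor_list_destroy(inputs)
      istat = torchfort_tensor_list_destroy(labels)
    end subroutine train_step

And a loss block selecting a TorchScript-exported loss module; the parameter key holding the path is likewise an assumption:

    loss:
      type: torchscript
      parameters:
        filename: loss.pt   # path to the exported TorchScript loss (key name assumed)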

Core and API Changes
  4. Loss Function API Refactor
    • Refactored the internal loss interface: loss functions now accept an additional extra_args argument, supporting more flexible and extensible loss computations.
    • Implemented new TorchscriptLoss class for TorchScript integration, and updated the loss registry accordingly.
  5. Distributed and RL Improvements
    • Reinforcement learning (RL) off-policy and on-policy buffers now support local multi-environment updates, with new APIs and documentation for batch buffer operations.
    • Improved distributed communication routines to enforce tensor contiguity, with clear error messages for unsupported non-contiguous tensors.
  6. Grad Accumulation and Training Control
    • Added support for gradient accumulation steps, configurable via the optimizer general block in the YAML config. This enables larger effective batch sizes and more control over optimization steps.
    • RL algorithms and model training logic now respect the new gradient accumulation setting, only stepping the optimizer after the configured number of accumulation steps.
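As a sketch of where the new setting might sit in the optimizer block; the key name grad_accumulation_steps is an assumption for illustration:

    optimizer:
      type: adam
      general:
        grad_accumulation_steps: 4   # assumed key name; optimizer steps once per 4 training calls
      parameters:
        learning_rate: 0.001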

Build and Environment Updates
  7. Updated Build and Dependency Stack
    • Dockerfiles and build scripts updated to use CUDA 12.8, NVIDIA HPC SDK 25.3, latest OpenMPI/HPC-X, and PyTorch 2.7.0 for improved performance and compatibility.
    • Default C++ ABI flag switched to -D_GLIBCXX_USE_CXX11_ABI=1 for all builds, to accommodate the updated PyTorch version.
    • Requirements updated to match new PyTorch and torchvision/torchaudio versions.
  8. Improved Compiler and MPI Compatibility
    • CMake logic now detects and blocks unsupported compilers (e.g., nvc++ for C++ code), with clear error messages.
    • Fortran MPI compatibility is now tested at build time, and the build system automatically sets the MPICH flag if required.

Other Notable Improvements

  • Various bugfixes and enhancements to distributed communication, RL API, and internal error handling.
  • Expanded and clarified documentation throughout the API and example codebases.

Upgrade Notes

  • Users should update their Docker images or environments to the new CUDA, HPC SDK, and PyTorch versions.
  • When using custom loss functions or multi-input models, refer to the new documentation and examples for correct API usage.
  • YAML configuration files may require updates to the optimizer and loss sections to leverage the new features.

v0.2.0

06 Sep 17:48
30b5c6f

Pre-release

What's Changed

This release includes several major updates to TorchFort, including:

  • Enabling compilation of the library with compilers other than NVHPC (e.g., GNU)
  • Enabling model/RL system training and inference on CPU
  • Enabling CPU-only builds without CUDA/NCCL
  • New reinforcement learning features, including support for on-policy algorithms such as PPO
  • Improvements to build scripts

Breaking Changes

#14 enables placing and running models/RL systems on the CPU. To support this, an additional device argument was added to the model/system creation APIs (e.g., torchfort_model_create). Please refer to the documentation for more details.
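Illustratively, a creation call would now pass a device selector. The exact signature and device encoding are assumptions here; see the documentation for the authoritative interface.

    program create_example
      use torchfort   ! module name assumed
      implicit none
      integer :: istat
      ! Final argument is the new device selector from #14; 0 selects the
      ! first GPU here, and the documentation covers the CPU setting.
      istat = torchfort_model_create("mymodel", "config.yaml", 0)
    end program create_example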

PRs included in this release

  • Enable support for complex gradient reduction in distributed cases. (#2)
  • remove extraneous mpi call from cmake (#4)
  • generalize cmake to build for different cuda archs (#7)
  • remove hardcoded yaml-cpp path from CMakeLists.txt (#5)
  • Build updates and improvements (#10)
  • Update setup.cpp (#11)
  • merging rl changes (#13)
  • Enable model training/inference on CPU or GPU devices. Enabling usage of alternative compilers to NVHPC. (#14)
  • Tkurth/rl ppo (#12)
  • Tkurth/rl tests (#15)
  • Add train and inference functions for 5d Fortran arrays. (#17)
  • Fix interface issues in Fortran module with gfortran. (#18)
  • Enable builds without CUDA/GPU support (#19)
  • Fixing up documentation. (#20)
  • 0.2.0 release (#21)

Full Changelog: v0.1.0...v0.2.0

v0.1.0

01 Aug 21:55

Pre-release

Initial release of TorchFort.