In the CUDA 12.9 cuBLASLt documentation, I noticed support for 1×128 and 128×128 block-wise quantization. However, nvmath-python currently lacks bindings for this quantization scheme. Is there any plan to support it?
https://docs.nvidia.com/cuda/cublas/index.html#cublasltmatmulmatrixscale-t
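For context, here is a NumPy sketch of what I mean by 1×128 and 128×128 block-wise scaling: one FP32 scale factor per block of the input matrix, with values normalized to the FP8 E4M3 range. This is only an illustration of the scheme, not the nvmath-python or cuBLASLt API; the function name and the `fp8_max` constant are my own.

```python
import numpy as np

def blockwise_quantize(a, block_rows, block_cols):
    """Illustrative block-wise quantization: one FP32 scale per block.

    Assumes a.shape is divisible by (block_rows, block_cols).
    """
    m, n = a.shape
    scales = np.empty((m // block_rows, n // block_cols), dtype=np.float32)
    q = np.empty_like(a, dtype=np.float32)
    fp8_max = 448.0  # max finite value representable in float8 E4M3
    for i in range(0, m, block_rows):
        for j in range(0, n, block_cols):
            block = a[i:i + block_rows, j:j + block_cols]
            s = np.abs(block).max() / fp8_max
            s = s if s > 0 else 1.0  # avoid division by zero for all-zero blocks
            scales[i // block_rows, j // block_cols] = s
            # Values would then be cast to an FP8 dtype; kept as FP32 here.
            q[i:i + block_rows, j:j + block_cols] = block / s
    return q, scales

a = np.random.randn(256, 256).astype(np.float32)
q1, s1 = blockwise_quantize(a, 1, 128)     # 1x128: one scale per 128-element row segment
q2, s2 = blockwise_quantize(a, 128, 128)   # 128x128: one scale per tile
```

The point of exposing this in nvmath-python would be letting users pass such per-block scale tensors to the matmul, as the linked `cublasLtMatmulMatrixScale_t` documentation describes.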