

@Soddentrough

ai-toolkit runs on systems with AMD GPUs but displays an error about 'nvidia-smi' in the dashboard when doing so.

This patch removes the hard-coded dependency on 'nvidia-smi', allowing ai-toolkit to operate with either 'nvidia-smi' or 'rocm-smi'. It checks for 'nvidia-smi' first and then for 'rocm-smi'. This ordering may cause an issue if both are installed, but it solves a need today.
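For readers following along, the detection order described above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the function name `detect_gpu_tool` and the `CANDIDATE_TOOLS` tuple are hypothetical, and the lookup is injected as a parameter only so the behavior is easy to check without either tool installed.

```python
import shutil
from typing import Callable, Optional

# Order mirrors the description above: NVIDIA first, then AMD.
# (Hypothetical constant, used only for this sketch.)
CANDIDATE_TOOLS = ("nvidia-smi", "rocm-smi")


def detect_gpu_tool(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the name of the first monitoring tool found on PATH, or None."""
    for tool in CANDIDATE_TOOLS:
        if which(tool) is not None:
            return tool
    return None
```

With both tools installed, this sketch always picks 'nvidia-smi', which is the caveat mentioned in the description.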

@dkspwndj

dkspwndj commented Dec 7, 2025

rocmcommit01 rocmcommit02

Thank you for making this change! Unfortunately, it is not running properly for me yet. I look forward to seeing good results in the future!

@Soddentrough
Author

image

I think this might be a path issue (where is your rocm-smi installed?). I will update the logic to check for rocm-smi in this order:

  1. which rocm-smi
  2. Check for $ROCM_PATH/bin/rocm-smi
  3. Check for /usr/bin/rocm-smi
  4. Check for /opt/rocm/bin/rocm-smi
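
The lookup order listed above could look something like this in Python. It is only a sketch under stated assumptions: `find_rocm_smi` is a hypothetical name, and the `env`, `which`, and `is_executable` parameters exist purely so the order can be exercised without touching the real filesystem.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def _default_is_executable(path: str) -> bool:
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_rocm_smi(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    is_executable: Callable[[str], bool] = _default_is_executable,
) -> Optional[str]:
    """Return a path to rocm-smi using the order from the comment above."""
    # PATH lookup first, equivalent to `which rocm-smi`.
    found = which("rocm-smi")
    if found:
        return found

    candidates = []
    rocm_path = env.get("ROCM_PATH")
    if rocm_path:  # only consulted when the variable is set and non-empty
        candidates.append(os.path.join(rocm_path, "bin", "rocm-smi"))
    candidates.append("/usr/bin/rocm-smi")
    candidates.append("/opt/rocm/bin/rocm-smi")

    for candidate in candidates:
        if is_executable(candidate):
            return candidate
    return None
```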

@Soddentrough
Author

image

There was a parsing issue when handling the JSON output of rocm-smi, which created a phantom device.
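
The thread does not say what exactly produced the phantom device. One plausible cause, assumed here purely for illustration, is a top-level key in the JSON that is not a GPU entry being counted as one. The sketch below keeps only keys shaped like `card0`, `card1`, and so on; the function name `parse_devices` and the sample payload are hypothetical and are not taken from the patch.

```python
import json
import re
from typing import Any, Dict

# Assumption: real GPU entries are keyed "card<N>"; anything else is metadata.
CARD_KEY = re.compile(r"^card\d+$")


def parse_devices(raw_json: str) -> Dict[str, Dict[str, Any]]:
    """Return only the per-GPU entries from a rocm-smi style JSON document."""
    data = json.loads(raw_json)
    return {
        key: value
        for key, value in data.items()
        if CARD_KEY.match(key) and isinstance(value, dict)
    }


# Illustrative payload: one GPU plus a non-GPU block that must not become a device.
sample = '{"card0": {"GPU use (%)": "12"}, "system": {"Driver version": "x.y.z"}}'
devices = parse_devices(sample)
```

Here `devices` contains a single entry for `card0`; the `system` block is ignored rather than shown as a second GPU.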

@dkspwndj

dkspwndj commented Dec 8, 2025

Forgive me. I had only installed ROCm to the extent needed for ComfyUI with ZLUDA, and I was not sure whether ROCm was installed properly. Now I'll make a separate PyTorch 3.12 folder, install ROCm there, and try it out.

@dkspwndj

dkspwndj commented Dec 8, 2025

I have now tested in a venv with ROCm 7.1.1 installed, but it still does not work well.
aitoookiterr

@Soddentrough
Author

Soddentrough commented Dec 8, 2025

I'm sorry, I didn't notice you're testing on a Windows system. rocm-smi is only available on Linux or WSL. It might be possible to use hipinfo.exe on Windows to enumerate the devices, but I don't think it provides dynamic performance statistics for power, utilization, or memory, so the stats would show "0".

I don't currently have a way of testing this, though. For Windows, using "Get-Counter" for dynamic performance counters could be the way to go.

@Soddentrough
Author

image

This now uses amd-smi by default, with a fallback to rocm-smi. Where amd-smi doesn't fully support a GPU (e.g., a Strix iGPU), we use the sysfs hwmon metrics. This also allows us to show the "VRAM" and "GTT" (shared memory) used by an APU.
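
As a rough idea of what the sysfs fallback can read, here is a minimal sketch. It assumes the file names the Linux amdgpu driver exposes under a card's `device` directory (`mem_info_vram_used`, `mem_info_vram_total`, `mem_info_gtt_used`, `mem_info_gtt_total`) and a `temp1_input` file under `hwmon/hwmon*`. The function name `read_amdgpu_stats` and the returned field names are hypothetical, and this is not the code from the patch.

```python
from pathlib import Path
from typing import Dict, Optional, Union

Number = Union[int, float]


def _read_int(path: Path) -> Optional[int]:
    """Read a single integer from a sysfs-style file, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_amdgpu_stats(device_dir: Path) -> Dict[str, Optional[Number]]:
    """Collect VRAM, GTT, and temperature from e.g. /sys/class/drm/card0/device."""
    stats: Dict[str, Optional[Number]] = {
        "vram_used_bytes": _read_int(device_dir / "mem_info_vram_used"),
        "vram_total_bytes": _read_int(device_dir / "mem_info_vram_total"),
        "gtt_used_bytes": _read_int(device_dir / "mem_info_gtt_used"),
        "gtt_total_bytes": _read_int(device_dir / "mem_info_gtt_total"),
        "temp_c": None,
    }
    # hwmon reports temperature in millidegrees Celsius.
    for temp_file in sorted(device_dir.glob("hwmon/hwmon*/temp1_input")):
        milli = _read_int(temp_file)
        if milli is not None:
            stats["temp_c"] = milli / 1000.0
            break
    return stats
```

Missing files simply yield `None`, so a caller can decide whether to display "0" or hide the field.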


2 participants