Conversation

@dmitsh
Collaborator

@dmitsh dmitsh commented Dec 4, 2025

No description provided.

@codecov
codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.84%. Comparing base (7975e25) to head (5a92c6f).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/providers/dra/provider.go | 0.00% | 5 Missing ⚠️ |
| pkg/translate/topology.go | 50.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #189      +/-   ##
==========================================
- Coverage   64.90%   64.84%   -0.07%     
==========================================
  Files          78       78              
  Lines        4266     4275       +9     
==========================================
+ Hits         2769     2772       +3     
- Misses       1390     1395       +5     
- Partials      107      108       +1     
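
The percentages in the table above appear to be hits divided by total lines, where lines = hits + misses + partials (2772 + 1395 + 108 = 4275 for head). A minimal Go sketch of that arithmetic, assuming this is how the report derives its figures; `coveragePct` is a name made up for illustration:

```go
package main

import "fmt"

// coveragePct returns hits as a percentage of lines.
// Assumption: project coverage is computed as hits / lines.
func coveragePct(hits, lines int) float64 {
	return 100 * float64(hits) / float64(lines)
}

func main() {
	base := coveragePct(2769, 4266) // base (7975e25)
	head := coveragePct(2772, 4275) // head (5a92c6f)
	fmt.Printf("base=%.4f%% head=%.4f%% delta=%.4f%%\n", base, head, head-base)
}
```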

@greptile-apps
greptile-apps bot commented Dec 4, 2025

Greptile Overview

Greptile Summary

Added error handling to return clear error messages when the DRA provider finds no matching nodes with the required labels and annotations. The PR also refactored the retry loop and added defensive logging for empty block topology data.

Key Changes:

  • Returns StatusBadGateway error when domainMap is empty, with helpful message indicating which label and annotations to check
  • Extracted nvidia.com/gpu.clique label into a constant for better maintainability
  • Added warning logs when block topology data is missing or empty
  • Refactored retry loop structure (note: this reduces retry attempts from 6 to 5)

Confidence Score: 4/5

  • This PR is safe to merge, with one minor behavior change to be aware of.
  • The error handling improvements are well implemented and the code is clean. However, the retry loop refactoring reduces the number of attempts from 6 to 5, a subtle behavior change that was already flagged in a previous review comment.
  • Pay attention to pkg/server/engine.go: the retry loop now makes 5 attempts instead of 6.

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| pkg/providers/dra/provider.go | 5/5 | Added error handling for empty domain map and extracted label constant; changes are clean and improve error reporting |
| pkg/server/engine.go | 3/5 | Refactored retry loop reducing total attempts from 6 to 5; logic is functionally correct but represents a behavior change |
| pkg/translate/topology.go | 5/5 | Added defensive warnings for missing or empty block topology data; improves observability without changing behavior |

Sequence Diagram

sequenceDiagram
    participant Client
    participant Server
    participant Provider as DRA Provider
    participant K8s as Kubernetes API
    
    Client->>Server: Request Topology
    Server->>Server: processRequestWithRetries(maxRetries=5)
    
    loop Retry attempts (1-5)
        Server->>Provider: GenerateTopologyConfig()
        Provider->>K8s: GetNodes()
        K8s-->>Provider: Return nodes
        
        Provider->>Provider: Filter nodes by label and annotations
        Provider->>Provider: Build domainMap
        
        alt domainMap is empty
            Provider-->>Server: Error: no matching nodes found
            Server->>Server: Check if StatusInternalServerError
            alt attempt < maxRetries
                Server->>Server: Exponential backoff
            else attempt == maxRetries
                Server-->>Client: Return error
            end
        else domainMap has entries
            Provider->>Provider: toGraph(domainMap)
            Provider-->>Server: Return topology
            Server-->>Client: Success response
        end
    end
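
The retry flow in the diagram can be sketched in Go as follows. This assumes the server retries only on `StatusInternalServerError`, as the diagram states. `withRetries` is a hypothetical helper, not the PR's `processRequestWithRetries`, and the doubling delay is just one common way to do exponential backoff.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// withRetries is a hypothetical helper. It calls f up to maxAttempts times,
// retrying only when f reports http.StatusInternalServerError, and doubles
// the delay after each failed attempt.
func withRetries(maxAttempts int, delay time.Duration, f func() int) (code, attempts int) {
	for attempts = 1; ; attempts++ {
		code = f()
		if code != http.StatusInternalServerError || attempts >= maxAttempts {
			return code, attempts
		}
		time.Sleep(delay)
		delay *= 2 // exponential backoff
	}
}

func main() {
	// A call that always fails with 500 uses every attempt.
	code, n := withRetries(5, time.Millisecond, func() int { return http.StatusInternalServerError })
	fmt.Println(code, n)

	// A 502 does not match the retry condition, so it is returned at once.
	code, n = withRetries(5, time.Millisecond, func() int { return http.StatusBadGateway })
	fmt.Println(code, n)
}
```

Under that assumption, an error carrying `StatusBadGateway` would be returned to the client after the first attempt rather than retried.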

@greptile-apps greptile-apps bot left a comment

2 files reviewed, no comments

@greptile-apps greptile-apps bot left a comment

2 files reviewed, no comments

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment

Comment on lines 48 to 62
  attempt := 0
  for {
      var code int
      attempt++
      start := time.Now()

      ret, err := f(tr)
      if err != nil {
          code = err.Code()
      } else {
          code = http.StatusOK
      }
      metrics.AddTopologyRequest(tr.Provider.Name, tr.Engine.Name, code, time.Since(start))

-     if code != http.StatusInternalServerError || attempt == maxRetries {
+     if code != http.StatusInternalServerError || attempt >= maxRetries {

logic: The refactored retry loop now makes 5 attempts instead of the original 6. Old code: for attempt := 0; attempt <= maxRetries made attempts 0-5 (6 total). New code: attempt++ then attempt >= maxRetries makes attempts 1-5 (5 total).

Suggested change
attempt := 0
for {
    attempt++
    var code int
    start := time.Now()
    ret, err := f(tr)
    if err != nil {
        code = err.Code()
    } else {
        code = http.StatusOK
    }
    metrics.AddTopologyRequest(tr.Provider.Name, tr.Engine.Name, code, time.Since(start))
    if code != http.StatusInternalServerError || attempt > maxRetries {
        return ret, err
    }
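
To check the attempt counts in this comment, the small Go program below counts calls under each loop shape when every call fails with a retryable error. It is an illustrative sketch, not code from the PR: `countAttempts` and `countOriginal` are hypothetical helpers, and `maxRetries = 5` is taken from the sequence diagram.

```go
package main

import "fmt"

// countAttempts runs a loop shaped like the refactored one (increment first,
// then test) against a call that always fails, and reports how many calls
// were made. stop decides whether the current attempt is the last one.
func countAttempts(maxRetries int, stop func(attempt, maxRetries int) bool) int {
	attempt := 0
	for {
		attempt++
		if stop(attempt, maxRetries) {
			return attempt
		}
	}
}

// countOriginal mirrors the loop header quoted in the review,
// `for attempt := 0; attempt <= maxRetries`, again with every call failing.
func countOriginal(maxRetries int) int {
	calls := 0
	for attempt := 0; attempt <= maxRetries; attempt++ {
		calls++
	}
	return calls
}

func main() {
	const maxRetries = 5
	refactored := countAttempts(maxRetries, func(a, m int) bool { return a >= m })
	suggested := countAttempts(maxRetries, func(a, m int) bool { return a > m })
	fmt.Println(countOriginal(maxRetries), refactored, suggested) // prints "6 5 6"
}
```

With `>=` the loop stops on the fifth call. With `>` it allows one initial attempt plus five retries, matching the original six.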

@greptile-apps greptile-apps bot left a comment

3 files reviewed, no comments

@greptile-apps greptile-apps bot left a comment

3 files reviewed, no comments

@dmitsh dmitsh merged commit 9d3e3e3 into main Dec 4, 2025
7 checks passed
@dmitsh dmitsh deleted the ds-logs branch December 4, 2025 23:23