-
-
Notifications
You must be signed in to change notification settings - Fork 971
Separate backoff strategies for peer registration vs login operations #4835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Separate backoff strategies for peer registration vs login operations #4835
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds grouped retry/backoff constants and context-aware backoff factories; classifies LoginRequest as registration when Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as Caller
participant GRPC as management gRPC client
participant Detector as isRegistrationRequest
participant Selector as Backoff selector
participant Backoff as Backoff engine
Caller->>GRPC: Login(req)
GRPC->>Detector: isRegistrationRequest(req)?
alt registration
Detector-->>Selector: registration
Selector->>Backoff: newRegistrationBackoff(ctx)
else login
Detector-->>Selector: login
Selector->>Backoff: newLoginBackoff(ctx)
end
Backoff->>GRPC: invoke Login call with retries
alt error == Canceled
Note right of GRPC: log Canceled, return permanent error
else error == DeadlineExceeded
Note right of GRPC: debug log, allow retry
else other errors
Note right of GRPC: treat as permanent error
end
GRPC-->>Caller: response or final error
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR separates backoff strategies for peer registration and login operations by introducing distinct timeout configurations. Previously, both operations used a single 10-second backoff timeout, but registration operations (involving database writes and IdP validation) require longer timeouts than login operations.
Key changes:
- Auto-detects operation type by checking for SetupKey or JwtToken in the request
- Implements separate backoff configurations: 180s timeout for registration (5s initial, 30s max interval) vs 45s for login (3s initial, 15s max interval)
- Adds helper functions for creating backoff strategies while maintaining backward compatibility
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if s.Code() == codes.DeadlineExceeded { | ||
| log.Debugf("login timeout, retrying...") | ||
| return err | ||
| } |
Copilot
AI
Nov 21, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message 'login timeout, retrying...' doesn't distinguish between registration and login operations. Since these now have different timeout configurations, the message should indicate which operation timed out to aid debugging. Consider using a message like 'operation timeout, retrying...' or including the operation type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
shared/management/client/grpc.go (3)
325-349: Consider usingNewExponentialBackOff+ opts instead of struct literalsThe two helpers mirror
defaultBackoffby instantiatingExponentialBackOffvia struct literal and passing it directly toWithContext. The backoff/v4 docs state thatResetshould be called before using anExponentialBackOff; usingNewExponentialBackOffwithWithInitialInterval/WithMaxInterval/WithMaxElapsedTime(or explicitly callingReset()once after construction) better aligns with that contract and is more robust to internal field changes. (pkg.go.dev)You could keep the same constants but build the policies like this:
-func newRegistrationBackoff(ctx context.Context) backoff.BackOff { - return backoff.WithContext(&backoff.ExponentialBackOff{ - InitialInterval: RegistrationRetryInterval, - RandomizationFactor: backoff.DefaultRandomizationFactor, - Multiplier: backoff.DefaultMultiplier, - MaxInterval: RegistrationRetryMaxDelay, - MaxElapsedTime: RegistrationRetryTimeout, - Stop: backoff.Stop, - Clock: backoff.SystemClock, - }, ctx) -} +func newRegistrationBackoff(ctx context.Context) backoff.BackOff { + b := backoff.NewExponentialBackOff( + backoff.WithInitialInterval(RegistrationRetryInterval), + backoff.WithMaxInterval(RegistrationRetryMaxDelay), + backoff.WithMaxElapsedTime(RegistrationRetryTimeout), + ) + return backoff.WithContext(b, ctx) +} @@ -func newLoginBackoff(ctx context.Context) backoff.BackOff { - return backoff.WithContext(&backoff.ExponentialBackOff{ - InitialInterval: LoginRetryInterval, - RandomizationFactor: backoff.DefaultRandomizationFactor, - Multiplier: backoff.DefaultMultiplier, - MaxInterval: LoginRetryMaxDelay, - MaxElapsedTime: LoginRetryTimeout, - Stop: backoff.Stop, - Clock: backoff.SystemClock, - }, ctx) -} +func newLoginBackoff(ctx context.Context) backoff.BackOff { + b := backoff.NewExponentialBackOff( + backoff.WithInitialInterval(LoginRetryInterval), + backoff.WithMaxInterval(LoginRetryMaxDelay), + backoff.WithMaxElapsedTime(LoginRetryTimeout), + ) + return backoff.WithContext(b, ctx) +}This keeps the behavior but relies on the library’s initialization path rather than on its struct layout.
351-355: Registration detection viaSetupKey/JwtTokenis clear but becomes the contractUsing
SetupKeyorJwtTokenpresence to classify aLoginRequestas a registration matches howRegistervsLoginpopulate the request, so the split backoff will behave as expected for current call sites. Just be aware this effectively codifies “any flow that sets eitherSetupKeyorJwtTokenis treated as registration” — future callers that happen to populate those fields will automatically inherit the longer registration retry window.
379-386: Clarify desired retry behavior forCanceledand use of parent context inloginInside
operation, bothcodes.Canceledandcodes.DeadlineExceededare returned as plain errors, sobackoff.Retrywill keep retrying them until the chosen backoff (registration or login) stops orc.ctxis canceled; all other status codes and non‑gRPC errors become permanent viabackoff.Permanent(err). If the intent is forCanceled(e.g., user‑driven cancellation) to stop retries, you may want to wrap it withbackoff.Permanent(err)as well or gate onc.ctxexplicitly before retrying.Also,
mgmCtxis still derived fromcontext.Background(), so cancelingc.ctxstops scheduling new retries but does not abort an in‑flightLogincall until its ownConnectTimeoutelapses. If you want immediate cancellation behavior on shutdown, consider usingc.ctxas the parent here:- mgmCtx, cancel := context.WithTimeout(context.Background(), ConnectTimeout) + mgmCtx, cancel := context.WithTimeout(c.ctx, ConnectTimeout)That would tie per‑RPC cancellation to the client’s lifecycle while still respecting the new backoff windows.
Also applies to: 394-402
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (1)
client/proto/daemon.pb.go (3)
LoginRequest(234-292)LoginRequest(305-305)LoginRequest(320-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Client / Unit (386)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: Windows
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
🔇 Additional comments (1)
shared/management/client/grpc.go (1)
28-38: Timeout split between registration and login looks reasonable; verify overall UXThe separate retry windows (180s for registration, 45s for login) align with the stated goals of “slow to give up on registration, fast to fail login”. Just double‑check in practice that, combined with the 10s per‑RPC
ConnectTimeoutand exponential backoff, the worst‑case wall‑clock behavior still fits CLI/daemon expectations (e.g., not surprising users with overly long waits on repeated failures).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
shared/management/client/grpc.go (1)
379-402: LGTM! Correct error handling and backoff selection logic.The enhanced error handling properly distinguishes between retryable errors (Canceled, DeadlineExceeded) and permanent errors. The conditional backoff strategy selection based on request type is correctly implemented.
The context usage pattern is well-designed: using
c.ctxfor the backoff respects the client's lifecycle, while usingcontext.Background()for individual retry attempts ensures each gets a fresh timeout.Optional improvement: The log message at line 384 could indicate whether it's a registration or login timeout to aid debugging, as noted in the previous review. However, this is a minor observability enhancement.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (1)
client/proto/daemon.pb.go (3)
LoginRequest(234-292)LoginRequest(305-305)LoginRequest(320-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Client / Unit (386)
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: Client / Unit
- GitHub Check: Android / Build
- GitHub Check: Linux
- GitHub Check: Client / Unit
- GitHub Check: Windows
- GitHub Check: Darwin
- GitHub Check: JS / Lint
🔇 Additional comments (4)
shared/management/client/grpc.go (4)
28-38: LGTM! Well-organized backoff constants.The separate backoff strategies are appropriate: registration operations get more generous timeouts (180s total, 5s initial, 30s max) compared to login operations (45s total, 3s initial, 15s max). The grouping in a single const block improves readability.
325-336: LGTM! Correct backoff implementation.The registration backoff helper is properly configured with exponential backoff parameters and context integration for cancellation support.
338-349: LGTM! Correct backoff implementation.The login backoff helper is properly configured with exponential backoff parameters and context integration for cancellation support.
351-355: LGTM! Detection logic is correct.The function correctly identifies registration requests by checking for either SetupKey or JwtToken presence. The comment clarification from the previous review has been addressed appropriately.
4b04d0a to
f50db3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
shared/management/client/grpc.go (1)
383-386: Enhance log message to distinguish operation type.Since registration and login now have different timeout configurations (180s vs 45s), the debug message should indicate which operation timed out to aid troubleshooting.
Apply this diff:
+ operationType := "login" + if isRegistrationRequest(req) { + operationType = "registration" + } if s.Code() == codes.DeadlineExceeded { - log.Debugf("login timeout, retrying...") + log.Debugf("%s timeout, retrying...", operationType) return err }
🧹 Nitpick comments (1)
shared/management/client/grpc.go (1)
28-38: Consider adding documentation for retry constants.While the constant names are self-explanatory, brief comments explaining the rationale for these specific timeout values would improve maintainability. For example, why registration needs 180s vs login's 45s (database writes, external IdP validation, etc.).
Example:
const ( ConnectTimeout = 10 * time.Second + // Registration retry constants - longer timeouts to accommodate database writes, + // external IdP validation, and group calculations RegistrationRetryTimeout = 180 * time.Second RegistrationRetryInterval = 5 * time.Second RegistrationRetryMaxDelay = 30 * time.Second + // Login retry constants - shorter timeouts for faster failure feedback LoginRetryTimeout = 45 * time.Second LoginRetryInterval = 3 * time.Second LoginRetryMaxDelay = 15 * time.Second )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (1)
client/proto/daemon.pb.go (3)
LoginRequest(234-292)LoginRequest(305-305)LoginRequest(320-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: Management / Unit (amd64, mysql)
- GitHub Check: Relay / Unit (amd64, -race)
- GitHub Check: Management / Unit (amd64, postgres)
- GitHub Check: Management / Unit (amd64, sqlite)
- GitHub Check: Relay / Unit (386)
- GitHub Check: Signal / Unit (386)
- GitHub Check: Client / Unit (386)
- GitHub Check: Client (Docker) / Unit
- GitHub Check: Client / Unit (amd64)
- GitHub Check: Android / Build
- GitHub Check: iOS / Build
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: JS / Lint
- GitHub Check: Client / Unit
- GitHub Check: Windows
- GitHub Check: Linux
- GitHub Check: Darwin
🔇 Additional comments (1)
shared/management/client/grpc.go (1)
394-401: Excellent dynamic backoff strategy selection.The conditional logic cleanly separates registration from login backoff strategies based on request introspection, avoiding the need for additional boolean parameters. This maintains backward compatibility while achieving the PR's objectives.
4799eef to
a005204
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
shared/management/client/grpc.go (1)
384-386: Clarify timeout log message to distinguish registration vs loginThe log text
"login timeout, retrying..."can be misleading whenlogin()is invoked for registration flows as well. Consider either:
- Using a more generic message like
"operation timeout, retrying...", or- Including the derived operation type, e.g.,
"registration timeout, retrying..."vs"login timeout, retrying...".This will make logs easier to interpret now that registration and login use different backoff profiles.
🧹 Nitpick comments (4)
shared/management/client/grpc.go (4)
28-38: Retry constants look good; consider keeping them internal if not part of the public APIThe registration/login retry constants match the intent (longer window for registration, shorter for login) and are clearly named. If they’re not meant to be part of the package’s external API, consider making them unexported (
registrationRetryTimeout, etc.) to avoid committing to them as public knobs.
325-349: Backoff helpers correctly wrap ExponentialBackOff; be aware MaxElapsedTime caps only the backoff windowThe
newRegistrationBackoff/newLoginBackoffhelpers correctly constructExponentialBackOffpolicies and wrap them withbackoff.WithContext, andbackoff.Retrywill callReset()before use, so the instances are safe to use as written. (pkg.go.dev)Note that
MaxElapsedTimelimits the total elapsed time within the backoff policy, while each attempt still runs under the separateConnectTimeoutused insideoperation. If you need a strict overall wall-clock cap (including RPC timeouts), you may want to tightenMaxElapsedTimeor derive the per-RPC context from a higher-level deadline.
379-390: Only DeadlineExceeded is retried; double‑check that’s the intended retry policyIn the Login RPC, only
codes.DeadlineExceededis treated as retryable;codes.Canceledand all other status codes are turned intobackoff.Permanent(err), so they will not be retried.If the server or network can transiently return other codes like
codes.Unavailableorcodes.ResourceExhausted, you may want to explicitly allowlist those as retryable too, while keeping clearly permanent codes (e.g.,InvalidArgument,Unauthenticated,PermissionDenied) as non-retryable.A possible pattern:
if s, ok := gstatus.FromError(err); ok { switch s.Code() { case codes.Canceled: // permanent return backoff.Permanent(err) case codes.DeadlineExceeded, codes.Unavailable, codes.ResourceExhausted: // retryable log.Debugf("%s timeout/unavailable, retrying...", opType) return err default: return backoff.Permanent(err) } }
395-403: Backoff selection based on isRegistrationRequest looks correct and is context‑awareChoosing between
newRegistrationBackoff(c.ctx)andnewLoginBackoff(c.ctx)based onisRegistrationRequest(req)aligns with the goal of using different retry windows for registration vs login and ties the retry loop to the client’s lifecycle viac.ctx.One minor consideration:
operationcurrently usescontext.Background()as the parent formgmCtx, so cancellation ofc.ctxwill stop future retries but not an in-flight RPC; that RPC will still run untilConnectTimeoutexpires. If you need faster shutdown, consider derivingmgmCtxfromc.ctxinstead ofcontext.Background().
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Windows
- GitHub Check: Linux
- GitHub Check: Darwin
- GitHub Check: release_ui_darwin
- GitHub Check: release
- GitHub Check: release_ui
- GitHub Check: Client / Unit
- GitHub Check: JS / Lint
- GitHub Check: Build Cache
- GitHub Check: iOS / Build
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: Client / Unit
🔇 Additional comments (1)
shared/management/client/grpc.go (1)
351-355: I'm unable to clone the repository to verify the review comment. The cloning operation is failing consistently, which prevents me from:
- Examining how
LoginRequestis constructed in different flows (registration vs. login)- Finding where
isRegistrationRequest()is called and what the actual backoff behavior implications are- Checking for related documentation, tests, or comments that clarify the design intent
- Verifying whether JWT token-based flows are genuinely always meant to be treated as "registration"
Given that I cannot access the codebase to verify the caller patterns and the actual backoff profile implications, I must mark this review comment as requiring manual verification.
The review comment raises valid architectural concerns about the semantics of the
isRegistrationRequest()condition, particularly whether treating allJwtToken-based requests as registration is intentional or could cause issues if future "true login with JwtToken only" flows are added. However, without codebase access, I cannot confirm whether these concerns are substantiated by actual usage patterns.
The previous implementation used a single 10 second backoff for both registration and login operations. However, these are fundamentally different operations with different timeout requirements: Registration (new peer creation) legitimately needs longer timeouts because it involves database writes, external IdP validation, and group membership calculations. Login (existing peer authentication) is faster and should fail quickly to provide actionable feedback to users. This change: - Auto-detects operation type by checking for SetupKey or JwtToken in the request - Uses 180s timeout for registration (5s initial, 30s max interval) - Uses 45s timeout for login (3s initial, 15s max interval) - Extracts backoff creation into helper functions to keep login() complexity low - Maintains backward compatibility - no API changes to public methods The auto-detection approach eliminates the need for boolean parameters and keeps the login() function simple while providing appropriate timeouts for each scenario.
a005204 to
9a1905a
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
shared/management/client/grpc.go (1)
379-388: Consider distinguishing operation type in log messages.The error handling logic is correct, but the log messages (lines 381, 385) always say "login" even when the operation might be a registration. Since these operations now have different timeout configurations, more specific messages could aid debugging.
Consider applying this diff:
+ isRegistration := isRegistrationRequest(req) + operationType := "login" + if isRegistration { + operationType = "registration" + } if err != nil { if s, ok := gstatus.FromError(err); ok { if s.Code() == codes.Canceled { - log.Debugf("login canceled") + log.Debugf("%s canceled", operationType) return backoff.Permanent(err) } if s.Code() == codes.DeadlineExceeded { - log.Debugf("login timeout, retrying...") + log.Debugf("%s timeout, retrying...", operationType) return err } }Note: This would require moving the
isRegistrationRequest(req)call earlier in the operation function.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
shared/management/client/grpc.go(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
shared/management/client/grpc.go (3)
client/proto/daemon.pb.go (3)
LoginRequest(364-422)LoginRequest(435-435)LoginRequest(450-452)shared/management/status/error.go (1)
FromError(78-87)util/retry.go (1)
Retry(10-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Client / Unit
- GitHub Check: release_ui_darwin
- GitHub Check: release_ui
- GitHub Check: release
- GitHub Check: Linux
- GitHub Check: Windows
- GitHub Check: Android / Build
- GitHub Check: Client / Unit
- GitHub Check: Build Cache
- GitHub Check: JS / Lint
🔇 Additional comments (5)
shared/management/client/grpc.go (5)
28-38: LGTM! Well-organized retry constants.The constants are logically grouped and the values are appropriate for their respective operations. Registration's longer timeout (180s) accommodates heavier operations like database writes and IdP validation, while login's shorter timeout (45s) provides faster feedback.
325-336: LGTM! Backoff factory correctly configured.The registration backoff factory properly integrates context cancellation and uses appropriate exponential backoff parameters for registration operations.
338-349: LGTM! Backoff factory correctly configured.The login backoff factory properly integrates context cancellation and uses appropriate exponential backoff parameters for login operations.
351-355: LGTM! Logic correctly identifies registration requests.The OR logic properly distinguishes registration (which uses SetupKey and/or JwtToken) from login (which uses neither). The function comment accurately describes the behavior.
395-402: LGTM! Backoff strategy selection correctly implemented.The conditional logic properly selects registration or login backoff based on the request type. The implementation is clear and maintains the single responsibility of the login method.



The previous implementation used a single 10 second backoff for both registration and login operations, default GRPC timeout. However, these are fundamentally different operations with different timeout requirements:
Registration (new peer creation) legitimately needs longer timeouts because it involves database writes, external IdP validation, and group membership calculations. Login (existing peer authentication) is faster and should fail quickly to provide actionable feedback to users.
Describe your changes
This change:
The auto-detection approach eliminates the need for boolean parameters and keeps the login() function simple while providing appropriate timeouts for each scenario.
Issue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.