WARNING: THIS SITE IS A MIRROR OF GITHUB.COM / IT CANNOT LOGIN OR REGISTER ACCOUNTS / THE CONTENTS ARE PROVIDED AS-IS / THIS SITE ASSUMES NO RESPONSIBILITY FOR ANY DISPLAYED CONTENT OR LINKS / IF YOU FOUND SOMETHING MAY NOT GOOD FOR EVERYONE, CONTACT ADMIN AT ilovescratch@foxmail.com
Skip to content

Conversation

@juncaipeng
Copy link
Collaborator

@juncaipeng juncaipeng commented Dec 8, 2025

Motivation

Decode does not cache requests for preallocating resource in default

Modifications

env.FD_ENABLE_CACHE_TASK = 0

Usage or Command

None

Accuracy Tests

None

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings December 8, 2025 07:05
@paddle-bot
Copy link

paddle-bot bot commented Dec 8, 2025

Thanks for your contribution!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes the default value of the FD_ENABLE_CACHE_TASK environment variable from "1" (enabled) to "0" (disabled), which alters the default behavior of request caching for resource preallocation in the decode node. The change also updates the associated comment to better describe the setting's purpose.

Key Changes:

  • Default value of FD_ENABLE_CACHE_TASK changed from "1" to "0"
  • Comment updated to clarify that this controls whether decode caches requests for preallocating resources

"FD_ZMQ_CONTROL_CMD_SERVER_PORTS": lambda: os.getenv("FD_ZMQ_CONTROL_CMD_SERVER_PORTS", "8202"),
# Whether to enable cache task in decode node
"FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"),
# Whether to enable decode caches requests for preallocating resource
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar issue: The comment should read "Whether to enable decode to cache requests" or "Whether decode caches requests". The current phrasing "enable decode caches requests" is grammatically incorrect.

Suggested change
# Whether to enable decode caches requests for preallocating resource
# Whether to enable decode to cache requests for preallocating resources

Copilot uses AI. Check for mistakes.
# Whether to enable cache task in decode node
"FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"),
# Whether to enable decode caches requests for preallocating resource
"FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "0"),
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default value from "1" (enabled) to "0" (disabled) is a breaking change that alters the default behavior of the decode node's request caching for resource preallocation. This could significantly impact existing deployments that rely on the default behavior.

According to the custom guidelines, the PR description should clearly explain why this modification is being made and what problem is being solved. While the motivation states "Decode does not cache requests for preallocating resource in default," it doesn't explain:

  1. Why is the previous default behavior (caching enabled) problematic?
  2. What issue does disabling it by default solve?
  3. What is the impact on existing users?
  4. Are there any migration steps for users who need the caching behavior?

Please expand the PR description to include this information so that users understand the rationale and impact of this breaking change.

Copilot generated this review using guidance from repository custom instructions.
# Whether to enable cache task in decode node
"FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"),
# Whether to enable decode caches requests for preallocating resource
"FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "0"),
Copy link

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking API change. The default behavior is being inverted from enabled (1) to disabled (0). This change affects how the decode node handles request caching for resource preallocation across the entire system.

When enable_decode_cache_task is False (the new default), the code at line 1184-1188 in common_engine.py will immediately send an error back to the prefill node when resources are insufficient, rather than caching and waiting. This fundamentally changes the behavior of the PD (Prefill-Decode) disaggregation feature.

Consider:

  1. Adding deprecation warnings in the current release before changing the default
  2. Providing clear migration documentation
  3. Adding a note in the CHANGELOG about this breaking change
  4. Ensuring users can easily opt back into the previous behavior by setting FD_ENABLE_CACHE_TASK=1

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@80efe98). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5427   +/-   ##
==========================================
  Coverage           ?   59.13%           
==========================================
  Files              ?      327           
  Lines              ?    40646           
  Branches           ?     6171           
==========================================
  Hits               ?    24037           
  Misses             ?    14765           
  Partials           ?     1844           
Flag Coverage Δ
GPU 59.13% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@juncaipeng juncaipeng closed this Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants