[Feature] support stop_token_ids #5399
base: develop
Conversation
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## develop #5399 +/- ##
==========================================
Coverage ? 59.52%
==========================================
Files ? 327
Lines ? 40641
Branches ? 6169
==========================================
Hits ? 24193
Misses ? 14584
Partials ? 1864
const int accept_tokens_len,
const int stop_seqs_bs,
const int stop_seqs_max_len,
const int64_t *min_tokens,
If you change the parameters of these two operators, remember to make the same change in ernie5_serving.
stop_token_ids_final = []
if request.get("stop_token_ids") is not None:
    stop_token_ids = request.get("stop_token_ids")
    if isinstance(stop_token_ids, list) and len(stop_token_ids) > 0:
        if isinstance(stop_token_ids[0], int):
            stop_token_ids_final.extend([[t] for t in stop_token_ids])
        elif isinstance(stop_token_ids[0], list):
            stop_token_ids_final.extend(stop_token_ids)

stop_sequences = request.get("stop", [])
if stop_sequences:
    stop_seqs, stop_seqs_len = self.update_stop_seq(stop_sequences)
    request["stop_token_ids"] = stop_seqs
    stop_token_ids_final.extend(stop_seqs)

if stop_token_ids_final:
    stop_seqs_len = [len(seq) for seq in stop_token_ids_final]
    request["stop_token_ids"] = stop_token_ids_final
This same block appears in many files — see whether it can be abstracted into a shared helper.
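The normalization logic duplicated across files could be factored into one shared helper along these lines (a sketch only; the function name `normalize_stop_token_ids` is hypothetical, not part of this PR):

```python
def normalize_stop_token_ids(stop_token_ids):
    """Normalize stop_token_ids into a list of token-id sequences.

    Accepts either a flat List[int] (each id becomes a one-token stop
    sequence) or a List[List[int]] of multi-token stop sequences.
    Anything else yields an empty list.
    """
    normalized = []
    if isinstance(stop_token_ids, list) and len(stop_token_ids) > 0:
        if isinstance(stop_token_ids[0], int):
            normalized.extend([[t] for t in stop_token_ids])
        elif isinstance(stop_token_ids[0], list):
            normalized.extend(stop_token_ids)
    return normalized

# Both accepted input shapes normalize to a list of sequences.
print(normalize_stop_token_ids([2, 7]))         # [[2], [7]]
print(normalize_stop_token_ids([[2, 7], [5]]))  # [[2, 7], [5]]
```

Each call site would then only need to extend `stop_token_ids_final` with the helper's result and recompute `stop_seqs_len`.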
const paddle::Tensor &step_idx,
const paddle::Tensor &stop_seqs,
const paddle::Tensor &stop_seqs_len,
const paddle::Tensor &min_tokens,
What is the purpose of introducing min_tokens?
It sets the minimum number of tokens to generate: while the number of generated tokens is below min_tokens, generation will not stop even if stop_token_ids is set.
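The gating described above can be sketched in plain Python (the actual check runs inside the stop-checking operator; `should_stop` is a hypothetical illustration, not the PR's code):

```python
def should_stop(generated, stop_token_ids, min_tokens):
    """Return True if generation should halt on a matched stop sequence.

    min_tokens gates the check: while fewer than min_tokens tokens have
    been generated, stop sequences are ignored entirely.
    """
    if len(generated) < min_tokens:
        return False
    # Stop when the tail of the generated ids equals any stop sequence.
    return any(
        len(seq) > 0 and generated[-len(seq):] == seq
        for seq in stop_token_ids
    )

print(should_stop([1, 2, 3], [[3]], min_tokens=5))  # False: under min_tokens
print(should_stop([1, 2, 3], [[3]], min_tokens=2))  # True: tail matches [3]
```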
print("model_output.min_tokens", model_output.min_tokens)
print("stop_token_ids", model_output.stop_token_ids)
Please delete these debug prints.
Motivation
Support stop_token_ids.
Modifications
Usage or Command
Online serving: pass the stop_token_ids parameter; it can be List[int] or List[List[int]].
Offline demo
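For online serving, a request body might look like the following (a sketch only: the model name and token ids are placeholders, and exposing min_tokens alongside stop_token_ids is an assumption based on this PR's operator changes):

```python
import json

# Hypothetical request payload for an OpenAI-compatible endpoint.
# stop_token_ids may be a flat List[int] or a nested List[List[int]];
# the ids 151643/151644/151645 below are placeholder values.
payload = {
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}],
    "stop_token_ids": [[151643], [151644, 151645]],
    "min_tokens": 8,  # assumption: minimum tokens before stops apply
}
print(json.dumps(payload, indent=2))
```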
Accuracy Tests
Checklist
- Add at least one tag to the PR title from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Format your code; run pre-commit before commit.
- If the PR is submitted to the release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.