Some random notes:
Right now, AICI assumes a stateful interface to the LLM inference engine: new sequences are created (forked), and the KV cache is manipulated by backtracking and fast-forwarding. As noted by @AaronFriel, Automatic Prefix Caching in vLLM (likely coming to other engines as well) might simplify this.
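
To make the contrast concrete, here is a minimal sketch of the two interface shapes. All names here (`Sequence`, `StatefulEngine`, `StatelessEngine`, `fork`, `backtrack`, `fast_forward`, `branch_stateless`) are hypothetical illustrations, not the actual AICI or vLLM APIs:

```python
# Hypothetical sketch contrasting the two interface styles; not real AICI/vLLM code.
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Sequence:
    """A generation branch identified by its token prefix."""
    seq_id: int
    tokens: list[int] = field(default_factory=list)


class StatefulEngine(Protocol):
    """Roughly what AICI assumes today: the controller mutates engine-side state."""

    def fork(self, seq: Sequence) -> Sequence:
        """Create a new branch that shares the parent's KV cache."""
        ...

    def backtrack(self, seq: Sequence, n_tokens: int) -> None:
        """Drop the last n_tokens and the corresponding KV-cache entries."""
        ...

    def fast_forward(self, seq: Sequence, tokens: list[int]) -> None:
        """Append known tokens, filling the KV cache without sampling."""
        ...


class StatelessEngine(Protocol):
    """With automatic prefix caching, the controller could instead resubmit
    full prefixes and let the engine reuse cached KV blocks for shared parts."""

    def generate(self, prompt_tokens: list[int], max_tokens: int) -> list[int]:
        ...


def branch_stateless(engine: StatelessEngine, prefix: list[int],
                     continuations: list[list[int]]) -> list[list[int]]:
    """'Forking' reduces to one request per continuation; the shared prefix
    is (ideally) served from the prefix cache rather than recomputed."""
    return [engine.generate(prefix + cont, max_tokens=16)
            for cont in continuations]
```

The point of the second shape is that explicit fork/backtrack/fast-forward calls may become unnecessary if resubmitting a shared prefix is cheap because its KV blocks are already cached.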
Starting discussion thread for comments.
cc @emrekiciman @simon-mo