Open
Labels
enhancement: New feature or request
Description
🚀 Describe the new functionality needed
Currently, the knowledge_search tool in llama-stack returns a plain-text response made up of three parts:
- Header (hardcoded): "knowledge_search tool found N chunks:\nBEGIN of knowledge_search tool results.\n"
- Chunks (configurable via RAGQueryConfig.chunk_template): Default: "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
- Footer (hardcoded): "END of knowledge_search tool results.\n"
Source: llama_stack/providers/inline/tool_runtime/rag/memory.py
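For illustration, the response text described above can be approximated with the sketch below. It is reconstructed only from the three strings quoted in this issue, not copied from memory.py. The names `Chunk` and `render_knowledge_search` are hypothetical, and the 1-based `index` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any

# Strings as quoted in this issue; N is substituted with the chunk count.
HEADER = "knowledge_search tool found {n} chunks:\nBEGIN of knowledge_search tool results.\n"
FOOTER = "END of knowledge_search tool results.\n"
DEFAULT_CHUNK_TEMPLATE = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk."""
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def render_knowledge_search(chunks, chunk_template=DEFAULT_CHUNK_TEMPLATE):
    """Assemble header + one rendered block per chunk + footer."""
    parts = [HEADER.format(n=len(chunks))]
    # Assumption: results are numbered starting at 1.
    for i, chunk in enumerate(chunks, start=1):
        parts.append(chunk_template.format(index=i, chunk=chunk, metadata=chunk.metadata))
    parts.append(FOOTER)
    return "".join(parts)
```

Only `chunk_template` can be changed by the caller here; the header and footer are fixed, which is the limitation this issue is about.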
Proposed improvements (in order of preference):
- Best: Provide structured output (JSON) as an option, returning parsed chunk objects with content and metadata directly accessible without text parsing. Ideally, also expose the relevance/similarity score from the vector search, which is currently not included in the response.
- Good: Allow full client-side control of the response format, including:
  - Custom header template (or ability to omit)
  - Custom footer template (or ability to omit)
  - Custom chunk template (already supported)
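A rough sketch of what the two options could look like from the caller's side. Everything here is hypothetical and only meant to make the proposal concrete: `ResultChunk`, `to_json`, `to_text`, the `header_template`/`footer_template` parameters, and the `num_chunks` placeholder do not exist in llama-stack today.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ResultChunk:
    """Hypothetical structured chunk; not an existing llama-stack type."""
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = None  # relevance score, if the vector search provides one


def to_json(chunks: list[ResultChunk]) -> str:
    """'Best' option: return chunks as JSON so no text parsing is needed."""
    return json.dumps({"chunks": [asdict(c) for c in chunks]})


def to_text(
    chunks: list[ResultChunk],
    chunk_template: str,
    header_template: Optional[str] = None,
    footer_template: Optional[str] = None,
) -> str:
    """'Good' option: all parts are client-controlled; None omits header or footer."""
    parts = []
    if header_template is not None:
        parts.append(header_template.format(num_chunks=len(chunks)))
    for i, chunk in enumerate(chunks, start=1):
        parts.append(chunk_template.format(index=i, chunk=chunk, metadata=chunk.metadata))
    if footer_template is not None:
        parts.append(footer_template.format(num_chunks=len(chunks)))
    return "".join(parts)
```

With the JSON variant a client would read `json.loads(result)["chunks"][0]["score"]` directly; with the template variant it could at least pick delimiters that are safe for its own parser.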
💡 Why is this needed? What if we don't build it?
Why it's needed:
- Clients currently must implement brittle regex-based parsing to extract individual RAG chunks from the text response
- The hardcoded header/footer and default chunk format create tight coupling between llama-stack's internal implementation and client-side parsing logic
- Any future changes to the format (even whitespace changes) can silently break client parsers
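To make the fragility concrete, here is a sketch of the kind of client-side parser the default format forces. The regex, `parse_chunks`, and the sample text are illustrative assumptions built from the strings quoted in the description, not code from any real client.

```python
import re

# Matches one "Result N / Content / Metadata" block in the default format.
# The lookahead ends a block at the next "Result N" line or at the footer.
CHUNK_RE = re.compile(
    r"Result (\d+)\nContent: (.*?)\nMetadata: (.*?)\n"
    r"(?=Result \d+\n|END of knowledge_search tool results\.)",
    re.DOTALL,
)


def parse_chunks(text):
    """Extract chunks from the text response; metadata stays an unparsed string."""
    return [
        {"index": int(m.group(1)), "content": m.group(2), "metadata": m.group(3)}
        for m in CHUNK_RE.finditer(text)
    ]


sample = (
    "knowledge_search tool found 2 chunks:\n"
    "BEGIN of knowledge_search tool results.\n"
    "Result 1\nContent: alpha\nMetadata: {'document_id': 'doc-1'}\n"
    "Result 2\nContent: beta\nMetadata: {'document_id': 'doc-2'}\n"
    "END of knowledge_search tool results.\n"
)

parsed = parse_chunks(sample)

# A single-space change in the server-side template yields no matches and no error.
after_whitespace_change = parse_chunks(sample.replace("Content: ", "Content:"))
```

In this sketch `parsed` holds two chunks, while `after_whitespace_change` is an empty list. The parser does not raise, so the client just sees "no results". Chunk content that happens to contain a line such as "Metadata: " would also confuse it.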
Benefits of structured output:
- Clean API contract with explicit fields
- No text parsing required on client side
- Type-safe access to chunk content and metadata
- Opportunity to expose additional data (e.g., relevance scores) that are currently discarded
- Easier to maintain compatibility across versions
Other thoughts
No response