
Configurable RAG Response Format and/or Structured Output for knowledge_search tool #4262

@onmete

Description

🚀 Describe the new functionality needed

Currently, the knowledge_search tool in llama-stack returns a plain-text response made up of three parts:

  • Header (hardcoded): "knowledge_search tool found N chunks:\nBEGIN of knowledge_search tool results.\n"
  • Chunks (configurable via RAGQueryConfig.chunk_template), default: "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
  • Footer (hardcoded): "END of knowledge_search tool results.\n"

Source: llama_stack/providers/inline/tool_runtime/rag/memory.py
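
To make the current shape concrete, here is a minimal sketch that assembles a response from the three strings quoted above. It is not the code in memory.py: the `Chunk` class, the `render_text_response` function, the `{n}` placeholder standing in for N, and the 1-based index are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Strings copied from the issue text; "{n}" stands in for the "N" in the header.
HEADER = "knowledge_search tool found {n} chunks:\nBEGIN of knowledge_search tool results.\n"
DEFAULT_CHUNK_TEMPLATE = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
FOOTER = "END of knowledge_search tool results.\n"


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk."""
    content: str
    metadata: dict = field(default_factory=dict)


def render_text_response(chunks, chunk_template=DEFAULT_CHUNK_TEMPLATE):
    """Join header, one formatted block per chunk, and footer (hypothetical helper)."""
    parts = [HEADER.format(n=len(chunks))]
    # Assumption: results are numbered starting at 1.
    for index, chunk in enumerate(chunks, start=1):
        parts.append(
            chunk_template.format(index=index, chunk=chunk, metadata=chunk.metadata)
        )
    parts.append(FOOTER)
    return "".join(parts)


if __name__ == "__main__":
    print(render_text_response([Chunk("alpha", {"source": "a.md"})]))
```

Only the middle part of this output can be changed by a client today; the first and last lines are fixed.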

Proposed improvements (in order of preference):

  1. Best: Provide structured output (JSON) as an option, returning parsed chunk objects with content and metadata directly accessible without text parsing. Ideally, also expose the relevance/similarity score from the vector search, which is currently not included in the response.

  2. Good: Allow full client-side control of the response format, including:

    • Custom header template (or ability to omit)
    • Custom footer template (or ability to omit)
    • Custom chunk template (already supported)
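
A rough sketch of what both options could look like from the client's point of view. Everything here is hypothetical and meant only to illustrate the request: `ChunkResult`, `ResponseFormat`, `header_template`, `footer_template`, `render_text`, and `render_json` are invented names, not existing llama-stack APIs. Only the chunk template string comes from the current defaults.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ChunkResult:
    """Hypothetical structured chunk (option 1), including the search score."""
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = None


@dataclass
class ResponseFormat:
    """Hypothetical client-side format config (option 2). None omits a part."""
    header_template: Optional[str] = (
        "knowledge_search tool found {num_chunks} chunks:\n"
        "BEGIN of knowledge_search tool results.\n"
    )
    chunk_template: str = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
    footer_template: Optional[str] = "END of knowledge_search tool results.\n"


def render_text(results: list[ChunkResult], fmt: ResponseFormat) -> str:
    """Option 2: text output with a configurable or omitted header and footer."""
    parts = []
    if fmt.header_template is not None:
        parts.append(fmt.header_template.format(num_chunks=len(results)))
    for index, result in enumerate(results, start=1):
        parts.append(
            fmt.chunk_template.format(index=index, chunk=result, metadata=result.metadata)
        )
    if fmt.footer_template is not None:
        parts.append(fmt.footer_template)
    return "".join(parts)


def render_json(results: list[ChunkResult]) -> str:
    """Option 1: structured output, no text parsing needed on the client."""
    return json.dumps({"chunks": [asdict(r) for r in results]})
```

With option 1 a client would read `json.loads(response)["chunks"][0]["score"]` directly; with option 2 it could pass `ResponseFormat(header_template=None, footer_template=None)` to get the chunk blocks alone.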

💡 Why is this needed? What if we don't build it?

Why it's needed:

  • Clients currently must implement brittle regex-based parsing to extract individual RAG chunks from the text response
  • The hardcoded header/footer and default chunk format create tight coupling between llama-stack's internal implementation and client-side parsing logic
  • Any future changes to the format (even whitespace changes) can silently break client parsers
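
For illustration, this is roughly the kind of parser a client ends up writing today. It is a sketch rather than any particular client's code; `CHUNK_RE` and `parse_chunks` are hypothetical names, and the pattern assumes the default chunk template and the hardcoded footer exactly as quoted above.

```python
import re

# Matches one "Result N / Content / Metadata" block of the default template.
# The lookahead relies on the next "Result N" line or the hardcoded footer.
CHUNK_RE = re.compile(
    r"Result (\d+)\nContent: (.*?)\nMetadata: (.*?)\n"
    r"(?=Result \d+\n|END of knowledge_search tool results\.)",
    re.DOTALL,
)


def parse_chunks(text: str) -> list[dict]:
    """Extract chunks from the text response (hypothetical client helper)."""
    return [
        {"index": int(index), "content": content, "metadata": metadata}
        for index, content, metadata in CHUNK_RE.findall(text)
    ]


if __name__ == "__main__":
    response = (
        "knowledge_search tool found 2 chunks:\n"
        "BEGIN of knowledge_search tool results.\n"
        "Result 1\nContent: alpha\nMetadata: {'source': 'a.md'}\n"
        "Result 2\nContent: beta\nMetadata: {'source': 'b.md'}\n"
        "END of knowledge_search tool results.\n"
    )
    print(len(parse_chunks(response)))
    # A single whitespace change in the template makes the parser find nothing,
    # without raising any error.
    print(len(parse_chunks(response.replace("Content: ", "Content:"))))
```

The second call returns an empty list rather than failing, which is the silent breakage described above. Metadata also arrives as a string that needs a further parsing step.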

Benefits of structured output:

  • Clean API contract with explicit fields
  • No text parsing required on client side
  • Type-safe access to chunk content and metadata
  • Opportunity to expose additional data (e.g., relevance scores) that are currently discarded
  • Easier to maintain compatibility across versions

Other thoughts

No response

Metadata

Labels

enhancement: New feature or request
