Configurable RAG Response Format and/or Structured Output for knowledge_search tool

### 🚀 Describe the new functionality needed

Currently, the `knowledge_search` tool in llama-stack returns a plain-text response made up of three parts:

- **Header (hardcoded):** `"knowledge_search tool found N chunks:\nBEGIN of knowledge_search tool results.\n"`
- **Chunks (configurable via `RAGQueryConfig.chunk_template`):** Default: `"Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"`
- **Footer (hardcoded):** `"END of knowledge_search tool results.\n"`

Source: `llama_stack/providers/inline/tool_runtime/rag/memory.py`
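For reference, the layout described above can be reproduced with a few lines of Python. This is only a sketch built from the strings quoted in this issue, not the code in `memory.py`: the `Chunk` class, the `render_response` helper, the `{n}` placeholder in the header, and the 1-based index are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Strings copied from the issue text; "{n}" stands in for the chunk count "N".
HEADER = "knowledge_search tool found {n} chunks:\nBEGIN of knowledge_search tool results.\n"
DEFAULT_CHUNK_TEMPLATE = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
FOOTER = "END of knowledge_search tool results.\n"


@dataclass
class Chunk:
    """Stand-in for a retrieved chunk (hypothetical, not the llama-stack type)."""
    content: str
    metadata: dict = field(default_factory=dict)


def render_response(chunks, chunk_template=DEFAULT_CHUNK_TEMPLATE):
    """Assemble header + formatted chunks + footer into one text blob."""
    parts = [HEADER.format(n=len(chunks))]
    for index, chunk in enumerate(chunks, start=1):  # 1-based index is an assumption
        parts.append(
            chunk_template.format(index=index, chunk=chunk, metadata=chunk.metadata)
        )
    parts.append(FOOTER)
    return "".join(parts)
```

Only `chunk_template` is a parameter here, which mirrors the situation described: the header and footer are fixed, and the metadata ends up as a stringified dict inside the text.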

**Proposed improvements (in order of preference):**

1. **Best:** Provide structured output (JSON) as an option, returning parsed chunk objects with content and metadata directly accessible without text parsing. Ideally, also expose the relevance/similarity score from the vector search, which is currently not included in the response.

2. **Good:** Allow full client-side control of the response format, including:
   - Custom header template (or ability to omit)
   - Custom footer template (or ability to omit)
   - Custom chunk template (already supported)
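To make the two options concrete, here is a rough sketch of what they could look like. Everything in it is hypothetical: `ResponseFormat`, `header_template`, `footer_template`, `ScoredChunk`, `render_text`, and `render_json` are names invented for this issue, and the `score` field assumes the vector search result can be passed through.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ScoredChunk:
    """Hypothetical chunk shape; `score` is the relevance score requested above."""
    content: str
    metadata: dict = field(default_factory=dict)
    score: Optional[float] = None


@dataclass
class ResponseFormat:
    """Hypothetical config: setting a header/footer template to None omits it."""
    header_template: Optional[str] = (
        "knowledge_search tool found {num_chunks} chunks:\n"
        "BEGIN of knowledge_search tool results.\n"
    )
    chunk_template: str = "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n"
    footer_template: Optional[str] = "END of knowledge_search tool results.\n"


def render_text(chunks, fmt):
    """Option 2: text output with client-controlled header, chunks, and footer."""
    parts = []
    if fmt.header_template is not None:
        parts.append(fmt.header_template.format(num_chunks=len(chunks)))
    for index, chunk in enumerate(chunks, start=1):
        parts.append(
            fmt.chunk_template.format(index=index, chunk=chunk, metadata=chunk.metadata)
        )
    if fmt.footer_template is not None:
        parts.append(fmt.footer_template)
    return "".join(parts)


def render_json(chunks):
    """Option 1: structured output; clients call json.loads instead of parsing text."""
    return json.dumps({"chunks": [asdict(chunk) for chunk in chunks]})
```

With something like this, a client that wants no framing at all could pass `ResponseFormat(header_template=None, footer_template=None)`, and a client that wants typed access could request the JSON form and read `content`, `metadata`, and `score` directly.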

### 💡 Why is this needed? What if we don't build it?

**Why it's needed:**

- Clients currently must implement brittle regex-based parsing to extract individual RAG chunks from the text response
- The hardcoded header/footer and default chunk format create tight coupling between llama-stack's internal implementation and client-side parsing logic
- Any future changes to the format (even whitespace changes) can silently break client parsers
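As an illustration of the problem, this is roughly the kind of parser a client ends up writing against the default format. The regex, the `parse_chunks` helper, and the sample text are written for this issue and are not taken from any real client.

```python
import re

# Sample response in the default format described in this issue.
SAMPLE = (
    "knowledge_search tool found 2 chunks:\n"
    "BEGIN of knowledge_search tool results.\n"
    "Result 1\nContent: alpha\nMetadata: {'doc': 'a'}\n"
    "Result 2\nContent: beta\nMetadata: {'doc': 'b'}\n"
    "END of knowledge_search tool results.\n"
)

# Tied to the exact default chunk template and the hardcoded footer.
CHUNK_RE = re.compile(
    r"Result (\d+)\nContent: (.*?)\nMetadata: (.*?)\n"
    r"(?=Result \d+\n|END of knowledge_search tool results\.)",
    re.DOTALL,
)


def parse_chunks(text):
    """Extract chunks from the text response; metadata stays an unparsed string."""
    return [
        {"index": int(index), "content": content, "metadata": metadata}
        for index, content, metadata in CHUNK_RE.findall(text)
    ]


chunks = parse_chunks(SAMPLE)

# A whitespace-only change to the format yields zero chunks and no error.
changed = parse_chunks(SAMPLE.replace("Content: ", "Content:"))
```

Here `chunks` holds two entries, while `changed` is an empty list: dropping a single space after `Content:` makes the parser return nothing without raising, which is the silent breakage described above. Chunk content that happens to contain a line like `Result 3` would confuse it as well.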

**Benefits of structured output:**

- Clean API contract with explicit fields
- No text parsing required on client side
- Type-safe access to chunk content and metadata
- Opportunity to expose additional data (e.g., relevance scores) that is currently discarded
- Easier to maintain compatibility across versions

### Other thoughts

_No response_

Configurable RAG Response Format and/or Structured Output for knowledge_search tool #4262

🚀 Describe the new functionality needed

💡 Why is this needed? What if we don't build it?

Other thoughts

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Configurable RAG Response Format and/or Structured Output for knowledge_search tool #4262

Description

🚀 Describe the new functionality needed

💡 Why is this needed? What if we don't build it?

Other thoughts

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions