azure-ai-evaluation EvaluatorConfig column mapping does not filter inputs #44287

@bgwilkin

Description


Hello, I'm currently experiencing the following issue with evaluators & their configuration in the azure-ai-evaluation python package.

Summary
When using the evaluate() function with evaluators that support both conversation-based and individual inputs (e.g., FluencyEvaluator, RelevanceEvaluator), if the target function returns both conversation AND individual inputs like query/response, the evaluators fail with the error:

Cannot provide both 'conversation' and individual inputs at the same time.

This happens even when the evaluator_config column mapping explicitly specifies only one input type (e.g., only conversation).

Expected Behavior
The evaluator_config column mapping should filter which inputs are passed to evaluators, not just rename them. If I configure an evaluator to only use conversation, then query/response should not be passed to that evaluator.

Actual Behavior
The column mapping only renames input columns. All inputs from the target/data are still passed to evaluators. When an evaluator's _convert_kwargs_to_eval_input method receives both conversation and singleton inputs, it raises an exception.
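
To make the distinction concrete, here is a rough plain-Python illustration of the behavior described above (illustrative only, not the SDK's actual code): the mapping resolves the column it names, but nothing is dropped, so the evaluator still receives query/response alongside conversation.

row = {
    "query": "test question",
    "response": "Response to: test question",
    "conversation": {"messages": []},
}
mapping = {"conversation": "${target.conversation}"}

evaluator_kwargs = dict(row)                             # unmapped keys are kept
evaluator_kwargs["conversation"] = row["conversation"]   # mapped key is resolved/renamed
print(sorted(evaluator_kwargs))
# ['conversation', 'query', 'response']  -> triggers the exception above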

Root Cause Analysis

The issue stems from the interaction of several components:

  1. BatchEngine._apply_column_mapping_to_lines() only maps specified columns but does not remove unmapped columns from the input dictionary.

  2. BatchEngine.__preprocess_inputs() checks if the evaluator function has **kwargs. Since all built-in evaluators inherit from EvaluatorBase with a __call__(self, *args, **kwargs) signature, has_kwargs is always True, causing ALL inputs to be passed through (see the signature check sketched after this list):

    if has_kwargs:
        return inputs  # All inputs passed, nothing filtered
  3. EvaluatorBase._convert_kwargs_to_eval_input() explicitly checks for and rejects having both conversation and singleton inputs:

    if conversation is not None and any(singletons.values()):
        raise EvaluationException(
            message=f"{type(self).__name__}: Cannot provide both 'conversation' and individual inputs..."
        )
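
A quick way to confirm point 2 locally (a minimal check, assuming the __call__(self, *args, **kwargs) signature described above):

import inspect
from azure.ai.evaluation import FluencyEvaluator

# Because __call__ accepts **kwargs, the batch engine's has_kwargs check is
# True and every input column is forwarded to the evaluator unfiltered.
params = inspect.signature(FluencyEvaluator.__call__).parameters.values()
has_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)
print(has_kwargs)  # True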

Reproduction Steps

from azure.ai.evaluation import evaluate, FluencyEvaluator

# Target that returns both conversation and query/response
def my_target(query: str) -> dict:
    response = f"Response to: {query}"
    return {
        "response": response,
        "conversation": {
            "messages": [
                {"role": "user", "content": query},
                {"role": "assistant", "content": response}
            ]
        }
    }

# Even though we only map 'conversation', the evaluator still receives 'response'
result = evaluate(
    data="test_data.jsonl",  # Contains {"query": "test question"}
    target=my_target,
    evaluators={
        "fluency": FluencyEvaluator(model_config)
    },
    evaluator_config={
        "fluency": {
            "conversation": "${target.conversation}"  # Only want conversation
        }
    }
)
# Raises: FluencyEvaluator: Cannot provide both 'conversation' and individual inputs at the same time.

Current Workaround

There are two workarounds: run conversation-based and individual-input evaluators in separate runs with discrete targets, or wrap evaluators so that kwargs are filtered before they reach the underlying evaluator:

class ConversationMetricWrapper:
    def __init__(self, evaluator, use_conversation=True):
        self.evaluator = evaluator
        self.use_conversation = use_conversation
    
    def _filter_kwargs(self, kwargs):
        if self.use_conversation:
            return {"conversation": kwargs.get("conversation")}
        else:
            filtered = kwargs.copy()
            filtered.pop("conversation", None)
            return filtered
    
    def __call__(self, *args, **kwargs):
        return self.evaluator(*args, **self._filter_kwargs(kwargs))
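
The wrapper can then be dropped into the evaluate() call in place of the bare evaluator (a sketch reusing my_target and model_config from the reproduction above; the evaluator_config mapping is no longer needed because the wrapper does the filtering):

result = evaluate(
    data="test_data.jsonl",
    target=my_target,
    evaluators={
        "fluency": ConversationMetricWrapper(FluencyEvaluator(model_config))
    },
)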

This workaround also adds complexity to keep passing the isinstance() checks used by the Azure AI Foundry portal integration, which expects built-in evaluator class types for proper metric display and tracking.

Suggested Fix

One or more of the following changes could address this:

  1. Option A: Filter in _apply_column_mapping_to_lines() - Only include columns that are explicitly mapped in the output dictionary. Unmapped columns from the source data should not be passed to evaluators (a sketch of this filtering follows the list).

  2. Option B: Filter in __preprocess_inputs() - When evaluators have **kwargs, inspect the actual parameter annotations/overloads to determine which inputs are valid, rather than passing everything.

  3. Option C: Make _convert_kwargs_to_eval_input() more lenient - Instead of raising an error when both conversation and singletons are provided, prioritize one based on configuration or a class-level setting.

  4. Option D: Add a filter_inputs option to evaluator config - Allow users to explicitly specify which inputs should be filtered out:

    evaluator_config={
        "fluency": {
            "conversation": "${target.conversation}",
            "_filter_inputs": ["query", "response"]  # Explicitly filter these out
        }
    }
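
For Option A, the intended behavior is roughly the following (a hypothetical helper, not the SDK's actual _apply_column_mapping_to_lines implementation): only columns named in the mapping survive.

def apply_column_mapping(row: dict, mapping: dict) -> dict:
    # Keep only the columns the mapping explicitly names; everything else
    # (e.g. an unmapped "response") is dropped instead of being forwarded.
    mapped = {}
    for target_name, source_expr in mapping.items():
        source_key = source_expr.strip("${}").split(".", 1)[-1]  # "${target.conversation}" -> "conversation"
        if source_key in row:
            mapped[target_name] = row[source_key]
    return mapped

row = {"query": "test question", "response": "...", "conversation": {"messages": []}}
print(apply_column_mapping(row, {"conversation": "${target.conversation}"}))
# {'conversation': {'messages': []}}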

Environment

  • azure-ai-evaluation version: 1.13.7
  • Python version: 3.12
  • OS: macOS

Impact

This issue prevents using conversation-based and individual-input evaluators in the same run without custom wrapper classes. The workaround also requires dynamic class creation to preserve Azure AI Foundry portal integration, as wrapped evaluators would otherwise fail the isinstance() checks used for metric tracking.

    Labels

    Evaluation · Service Attention · customer-reported · feature-request · needs-team-attention
