
RULER scoring + training tightly coupled to LiteLLM/OpenAI; cannot cleanly use ChatOllama/ChatNVIDIA as judge/inference models #475

@ansh-info

Description


In my setup, I want to:

  • Use a local Ollama server (and potentially NVIDIA’s API in the future) as:

    • The main agent model (for rollouts).
    • The judge model (for RULER scoring).

However, the current ART stack makes this very difficult:

  • RULER scoring (ruler_score_group and related helpers) relies on LiteLLM in a way that assumes OpenAI-style models.
  • init_chat_model also wraps everything in a ChatOpenAI instance (see separate issue).
  • As a result, I cannot simply pass ChatOllama or ChatNVIDIA (LangChain chat models) as the inference/judge model for training.

Practically, if I move away from OpenAI and use:

  • Local Ollama for inference
  • Non-OpenAI providers as judges

I run into incompatibilities:

  • RULER expects LiteLLM’s OpenAI-style model identifiers and behavior.
  • ART’s helpers are too tightly bound to OpenAI semantics.
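To make the mismatch concrete, here is roughly the shim one ends up writing by hand today — a minimal sketch, where `AIMessage`, `ChatModel`, and `complete` are stand-ins invented for illustration, not ART or LangChain API:

```python
# Sketch: adapting a LangChain-style chat model to the OpenAI-style
# "role/content dicts in, text out" calling convention that LiteLLM-based
# code expects. The ChatModel protocol below stands in for LangChain's
# BaseChatModel; `complete` is a hypothetical helper, not ART API.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AIMessage:
    """Minimal stand-in for langchain_core.messages.AIMessage."""
    content: str


class ChatModel(Protocol):
    # LangChain chat models accept (role, content) tuples via .invoke();
    # ChatOllama and ChatNVIDIA both satisfy this shape.
    def invoke(self, messages: list[tuple[str, str]]) -> AIMessage: ...


def complete(model: ChatModel, messages: list[dict[str, str]]) -> str:
    """OpenAI-style entry point: role/content dicts in, plain text out."""
    lc_messages = [(m["role"], m["content"]) for m in messages]
    return model.invoke(lc_messages).content
```

With a shim like this, the same judge call works whether the underlying model is ChatOllama, ChatNVIDIA, or anything else implementing `.invoke` — which is exactly the seam the framework itself could expose instead.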

What I’d like

  • A more provider-agnostic design for:

    • RULER scoring
    • Training
    • init_chat_model
  • The ability to cleanly use:

    • ChatOllama (LangChain)
    • ChatNVIDIA
    • or other LangChain BaseChatModel implementations
  • without having to hack around LiteLLM/OpenAI assumptions.

Why this matters

  • ART is otherwise a great framework for agent RL.

  • Many users want to move to:

    • Local models (Ollama)
    • Different clouds (NVIDIA, etc.)
  • Tight coupling to OpenAI via LiteLLM in the RULER path makes this significantly harder.

Request

  • Please consider:

    • Abstracting RULER to accept any LangChain-compatible ChatModel for structured scoring.
    • Or providing a documented way to plug in non-OpenAI judge models (e.g. a “judge_fn” hook that accepts arbitrary models).
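One possible shape for such a hook — purely a sketch, where `JudgeFn` and `score_group_with` are hypothetical names, not existing ART API:

```python
# Sketch of a provider-agnostic judge hook: RULER-style group scoring that
# takes any "prompt in, text out" callable instead of a LiteLLM model string.
# All names here (JudgeFn, score_group_with) are hypothetical.
import json
from typing import Callable

JudgeFn = Callable[[str], str]  # prompt in, raw judge response out


def score_group_with(judge: JudgeFn, rollouts: list[str]) -> list[float]:
    """Ask the judge to score a group of rollouts relative to each other."""
    prompt = (
        "Score each response from 0 to 1. Reply with a JSON list only.\n---\n"
        + "\n---\n".join(rollouts)
    )
    scores = json.loads(judge(prompt))
    if len(scores) != len(rollouts):
        raise ValueError("judge returned the wrong number of scores")
    return [float(s) for s in scores]


# Any provider plugs in by closing over its client, e.g. (untested):
#   ollama_judge = lambda p: ChatOllama(model="llama3.1").invoke(p).content
#   score_group_with(ollama_judge, rollouts)
```

Because the hook only depends on a plain callable, it is indifferent to whether the judge runs through LiteLLM, LangChain, or a raw HTTP client.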
