Description
System Info
Tested on H100
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
There is an accuracy issue with unsloth/gpt-oss-20b-BF16 and openai/gpt-oss-20b across different configurations.
The example configs and their outputs, ranked from best to worst:
- using the torch attention backend and the torch-simple compile backend + greedy decoding: repetitive, but not too bad
```yaml
args:
  mode: graph
  world_size: 1
  runtime: demollm
  compile_backend: torch-simple
  attn_page_size: 64
  attn_backend: torch
  model_factory: AutoModelForCausalLM
  skip_loading_weights: false
  disable_overlap_scheduler: true
  kv_cache_config:
    enable_block_reuse: false
  model_kwargs:
    torch_dtype: bfloat16
benchmark:
  enabled: false
prompt:
  sp_kwargs:
    top_k: 0
    temperature: 0
dry_run: false
```
Output is:
```
[12/02/2025-13:05:16] [TRT-LLM AUTO-DEPLOY] [I] [PROMPT 0] How big is the universe? : 1.5 trillion light years? 1.5 trillion? Wait: The observable universe radius is about 46.5 billion light years. But the entire universe might be infinite. But the question: "How big is the universe? 1.5 trillion light years? 1.5 trillion? 1.5 trillion? 1.5 trillion? 1.5 trillion? 1.5 trillion? 1.5 trillion? 1.5 trillion? 1
[12/02/2025-13:05:16] [TRT-LLM AUTO-DEPLOY] [I] [PROMPT 1] In simple words and a single sentence, explain the concept of gravity: : 1) The 2nd law of Newton's law?
The second law of Newton's law states that the force acting on an object is equal to the mass of the object multiplied by its acceleration.
The second law of Newton's
The second law of Newton states that the force acting on an object is equal to the mass of the object multiplied by its acceleration.
The second law of Newton states that the force acting on an object is equal to the mass of the object multiplied by its
```
- using the torch attention backend and the torch-simple compile backend + default sampling kwargs: much worse, though not totally random. Output is:
```
[12/02/2025-13:06:43] [TRT-LLM AUTO-DEPLOY] [I] [PROMPT 0] How big is the universe? : 20km means? 0.99%'
They might want to know about the mass, but likely ask for visual diameter measured in kilometers.
Hence final answer: It's impossible to measure.
But we can compute size if mass constant.
Let's deliver finalcomend.
Alternatively, we can compute using baryon number vs mass.
Also compute scale as 'R ~ 13Mpc/h'.
Ok.
Let's craft answer accordingly.
Focus on measurement and mention that the radius is ~ $10^{23} m
[12/02/2025-13:06:43] [TRT-LLM AUTO-DEPLOY] [I] [PROMPT 1] In simple words and a single sentence, explain the concept of gravity: : It's the natural force that draws efficientlyDesign and produce; let's say planning in **OKM hotcountdown portion & answer plugin extent thou?**
It appears there might have been some typos or errors in your query. Let'sossi clarify or reinterpret what you're asking about.
It looks like your message might be a bit mixed up. Could you clarify what you need? Are you talking about a countdown, OKM (which could mean "Other Closed Markets" or something else?), planning,
```
- using any attention backend or compile backend other than torch: output is completely random
The results above were obtained with unsloth/gpt-oss-20b-BF16; openai/gpt-oss-20b (using triton_mxfp4_moe) shows similar behaviour. I suggest we look into the accuracy issue for the BF16 model first.
The error may come from several sources; possible culprits include the harmony chat template, attention-sink handling, etc. A reference check against plain Hugging Face transformers is sketched below.
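As a first isolation step, one could generate a greedy reference completion with plain Hugging Face transformers, building the prompt through the model's own chat template (which encodes the harmony format). This is only a sketch; the prompt, generation length, and model id choice are placeholders, not the exact repro inputs.

```python
# Sketch: greedy reference output with plain Hugging Face transformers.
# If this is coherent while the TRT-LLM AutoDeploy path is not, the regression is
# more likely in the runtime (attention-sink handling, kernels, sampling) than in
# the weights or the harmony chat template itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/gpt-oss-20b-BF16"  # or "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)

messages = [{"role": "user", "content": "How big is the universe?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding, matching the temperature=0 / top_k=0 config above.
output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Comparing the tokenized prompt produced by `apply_chat_template` with the prompt the AutoDeploy path actually feeds the model would also confirm or rule out the chat-template hypothesis.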
Expected behavior
Coherent output. We should also be able to validate the model quantitatively with benchmarks such as MMLU; a minimal spot-check sketch follows.
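For a signal that is easier to track than eyeballing completions, a small MMLU-style spot check could be wired to whichever generation path is under test. This is only a sketch: `generate` is a placeholder callable (prompt in, completion out), and the subject and sample size are arbitrary.

```python
# Minimal MMLU-style spot check; `generate` stands in for whichever backend
# (HF transformers baseline or the TRT-LLM AutoDeploy path) is being evaluated.
from datasets import load_dataset

def mmlu_spot_check(generate, subject="abstract_algebra", n=50):
    ds = load_dataset("cais/mmlu", subject, split="test").select(range(n))
    letters = ["A", "B", "C", "D"]
    correct = 0
    for row in ds:
        prompt = (
            row["question"]
            + "\n"
            + "\n".join(f"{l}. {c}" for l, c in zip(letters, row["choices"]))
            + "\nAnswer with a single letter:"
        )
        pred = generate(prompt).strip()[:1].upper()
        correct += pred == letters[row["answer"]]
    return correct / n
```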
actual behavior
Outputs are degraded across the different configs, ranging from repetitive to completely incoherent.
additional notes
Likely an accuracy regression introduced roughly 2 months ago. Bisecting that window (e.g., with git bisect) could localize the offending change; a rough pass/fail predicate is sketched below.
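An automated predicate would make the bisection (or a CI check) hands-off. The heuristic below is purely illustrative: it flags the kind of degenerate repetition shown above by measuring how much of the completion consists of repeated n-grams; the thresholds are arbitrary.

```python
# Illustrative pass/fail predicate for bisecting the regression window: returns True
# when the completion is dominated by repeated n-grams, as in the
# "1.5 trillion? 1.5 trillion? ..." output above. Thresholds are arbitrary.
def looks_degenerate(text: str, n: int = 4, max_repeat_ratio: float = 0.5) -> bool:
    tokens = text.split()
    if len(tokens) < 2 * n:
        return False
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    unique_ratio = len(set(ngrams)) / len(ngrams)
    return (1.0 - unique_ratio) > max_repeat_ratio
```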
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.