brgemm_matmul segfaults with multiple threads when broadcasting dims #4396

@Sqvid

Summary

brgemm_matmul segfaults with multiple threads when broadcasting dims.

Version

main: 976bf2d

Environment

oneDNN includes hardware-specific optimizations and may behave
differently depending on the compiler and build environment. Include
the following information to help reproduce the issue:

  • CPU: x64 and AArch64
  • OS version: Linux 6.14
  • git hash: 976bf2d

Steps to reproduce

On x64:

$ ONEDNN_VERBOSE=profile_create,profile_exec OMP_NUM_THREADS=2 ./build/tests/benchdnn/benchdnn --matmul --mode=R --stag=abcd --dtag=abcd 2x1x40x20:1x1x20x40
onednn_verbose,v1,info,oneDNN v3.11.0 (commit 976bf2d4eb61582c1655e69208ff8173a93d8b45)
onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:2
onednn_verbose,v1,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,v1,info,gpu,runtime:none
onednn_verbose,v1,info,graph,backend,0:dnnl_backend
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,implementation,backend,exec_time
onednn_verbose,v1,primitive,create:cache_miss,cpu,matmul,brg_matmul:avx512_core,undef,src:f32::blocked:abcd::f0 wei:f32:a:blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,2x1x40x20:1x1x20x40,0.2771
onednn_verbose,v1,primitive,create:cache_hit,cpu,matmul,brg_matmul:avx512_core,undef,src:f32::blocked:abcd::f0 wei:f32:a:blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,2x1x40x20:1x1x20x40,0.00195312
onednn_verbose,v1,primitive,exec,cpu,matmul,brg_matmul:avx512_core,undef,src:f32::blocked:abcd::f0 wei:f32:a:blocked:abcd::f0 dst:f32::blocked:abcd::f0,,,2x1x40x20:1x1x20x40,0.194824
0:EXECUTED (1 ms) __REPRO: --mode=R --mode-modifier=M --matmul --stag=abcd --dtag=abcd 2x1x40x20:1x1x20x40
============================================================
= Implementation statistics (--summary=no-impl to disable) =
============================================================
| brg_matmul:avx512_core : 1 (100%)                        |
============================================================
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.00s; create_pd: 0.00s (30%); create_prim: 0.00s (38%); fill: 0.00s (0%); execute: 0.00s (14%);
Segmentation fault (core dumped)

Observed behavior

Segmentation fault.

Expected behavior

I would strongly prefer if it did not segfault.

Triage

This bug appears to be common to the x64 and AArch64 paths. I have only done the triage on the AArch64 side, but I suspect the x64 crash has the same cause.

Essentially, we calculate the batch address here:

const auto addr_batch = brgmm_ctx.get_batch_elem_ptr(ithr);

And that calculation depends on the thread number:

brgemm_batch_element_t *get_batch_elem_ptr(int ithr) const {
    return batch_element_ptr_
            + ithr * bgmmc_.brgemm_batch_element_per_thr_sz;
}

Which means that when broadcasting the following points to garbage:

And therefore segfaults when it is later accessed in the kernel (at execute time):

if (offset < (1 << 6)) {
    ld1rw(z1.s, P_ALL_ONE / T_z,
            ptr(reg_aux_A, (int32_t)offset));
} else {
    add_imm(X_DEFAULT_ADDR, reg_aux_A, offset, X_TMP_0);
    ld1rw(z1.s, P_ALL_ONE / T_z, ptr(X_DEFAULT_ADDR));
}

On the AArch64 path, the acl_matmul implementation picks up this shape, so it does not crash there, but the bug is still present in brgemm_matmul. x64 crashes out of the box.
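To make the pointer arithmetic concrete, here is a minimal, self-contained model of a scratchpad divided into per-thread slabs. It is not oneDNN code: `batch_element`, `batch_scratch`, and `slab_in_bounds` are hypothetical names, and only the `base + ithr * per_thr_sz` expression mirrors the `get_batch_elem_ptr` snippet above. The sketch assumes the slab size is counted in elements, and it illustrates that the returned pointer is only meaningful while `ithr` stays within the thread count the scratchpad was sized for.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for brgemm_batch_element_t; only two pointer
// fields are modeled for the illustration.
struct batch_element {
    const void *A = nullptr;
    const void *B = nullptr;
};

// Simplified model of a scratchpad split into one slab per thread.
struct batch_scratch {
    std::vector<batch_element> storage;
    std::size_t per_thr_sz; // slab size, in elements (assumption)

    batch_scratch(int nthr, std::size_t per_thr)
        : storage(static_cast<std::size_t>(nthr) * per_thr)
        , per_thr_sz(per_thr) {}

    // Same arithmetic as the snippet above: base + ithr * per-thread size.
    batch_element *get_batch_elem_ptr(int ithr) {
        return storage.data() + static_cast<std::size_t>(ithr) * per_thr_sz;
    }

    // True if the whole slab for `ithr` lies inside the allocation.
    bool slab_in_bounds(int ithr) const {
        return ithr >= 0
                && (static_cast<std::size_t>(ithr) + 1) * per_thr_sz
                <= storage.size();
    }
};
```

With `batch_scratch s(2, 4)`, thread 1's slab starts four elements after thread 0's, and `s.slab_in_bounds(2)` is false. A check like this at the call site is one way to tell whether the per-thread indexing itself or the data written into the slab is at fault.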

I'd greatly appreciate advice on how to approach the fix (probably just adding a broadcast branch to the batch pointer calculation?).
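For reference, the index mapping that a broadcast-aware offset calculation would need can be sketched as follows. This is not a proposed patch and does not use any oneDNN API: `broadcast_batch_index` is a hypothetical helper, and the sketch assumes dense row-major batch dims where a size of 1 means "broadcast". It maps a linear batch index in the destination's batch space to the matching batch index of an operand.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not a oneDNN function): converts a linear batch
// index over dst_dims into the linear batch index of an operand whose
// batch dims are either equal to dst_dims or 1 (broadcast).
template <std::size_t N>
std::size_t broadcast_batch_index(std::size_t dst_idx,
        const std::array<std::size_t, N> &dst_dims,
        const std::array<std::size_t, N> &operand_dims) {
    std::size_t operand_idx = 0;
    std::size_t operand_stride = 1;
    // Walk from the innermost batch dim outward.
    for (std::size_t d = N; d-- > 0;) {
        const std::size_t coord = dst_idx % dst_dims[d];
        dst_idx /= dst_dims[d];
        // A broadcast dim (size 1) always maps to coordinate 0.
        const std::size_t op_coord = operand_dims[d] == 1 ? 0 : coord;
        operand_idx += op_coord * operand_stride;
        operand_stride *= operand_dims[d];
    }
    return operand_idx;
}
```

For the reproducer shape 2x1x40x20:1x1x20x40, the destination batch dims are {2, 1} and the weights batch dims are {1, 1}, so both destination batches map to weights batch 0, while the source (batch dims {2, 1}) maps batch 1 to batch 1.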

@dzarukin @vpirogov

Labels

  • bug: A confirmed library bug
  • platform:cpu-aarch64: Codeowner: @oneapi-src/onednn-cpu-aarch64
  • platform:cpu-x64: Intel64/AMD64 processors. Codeowner: @oneapi-src/onednn-cpu-x64
