Description
We observe crashes (double-free, heap corruption) when multiple threads concurrently call execute() on the same compiled_partition instance. The crash occurs inside matmul_t::execute_impl during scratchpad deallocation.
- oneDNN version: 3.6.0
- Graph contents: MatMul operations
- OS: Linux 6.8.0-060800rc6-generic
Can you clarify:
- Is compiled_partition::execute() guaranteed thread-safe for concurrent calls on the same instance?
- If not, should we maintain a pool of compiled_partition instances (one per concurrent execution)?
- Is there an allocator or scratchpad configuration that enables safe concurrent execution? We already tried `dnnl::graph::make_engine_with_allocator(dnnl::engine::kind::cpu, 0, allocator);` with an allocator whose callbacks wrap the system malloc/free, but it had no effect.
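For illustration, malloc/free-based callbacks of the kind mentioned in the allocator question might look like the sketch below. It assumes the oneDNN Graph host allocator callback shapes `void *(size_t size, size_t alignment)` and `void (void *)`. The names `sys_allocate`, `sys_deallocate` and `is_aligned` are hypothetical, and the registration lines are commented out so the block compiles without oneDNN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdlib.h>  // posix_memalign (POSIX)

// Hypothetical allocate callback: system allocator with the requested alignment.
void *sys_allocate(std::size_t size, std::size_t alignment) {
    // posix_memalign requires a power-of-two alignment that is a multiple of
    // sizeof(void *); smaller requests are raised to sizeof(void *).
    if (alignment < sizeof(void *)) alignment = sizeof(void *);
    void *ptr = nullptr;
    if (posix_memalign(&ptr, alignment, size) != 0) return nullptr;
    return ptr;
}

// Hypothetical deallocate callback: posix_memalign memory is released with free().
void sys_deallocate(void *ptr) { std::free(ptr); }

// Hypothetical helper to check the alignment of a returned pointer.
bool is_aligned(const void *ptr, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Assumed registration with oneDNN (needs the library, so commented out):
// dnnl::graph::allocator alloc(sys_allocate, sys_deallocate);
// dnnl::engine eng = dnnl::graph::make_engine_with_allocator(
//     dnnl::engine::kind::cpu, 0, alloc);
```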
```cpp
#include <thread>
#include <vector>
#include <oneapi/dnnl/dnnl.hpp>
#include <oneapi/dnnl/dnnl_graph.hpp>

// Setup
// engine: CPU engine; stream: a single dnnl::stream captured by all threads
dnnl::graph::graph g(dnnl::engine::kind::cpu);
// ... add matmul operations ...
g.finalize();
auto partitions = g.get_partitions();
// inputs/outputs: std::vector<dnnl::graph::logical_tensor> for the partition's ports
auto cp = partitions[0].compile(inputs, outputs, engine);

// Concurrent execution on the same compiled_partition: CRASHES
std::vector<std::thread> threads;
for (int i = 0; i < 4; ++i) {
    threads.emplace_back([&]() {
        for (int iter = 0; iter < 100; ++iter) {
            // Each thread has its own input/output tensors
            std::vector<dnnl::graph::tensor> my_inputs = /* thread-local */;
            std::vector<dnnl::graph::tensor> my_outputs = /* thread-local */;
            cp.execute(stream, my_inputs, my_outputs); // CRASH here
        }
    });
}
for (auto& t : threads) t.join();
```
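The pool idea from the second question can be sketched generically as follows. `InstancePool` and `run_workers` are hypothetical helpers, not oneDNN APIs. In this setting `T` would presumably be `dnnl::graph::compiled_partition`, with each slot compiled separately from the same partition, but the sketch is kept library-free so it compiles on its own. Whether separate instances actually avoid the crash depends on where the shared scratchpad state lives, which is exactly what the questions above ask.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not a oneDNN API): a fixed set of instances, each
// lent to at most one thread at a time.
template <typename T>
class InstancePool {
public:
    explicit InstancePool(std::vector<T> items) : items_(std::move(items)) {
        for (std::size_t i = 0; i < items_.size(); ++i) free_.push_back(i);
    }

    // Runs fn on a free instance, blocking until one is available.
    template <typename Fn>
    void with_instance(Fn &&fn) {
        std::size_t idx = acquire_index();
        // Returns the instance to the pool even if fn throws.
        struct Guard {
            InstancePool *pool;
            std::size_t idx;
            ~Guard() { pool->release_index(idx); }
        } guard{this, idx};
        fn(items_[idx]);
    }

private:
    std::size_t acquire_index() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        std::size_t idx = free_.back();
        free_.pop_back();
        return idx;
    }

    void release_index(std::size_t idx) {
        assert(idx < items_.size());
        {
            std::lock_guard<std::mutex> lock(mu_);
            free_.push_back(idx);
        }
        cv_.notify_one();
    }

    std::vector<T> items_;           // e.g. one compiled_partition per slot
    std::vector<std::size_t> free_;  // indices of instances not currently lent out
    std::mutex mu_;
    std::condition_variable cv_;
};

// Mirrors the repro loop: n_threads workers, each making `iters` calls,
// every call running on an instance no other thread holds at that moment.
template <typename T, typename Fn>
void run_workers(InstancePool<T> &pool, int n_threads, int iters, Fn fn) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([&pool, iters, &fn] {
            for (int it = 0; it < iters; ++it)
                pool.with_instance(fn);
        });
    for (auto &t : threads) t.join();
}
```

With oneDNN, the callback passed to `run_workers` would take a `dnnl::graph::compiled_partition &` and call `execute()` on it with that thread's own tensors, as in the repro above. The pool size bounds how many executions run at once, so it would typically match the number of worker threads.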
Thank you in advance.