[Mlir-commits] [mlir] [mlir][Transforms] Dialect Conversion: Fix folder implementation (PR #150775)

Tom Natan llvmlistbot at llvm.org
Fri Aug 8 01:25:25 PDT 2025


tomnatan30 wrote:

Hey @matthias-springer!

It seems that this PR has caused an tsan failure related to Shardy verification.

The issue seems to be a race condition originating in verifyOnExit where it verifies ops in parallel: parallelForEach(op.getContext(), opsWithIsolatedRegions, [&](Operation *o){...});. The ops it runs over are all ops that are marked w/ the IsIsolatedFromAbove trait. For a given set of ops, we sometimes have the race:

Read: `ManualComputationOp::verify()` calls `getOperation()->getParentOfType<ModuleOp>()` to construct a SymbolTable.
Write: `ChloLegalizeToHighLevelMhloPass` attempts to legalize during dialect conversion, and one of the attempts in legalizeWithFold fails. What the PR did was change this to be more hermetic by undo-ing the side effects of the failed fold. One of the things is does is set the location.

For some context, Shardy uses symbols to define top level [MeshOps](https://github.com/openxla/shardy/blob/main/shardy/dialect/sdy/ir/ops.td#L34), then attributes (either discardable or properties of SDY ops) reference the mesh in [TensorShardingAttr](https://github.com/openxla/shardy/blob/main/shardy/dialect/sdy/ir/attrs.td#L580). The SDY op that has `IsIsolatedFromAbove` trait and references the mesh symbols is [ManualComputationOp](https://github.com/openxla/shardy/blob/main/shardy/dialect/sdy/ir/ops.td#L125). In our [verifiers](https://github.com/openxla/shardy/blob/main/shardy/dialect/sdy/ir/verifiers.cc#L933) we need access to the `ModuleOp` to get a symbol table for finding the mesh symbols.

IIUC, we aren't violating the constraints of `IsIsolatedFromAbove`, since it only refers to values used in the regions of the op, and we simply reference symbols in attributes. However, is it not allowed to access a parent op (ModuleOp) in the verification of a nested op?

A potential solution is to mark our ops as `SymbolUserOpInterface` and use verifySymbolUses() instead of verify() when we need the symbol table. We're happy to make that change, but I first want to better understand the issue and whether this is the right solution.

https://github.com/llvm/llvm-project/pull/150775


More information about the Mlir-commits mailing list