[llvm] [SDAG] Fix deferring constrained function calls (PR #153029)

Mon Oct 6 12:03:27 PDT 2025

spavloff wrote:

I still do not understand what is wrong with the existing implementation; perhaps we are talking about different things. So let me walk through the following example. Excuse me for the trivial explanations; I just want to spell out all the necessary logic.

```
target triple = "x86_64-unknown-linux-gnu"
define float @test(float %a, float %b, float %c, float %d) strictfp {
entry:
  %mul1 = call float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
  %mul2 = call float @llvm.experimental.constrained.fmul.f32(float %c, float %d, metadata !"round.dynamic", metadata !"fpexcept.strict")
  %call = call i32 @fesetround(i32 3072)
  %add = call float @llvm.experimental.constrained.fadd.f32(float %mul1, float %mul2, metadata !"round.dynamic", metadata !"fpexcept.strict")
  ret float %add
}
declare i32 @fesetround(i32)
```

In this code the `fadd` and the `fmul` operations are executed with different rounding modes, because of the call to `fesetround` between them.
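A source-level sketch may help map the IR back to code. Assuming clang on x86-64 Linux with glibc, where `FE_TOWARDZERO` is 0xC00 (3072), a C++ function like the following should lower to IR of the shape above when `#pragma STDC FENV_ACCESS ON` is in effect. The exact IR (name mangling, attributes) is not guaranteed to match.

```cpp
#include <cassert>
#include <cfenv>

// Ask the compiler to respect the dynamic FP environment. clang lowers the
// FP operations below to constrained intrinsics; GCC may ignore the pragma.
#pragma STDC FENV_ACCESS ON

// Source-level analogue of @test: the two products are computed in the
// caller's rounding mode, the sum in round-toward-zero mode.
float test(float a, float b, float c, float d) {
  float mul1 = a * b;
  float mul2 = c * d;
  std::fesetround(FE_TOWARDZERO);  // assumed to be 3072 (0xC00) on x86-64 glibc
  return mul1 + mul2;
}
```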

In this example all constrained intrinsic calls have the attribute `memory(inaccessiblemem: readwrite)`, because this is a property of the corresponding intrinsics. As for the call to `fesetround`, the compiler currently knows nothing about this function, so the call gets no specific attributes that influence ordering; clang assigns only `{ nounwind strictfp }` to it. When the call is queried about its memory effects, `Instruction::mayWriteToMemory` returns true and `CallBase::getMemoryEffects()` returns `MemoryEffects::unknown()`, which means it may access any memory. So all calls in the code above have read-write access to inaccessible memory, and their relative order must be preserved: none of the constrained calls can change its position relative to the call to `fesetround`.
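The ordering rule used above can be illustrated with a deliberately simplified model. The `Effects` type and `mustKeepOrder` function below are hypothetical and are not LLVM's `MemoryEffects` API; they only capture the idea that two calls must stay ordered when one of them may write a memory location that the other may access.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of memory effects: for each location kind,
// a call may read (Ref) and/or write (Mod) it. Not LLVM's MemoryEffects class.
enum Loc { ArgMem = 0, InaccessibleMem = 1, Other = 2, NumLocs = 3 };
enum ModRef : std::uint8_t { NoModRef = 0, Ref = 1, Mod = 2, ModRefBoth = 3 };

struct Effects {
  std::array<std::uint8_t, NumLocs> perLoc{};  // ModRef bits per location

  // Only inaccessible memory is read and written, like
  // memory(inaccessiblemem: readwrite) on constrained FP intrinsics.
  static Effects inaccessibleReadWrite() {
    Effects e;
    e.perLoc[InaccessibleMem] = ModRefBoth;
    return e;
  }

  // Nothing is known: any location may be read and written, as for a call
  // to an unannotated external function such as fesetround.
  static Effects unknown() {
    Effects e;
    e.perLoc.fill(ModRefBoth);
    return e;
  }
};

// Two calls must keep their relative order if, for some location, one of
// them writes it and the other reads or writes it.
bool mustKeepOrder(const Effects &x, const Effects &y) {
  for (int l = 0; l < NumLocs; ++l) {
    bool xWrites = (x.perLoc[l] & Mod) != 0;
    bool yWrites = (y.perLoc[l] & Mod) != 0;
    bool xAccesses = x.perLoc[l] != NoModRef;
    bool yAccesses = y.perLoc[l] != NoModRef;
    if ((xWrites && yAccesses) || (yWrites && xAccesses))
      return true;
  }
  return false;
}
```

In this model a constrained FP call and an unknown call always conflict on inaccessible memory, so their order is fixed in either direction.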

In the DAG, ordering is determined by chains. The code above produces a DAG like the following:

```
  t0: ch,glue = EntryToken
  …
  t9: f32,ch = strict_fmul t0, t2, t4
  …
  t10: f32,ch = strict_fmul t0, t6, t8
      t13: ch = TokenFactor t9:1, t10:1
    t15: ch,glue = callseq_start t13, TargetConstant:i64<0>, TargetConstant:i64<0>
  …
  t21: ch,glue = callseq_end t20, TargetConstant:i64<0>, TargetConstant:i64<0>, t20:1
    t23: i32,ch,glue = CopyFromReg t21, Register:i32 $eax, t21:1
  t24: f32,ch = strict_fadd t23:1, t9, t10
```

Ordering here is a bit more complex because of the code that this patch modifies: both `strict_fmul` nodes take the entry token `t0` as their input chain and are joined by the `TokenFactor` `t13`, so they may be executed in parallel. Otherwise the sequence of calls is the same as in the IR. Strict FP operations cannot change their order relative to the call to `fesetround`, which ensures that the FP operations are evaluated with the correct value of the FP control register.
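The partial order induced by chains can be checked mechanically. This sketch encodes only the chain operands visible in the dump and tests reachability; the nodes elided as `…` between `t15` and `t21` are collapsed into a single edge (an assumption), and the names `chainInputs` and `isOrderedBefore` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Chain edges from the DAG dump: each node lists the nodes whose chain
// results it consumes. The elided nodes between t15 and t21 are collapsed
// into a hypothetical direct edge t15 -> t21.
const std::map<std::string, std::vector<std::string>> chainInputs = {
    {"t0", {}},               // EntryToken
    {"t9", {"t0"}},           // strict_fmul
    {"t10", {"t0"}},          // strict_fmul
    {"t13", {"t9", "t10"}},   // TokenFactor
    {"t15", {"t13"}},         // callseq_start
    {"t21", {"t15"}},         // callseq_end (via elided nodes)
    {"t23", {"t21"}},         // CopyFromReg
    {"t24", {"t23"}},         // strict_fadd
};

// True if `later` is transitively chained after `earlier`, i.e. the
// scheduler must keep `earlier` before `later`.
bool isOrderedBefore(const std::string &earlier, const std::string &later) {
  std::vector<std::string> work{later};
  std::set<std::string> seen;
  while (!work.empty()) {
    std::string n = work.back();
    work.pop_back();
    if (!seen.insert(n).second)
      continue;  // already visited
    for (const std::string &in : chainInputs.at(n)) {
      if (in == earlier)
        return true;
      work.push_back(in);
    }
  }
  return false;
}
```

With this encoding, both `strict_fmul` nodes are ordered before `callseq_start` and `strict_fadd` is ordered after `callseq_end`, while neither `strict_fmul` is ordered relative to the other.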

In MIR the initial code becomes:

```
  %4:fr32 = MULSSrr %0:fr32(tied-def 0), %1:fr32, implicit $mxcsr
  %5:fr32 = MULSSrr %2:fr32(tied-def 0), %3:fr32, implicit $mxcsr
…
  CALL64pcrel32 target-flags(x86-plt) @fesetround, …, implicit-def $fpcw, implicit-def $mxcsr
  …
  %8:fr32 = ADDSSrr %4:fr32(tied-def 0), killed %5:fr32, implicit $mxcsr
```

FP operations have an implicit use of the FP control register (`implicit $mxcsr`), which represents their dependency on the FP environment. The call to `fesetround` has an implicit definition of the same register (`implicit-def $mxcsr`). A register use cannot be moved across a definition of that register, which ensures that the FP instructions keep their position relative to the call to `fesetround`.
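The constraint in MIR is the usual register dependency check. Below is a hypothetical, simplified model (not LLVM's `MachineInstr` API) that records only the registers shown in the MIR above and allows two adjacent instructions to be swapped only when there is no read-after-write, write-after-read or write-after-write dependency between them.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of a machine instruction: the registers it reads
// (including implicit uses) and writes (including implicit defs).
struct MInstr {
  std::string name;
  std::set<std::string> uses;
  std::set<std::string> defs;
};

static bool intersects(const std::set<std::string> &a,
                       const std::set<std::string> &b) {
  for (const std::string &r : a)
    if (b.count(r))
      return true;
  return false;
}

// Two adjacent instructions may be swapped only if no register dependency
// exists between them.
bool canSwap(const MInstr &first, const MInstr &second) {
  return !intersects(first.defs, second.uses) &&  // read-after-write
         !intersects(first.uses, second.defs) &&  // write-after-read
         !intersects(first.defs, second.defs);    // write-after-write
}
```

Under this model the two `MULSSrr` instructions may be swapped, but no FP instruction can cross the call, because the call defines `$mxcsr`, which every FP instruction reads.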

The picture does not change if `fesetround` is replaced with any other external function. It also holds if an intrinsic that changes the FP environment, such as `llvm.set_rounding`, is called instead.

As we can see, the current implementation guarantees proper ordering of FP operations at each level. Could you please point out what is missing in the current implementation, or in which case the picture above becomes invalid?

https://github.com/llvm/llvm-project/pull/153029