[llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers

Wed Jan 5 06:44:23 PST 2022

Did you try `hasSideEffects = 1`?
I’m not familiar with AArch64. On X86, we have separate FPCR and FPSR registers: the former is used for control (rounding mode, exception masks) and the latter for status. We modeled all FP instructions that may raise exceptions with `mayRaiseFPException = 1` and an implicit use of FPCR. Note that an instruction that reads FPCR is just another use of FPCR, not a def, so it is not necessary to keep such a read in source order relative to the FP instructions; only a write to FPCR needs that. I guess the same reasoning applies to AArch64? Maybe you can check how the write to FPCR is modeled.
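
To make the use/def reasoning above concrete, here is a minimal C model (hypothetical, not LLVM code) of when two instructions must keep their source order, based only on their implicit register uses and defs. The register bits REG_FPCR and REG_FPSR, the Instr type and the needs_order function are assumptions of this sketch. In particular, modeling an FP operation that may raise exceptions as writing the status register is a simplification made here, not a description of how the X86 backend models it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register bits, used only in this sketch. */
enum { REG_FPCR = 1u << 0, REG_FPSR = 1u << 1 };

/* An instruction reduced to its implicit register operands. */
typedef struct {
    unsigned uses; /* registers read implicitly (bit mask) */
    unsigned defs; /* registers written implicitly (bit mask) */
} Instr;

/* Returns true if `first` (earlier in source order) must stay ahead of
   `second`: read-after-write, write-after-read or write-after-write on
   some shared register. Two reads of the same register impose no order. */
static bool needs_order(const Instr *first, const Instr *second) {
    assert(first != NULL && second != NULL);
    unsigned raw = first->defs & second->uses;
    unsigned war = first->uses & second->defs;
    unsigned waw = first->defs & second->defs;
    return (raw | war | waw) != 0;
}

/* Example instructions under the sketch's assumptions. */
static const Instr WRITE_FPCR = {0, REG_FPCR};        /* set rounding mode */
static const Instr READ_FPCR  = {REG_FPCR, 0};        /* read control bits */
static const Instr FADD       = {REG_FPCR, REG_FPSR}; /* uses rounding mode, may set flags */
static const Instr READ_FPSR  = {REG_FPSR, 0};        /* read exception flags */
```

With these rules, a read of FPCR and an FP add can be reordered freely in either direction, while a write to FPCR, or a read of the status register after an add, keeps its place. The write-after-write rule also orders two FP operations that both set sticky flags; since sticky flags only accumulate, a real model might choose to relax that case.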

Thanks
Phoebe

From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Cyril Six via llvm-dev
Sent: Wednesday, January 5, 2022 7:44 PM
To: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] Implicit Defs and Uses are ignored by pre-RA schedulers

Hello,

In our Kalray LLVM backend, we have builtins to get and set system registers. One of them is $CS, which holds bits that set the rounding mode and sticky bits that record masked floating-point exceptions. The AArch64 equivalent would be FPCR.

In our user code, we would like to preserve the partial ordering between a SET to $CS and a floating-point operation, since the SET to $CS might modify the rounding mode. Similarly, we would like to preserve the partial ordering between a GET from $CS and a floating-point operation, since user code might want to examine the floating-point exception bits right after a given floating-point operation.

Another use case we have is the following: we have a coprocessor that is turned on by setting a given bit in a system register, which can be accessed through a builtin. Such a SET instruction must happen before any coprocessor instruction is used, and the compiler should not break that dependency when reordering instructions.

We have tried to implement this by using implicit Defs and implicit Uses in our instruction definitions, for example `let Defs = [CS] in` and `let Uses = [CS] in` where relevant in our target description (.td) files.

I have been running some experiments, examining the scheduling output and the dependencies (using VLIWScheduler before register allocation, PostRASchedulerList after it, and a subclass of VLIWPacketizerList for bundling).

I have found that the implicit defs and uses are indeed taken into account by the post-RA schedulers. However, they seem to be ignored by the pre-RA schedulers. Also, they do not appear as dependencies in the SelectionDAG.

Looking at what other backends do, AArch64 does not seem to model anything for FPCR. PowerPC marks MFFS as a scheduling barrier (isSchedulingBoundary) to prevent floating-point instructions from being reordered above it, but isSchedulingBoundary seems to be used only by the post-RA schedulers; the pre-RA schedulers do not seem to take it into account.
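
As a generic illustration of what a scheduling boundary means (a sketch, not LLVM's implementation), the C function below splits a block into scheduling regions: an instruction flagged as a boundary belongs to no region, and a scheduler may only reorder instructions that share a region id. The is_boundary array stands in for a predicate such as isSchedulingBoundary; the function name split_regions is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assigns a region id to every instruction. A boundary instruction gets
   -1 and closes the current region, so nothing can move across it.
   Returns the number of non-empty regions; ids run from 0 to count - 1.
   A scheduler may only reorder instructions with the same region id. */
static int split_regions(const bool *is_boundary, int n, int *region_id) {
    assert(n >= 0);
    int current = 0, count = 0;
    bool open = false; /* is a region currently being filled? */
    for (int i = 0; i < n; i++) {
        if (is_boundary[i]) {
            region_id[i] = -1;
            if (open) {
                current++;
                open = false;
            }
        } else {
            region_id[i] = current;
            if (!open) {
                count++;
                open = true;
            }
        }
    }
    return count;
}
```

This only constrains the schedulers that actually consult the boundary predicate, which is consistent with the observation that it did not help with the pre-RA schedulers in this setup.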

The unfortunate consequence for us: our programmers have to wrap the SET instructions (which touch system registers) in non-inlined functions to keep the compiler from breaking the ordering.

We are looking for advice on how to tackle this problem. We have some possible leads, such as modifying the SelectionDAG to recover these dependencies, or modifying the schedulers to scan the SelectionDAG and enforce source order when such a dependency is detected (perhaps by looking at how SourceScheduler works), but we have not investigated them fully yet.
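
The second lead (enforcing source order between instructions that touch the same system register) can be sketched as a greedy list scheduler in C. Everything here is hypothetical: the SchedInstr fields, the REG_CS bit, the priority heuristic and the bit-mask encoding of implicit registers are assumptions, and this is not how VLIWScheduler or the SelectionDAG schedulers are structured. Besides the data edges, an extra edge i -> j is added for every earlier/later pair that conflicts on an implicit register, so such pairs can never swap.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSTRS 16

enum { REG_CS = 1u << 0 }; /* hypothetical bit for the $CS system register */

typedef struct {
    unsigned uses;       /* implicit register uses (bit mask) */
    unsigned defs;       /* implicit register defs (bit mask) */
    unsigned data_preds; /* bit i set: needs the result of earlier instr i */
    int priority;        /* heuristic: higher is picked first when ready */
} SchedInstr;

/* Def/use, use/def or def/def overlap on some implicit register. */
static bool conflicts(const SchedInstr *a, const SchedInstr *b) {
    return ((a->defs & (b->uses | b->defs)) | (a->uses & b->defs)) != 0;
}

/* Greedy list scheduling over a single block; writes the chosen
   instruction indices, in issue order, to `order`. */
static void schedule(const SchedInstr *ins, int n, int *order) {
    assert(n > 0 && n <= MAX_INSTRS);
    unsigned preds[MAX_INSTRS];
    for (int j = 0; j < n; j++) {
        preds[j] = ins[j].data_preds;
        for (int i = 0; i < j; i++)
            if (conflicts(&ins[i], &ins[j]))
                preds[j] |= 1u << i; /* keep source order for this pair */
    }
    unsigned done = 0;
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int j = 0; j < n; j++) {
            bool ready = !(done & (1u << j)) && (preds[j] & ~done) == 0;
            if (ready && (best < 0 || ins[j].priority > ins[best].priority))
                best = j;
        }
        assert(best >= 0); /* forward-only edges keep the graph acyclic */
        order[k] = best;
        done |= 1u << best;
    }
}
```

For instance, given an FP add, a SET of $CS, a second FP add and an unrelated integer operation in that source order, the second FP add cannot be hoisted above the SET even if it has the highest priority, while the integer operation is still free to move first. Data predecessors are assumed to point only to earlier instructions.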

Any such advice would be greatly appreciated.

Another related issue: the flag -ffp-exception-behavior=strict does not seem to preserve the exception semantics as its documentation says it does. Although the generated IR seems to preserve them, nothing in the LLVM backends seems to enforce the "strict" floating-point exception behavior.

The last point can be seen in this piece of code: https://godbolt.org/z/e96zP7jET

```
long fpcr;

int toto(float a, float b, float c, double d, double e){
  float bc = b + c; // first fadd
  asm("mrs %[result], FPCR" : [result] "=r" (fpcr) : :); // read FPCR between the two adds
  float abc = a + bc; // second fadd
  float dw = (float) d; // fcvt (double -> float): should not happen before the second fadd
  float ew = (float) e;
  int dw_ewl = (int) dw + (int) ew;
  int abcl_dw_ewl = (int) abc + dw_ewl;
  return abcl_dw_ewl;
}
```

Compiling this piece of code with clang 11.0.0 for ARMv8-a gives the following assembly code:
```
toto:
        fadd    s1, s1, s2
        fcvt    s2, d3
        fadd    s0, s1, s0
        fcvt    s3, d4
        fcvtzs  w9, s2
        fcvtzs  w10, s0
        add     w9, w10, w9
        fcvtzs  w10, s3
        add     w0, w9, w10
        adrp    x9, fpcr
        //APP
        mrs     x8, FPCR
        //NO_APP
        str     x8, [x9, :lo12:fpcr]
        ret
```

Notice that the mrs was moved to the end, after all the floating-point operations, which does not seem to preserve the floating-point exception semantics of the source code.

PS: apologies for the duplicate message, if any; I sent the first one to llvm-dev-bounces by mistake.

Best regards,

Cyril Six
Compiler Engineer • Kalray
Phone:
csix at kalrayinc.com • www.kalrayinc.com
