[PATCH] D94163: [RISCV] Set dependency on floating point CSRs, 1/3

Wed Feb 3 22:35:04 PST 2021

sepavloff added a comment.

In D94163#2535653 <https://reviews.llvm.org/D94163#2535653>, @craig.topper wrote:

> In D94163#2535646 <https://reviews.llvm.org/D94163#2535646>, @sepavloff wrote:
>
>> In D94163#2489101 <https://reviews.llvm.org/D94163#2489101>, @craig.topper wrote:
>>
>>> In D94163#2489020 <https://reviews.llvm.org/D94163#2489020>, @sepavloff wrote:
>>>
>>>> In D94163#2482528 <https://reviews.llvm.org/D94163#2482528>, @craig.topper wrote:
>>>>
>>>>> I still don't understand why the existence of static rounding modes in the ISA requires that we have to use them for the default environment. X86 doesn't have static rounding mode prior to AVX512 so uses dynamic in the default mode.
>>>>
>>>> It is more convenient. Instructions with a static rounding mode do not depend on `frm`, so they may be scheduled more freely. Besides, a function that contains only FP instructions with static rounding may be safely called from a non-default FP environment. Targets without static rounding modes don't have this possibility.
>>>
>>> If there’s no write to frm then there shouldn’t be a scheduling issue.
>>
>> Sure. Such an issue arises when there is a write to `frm`. Consider the following pseudo code:
>>
>>   float a = ...
>>   for (int i = ...) {
>>     fesetround(FE_TOWARDZERO); // csrw frm, 1
>>     ...
>>     x[i] += floor(a); // fcvt ..., rdn
>>   }
>>
>> `floor(a)` is loop-invariant and could be hoisted out of the loop. This is possible because `fcvt` uses static rounding. However, if `fcvt` used dynamic rounding, it would depend on `frm`, which is changed above, so it could not be moved out of the loop.
>
> Why wouldn't that have been hoisted out of the loop by IR LICM? Machine LICM is primarily intended to move stack reloads and constant pool loads. It only runs on the outermost loop with a preheader.

This is another example:

  %1:fpr32 = …
  %2:fpr32 = …
  ...
  csrw frm, rdn
  …
  %3:fpr32 = FADD_S killed %1:fpr32, killed %2:fpr32, rne  

In this code the scheduler can move FADD_S upward, above `csrw`, so the live ranges of %1 and %2 become shorter and register pressure decreases. If FADD_S implicitly depended on `frm`, the scheduler could not move it above `csrw`, and this optimization would not be possible.
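
To make the dependency argument concrete, here is a minimal sketch of the reasoning in C. It is a toy model, not LLVM's scheduler or its MachineInstr API: the `Instr` struct, the register bitmasks, and the functions `depends_on`, `earliest_slot` and `fadd_slot` are all hypothetical names made up for illustration. It only assumes that an instruction cannot be moved above an earlier one when they have a read-after-write, write-after-read or write-after-write conflict on some register, and that dynamic rounding is modeled as an extra implicit read of `frm`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register set for the example: %1, %2, %3 and frm. */
enum {
    REG_V1  = 1u << 0,
    REG_V2  = 1u << 1,
    REG_V3  = 1u << 2,
    REG_FRM = 1u << 3
};

/* Toy instruction: bitmasks of registers written and read
   (implicit operands included). Not an LLVM data structure. */
struct Instr {
    const char *name;
    unsigned defs;
    unsigned uses;
};

/* True if `later` must stay below `earlier`. */
static bool depends_on(const struct Instr *later, const struct Instr *earlier) {
    return (later->uses & earlier->defs) != 0   /* read after write  */
        || (later->defs & earlier->uses) != 0   /* write after read  */
        || (later->defs & earlier->defs) != 0;  /* write after write */
}

/* Smallest index that seq[idx] can be moved up to without
   crossing an instruction it depends on. */
static size_t earliest_slot(const struct Instr *seq, size_t n, size_t idx) {
    assert(idx < n);
    size_t slot = idx;
    while (slot > 0 && !depends_on(&seq[idx], &seq[slot - 1]))
        --slot;
    return slot;
}

/* Models the MIR example: two definitions, a write to frm, then FADD_S.
   With dynamic rounding FADD_S gets an implicit use of frm. */
size_t fadd_slot(bool dynamic_rounding) {
    unsigned fadd_uses = REG_V1 | REG_V2;
    if (dynamic_rounding)
        fadd_uses |= REG_FRM;

    const struct Instr seq[] = {
        { "def %1",   REG_V1,  0 },
        { "def %2",   REG_V2,  0 },
        { "csrw frm", REG_FRM, 0 },
        { "FADD_S",   REG_V3,  fadd_uses },
    };
    return earliest_slot(seq, sizeof seq / sizeof seq[0], 3);
}
```

In this model `fadd_slot(false)` returns 2, meaning FADD_S can be placed directly after the definition of %2 and above the write to `frm`, while `fadd_slot(true)` returns 3, meaning it stays below `csrw`.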

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D94163