[PATCH] D111077: [LV] Support converting FP add to integer reductions.

Thu Oct 7 02:38:04 PDT 2021

dmgreen added a comment.

In D111077#3040938 <https://reviews.llvm.org/D111077#3040938>, @fhahn wrote:

> In D111077#3040417 <https://reviews.llvm.org/D111077#3040417>, @dmgreen wrote:
>
>> Interesting idea. Are these two bits of code always the same?
>> https://godbolt.org/z/EfPKPTMdf
>
> I think both cases above should be the same. But I think we can construct slight variations where they would not be. E.g. consider a loop where the induction variable starts at 0 and is incremented and overflow is allowed. If `n` were negative, removing the loop and converting `n` to a float would yield a negative number, but the loop version would always return a positive number. I might be missing some subtleties when it comes to sign handling; perhaps @scanon has further thoughts.

Yep, I was ignoring the negative numbers :) I meant more about the general idea of converting the loop to straight line code.
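
A minimal sketch of the kind of variation described in the quote. It rests on assumptions that are not from the thread: the counter is a 16-bit unsigned integer (so wraparound is well defined in C and the loop stays short), the exit test is `i != limit`, and the names `count_loop`, `naive_convert` and `show_difference` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical loop: the counter starts at 0, is incremented, and may wrap. */
float count_loop(int16_t n) {
    uint16_t limit = (uint16_t)n;   /* a negative n becomes a large positive trip count */
    float sum = 0.0f;
    for (uint16_t i = 0; i != limit; ++i)
        sum += 1.0f;                /* exact here: every partial sum is below 2^24 */
    return sum;
}

/* Naive replacement: drop the loop and convert n directly. */
float naive_convert(int16_t n) {
    return (float)n;
}

/* The two agree for n >= 0 and differ in sign for n < 0. */
void show_difference(void) {
    assert(count_loop(5) == naive_convert(5));
    assert(count_loop(-3) > 0.0f && naive_convert(-3) < 0.0f);
}
```

With these assumed types, `n = -3` gives a trip count of 65533, so the loop returns a positive value while the direct conversion returns -3.0f.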

>> Should we be doing this more generally, outside the vectorizing reductions?
>
> I think it might be worthwhile to convert such reductions outside the vectorizer in some cases. My motivation for starting in LV is that it should be clearly profitable if it allows vectorization. For general loops without vectorization, it might not be profitable I think, e.g. for loops that only execute once, due to the conversion overhead.

As far as I can tell from this code (https://godbolt.org/z/caPszPafr), the trace through when n==1 would be:

  cmp     w1, #1
  b.lt    .LBB0_3
  cmp     w1, #1
  b.ne    .LBB0_4
  mov     w8, wzr
  movi    d0, #0000000000000000
  b       .LBB0_7
  sub     w8, w1, w8
  fmov    s1, #1.00000000
  subs    w8, w8, #1
  fadd    s0, s0, s1
  b.ne    .LBB0_8
  ret

vs straight line code with no branches:

  bic     w8, w1, w1, asr #31
  mov     w9, #1266679808
  scvtf   s0, w8
  fmov    s1, w9
  fminnm  s0, s0, s1
  ret

And that's not including vectorization. It's kind of like a "high cost expansion" from SCEV (though, as far as I understand, we wouldn't always rewrite high-cost exit values, even if doing so would mean deleting the loop (?)). That is what made me wonder whether we should be doing this generally, not just in the vectorizer. (Not that I have anything against this patch: it looks pretty sensible and doesn't complicate the reduction code any more than it already is. It seems to fit quite well.)
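
A sketch in C of what the two listings above appear to compute. It assumes the source behind the link is a loop that adds 1.0f to a float accumulator `n` times, and that floats are IEEE single precision with the default rounding mode. The names `sum_loop`, `sum_closed` and `check_small` are hypothetical. The constant 1266679808 is the bit pattern 0x4B800000, i.e. 2^24 = 16777216.0f, the point at which adding 1.0f no longer changes a float.

```c
#include <assert.h>

/* Reduction loop: add 1.0f once per iteration; runs zero times if n <= 0. */
float sum_loop(int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += 1.0f;                   /* stops growing once sum reaches 2^24 */
    return sum;
}

/* Straight-line form mirroring the second listing. */
float sum_closed(int n) {
    int clamped = n < 0 ? 0 : n;       /* bic w8, w1, w1, asr #31 */
    float f = (float)clamped;          /* scvtf s0, w8 */
    const float limit = 16777216.0f;   /* mov w9, #1266679808; fmov s1, w9 */
    return f < limit ? f : limit;      /* fminnm s0, s0, s1 */
}

/* Compare both forms over a small range, including negative n. */
void check_small(void) {
    for (int n = -4; n <= 1000; ++n)
        assert(sum_loop(n) == sum_closed(n));
}
```

The clamp to 2^24 is what keeps the straight-line form equal to the loop for large `n`: 16777216.0f + 1.0f rounds back to 16777216.0f, so the loop saturates there as well.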

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111077/new/

https://reviews.llvm.org/D111077