[PATCH] D123408: [InstCombine] Limit folding of cast into PHI

Mon Jun 6 10:44:35 PDT 2022

bmahjour added a comment.

In D123408#3489233 <https://reviews.llvm.org/D123408#3489233>, @lebedev.ri wrote:

> In D123408#3489191 <https://reviews.llvm.org/D123408#3489191>, @syzaara wrote:
>
>> In D123408#3443538 <https://reviews.llvm.org/D123408#3443538>, @bmahjour wrote:
>>
>>> The cleanest way I can think of to teach LoopVectorizer about this would be to introduce a whole new set of composite reduction operations of the form `<op>-then-<lop>` (eg `RecurKind::AddThenAnd`, `RecurKind::MulThenAnd`, `RecurKind::OrThenAnd`, and so on)...and that's just for combining logical `and` with the known integer reduction ops, so if we want to support e.g. `or` we'd need to double the number of additional recurrence kinds (and the extra logic that comes with it) again. The identity value would be determined from the `<op>`, and the `<lop>` has to be applied when reducing the final vector into a single scalar upon loop exit.
>>>
>>> @lebedev.ri is this what you had in mind or is there a better way to do it?
>>
>> @lebedev.ri Can you please advise if the above described way is how we would implement this within the LoopVectorizer?
>
> Sorry, lost track here. I'm not familiar enough with LV to recommend the solution,
> but it sounds vaguely reasonable to me. But, do you need the whole `<op>-then-<lop>` generality?
> The only reason why `<op>-then-and` is useful, is because that `and` specifies
> the effective bitwidth of the reduction, but if the high bits aren't demanded arithmetic/logic ops can be losslessly performed in narrower bit widths:
> `i32 65535 + i32 65535 = i32 131070 = 0x1FFFE`, `(trunc(i32 65535) to i8) + (trunc(i32 65535) to i8) = i8 510 = 0xFE`, note how low 8 bits are the same.
> Perhaps the solution should be around tracking the demanded bit width?

If we think about this problem as an issue with determination of demanded bitwidth, I'm not sure why it would be the job of LV to understand it and treat it in a special way, where InstCombine could have determined it and generated code without the extra widening/truncating instructions. After all, InstCombine is meant to simplify the IR for downstream passes (not LV or other non-canonicalization passes). The one complication is the effect of `nsw` on that `add`. Not sure if InstCombine aims at keeping that flag and knowingly treats it as more beneficial than making other simplifications. I suspect it's not, but would be good to verify.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123408/new/

https://reviews.llvm.org/D123408