[PATCH] D123408: [InstCombine] Limit folding of cast into PHI

Tue May 3 12:33:43 PDT 2022

lebedev.ri added a comment.

In D123408#3489191 <https://reviews.llvm.org/D123408#3489191>, @syzaara wrote:

> In D123408#3443538 <https://reviews.llvm.org/D123408#3443538>, @bmahjour wrote:
>
>> The cleanest way I can think of to teach LoopVectorizer about this would be to introduce a whole new set of composite reduction operations of the form `<op>-then-<lop>` (eg `RecurKind::AddThenAnd`, `RecurKind::MulThenAnd`, `RecurKind::OrThenAnd`, and so on)...and that's just for combining logical `and` with the known integer reduction ops, so if we want to support e.g. `or` we'd need to double the number of additional recurrence kinds (and the extra logic that comes with it) again. The identity value would be determined from the `<op>`, and the `<lop>` has to be applied when reducing the final vector into a single scalar upon loop exit.
>>
>> @lebedev.ri is this what you had in mind or is there a better way to do it?
>
> @lebedev.ri Can you please advise if the above described way is how we would implement this within the LoopVectorizer?

Sorry, I lost track of this. I'm not familiar enough with LV to recommend a solution,
but it sounds vaguely reasonable to me. Do you need the full `<op>-then-<lop>` generality, though?
The only reason `<op>-then-and` is useful is that the `and` specifies
the effective bit width of the reduction. If the high bits aren't demanded, arithmetic/logic ops such as add, mul, and, or and xor can be performed losslessly in a narrower bit width:
`i32 65535 + i32 65535 = i32 131070 = 0x1FFFE`, `(trunc(i32 65535) to i8) + (trunc(i32 65535) to i8) = i8 255 + i8 255 = i8 254 = 0xFE`. Note how the low 8 bits are the same.
Perhaps the solution should be built around tracking the demanded bit width?
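
To make the narrowing argument concrete, here is a minimal standalone sketch in plain C++ (not LLVM code). The function names `reduceWideThenMask`, `reduceNarrow` and `checkNarrowingExample` are made up for illustration, and the 8-bit mask is just the width used in the example above. It compares an add reduction done in 32 bits and masked once at the end against the same reduction done entirely in 8 bits:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: accumulate in 32 bits, then apply the final `and`
// once after the loop (the "<op>-then-and" shape).
uint32_t reduceWideThenMask(const std::vector<uint32_t> &vals) {
  uint32_t acc = 0; // identity value of add
  for (uint32_t v : vals)
    acc += v; // unsigned overflow wraps modulo 2^32
  return acc & 0xFFu; // only the low 8 bits are demanded
}

// Hypothetical helper: truncate every input to 8 bits and accumulate in
// 8 bits; no mask is needed afterwards.
uint8_t reduceNarrow(const std::vector<uint32_t> &vals) {
  uint8_t acc = 0;
  for (uint32_t v : vals)
    acc = static_cast<uint8_t>(acc + static_cast<uint8_t>(v));
  return acc;
}

// Checks the example from the comment above: 65535 + 65535.
void checkNarrowingExample() {
  const std::vector<uint32_t> vals = {65535u, 65535u};
  assert(reduceWideThenMask(vals) == 0xFEu); // 0x1FFFE & 0xFF
  assert(reduceNarrow(vals) == 0xFE);        // (255 + 255) mod 256
}
```

Both functions agree on any input because the low 8 bits of a sum depend only on the low 8 bits of its operands. The same does not hold for every operation (a right shift or a division can pull high bits down into the low ones), so this is only a sketch of the add case, not a claim about what LV should implement.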

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123408/new/

https://reviews.llvm.org/D123408