[PATCH] D123408: [InstCombine] Limit folding of cast into PHI

Zaara Syeda via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 9 09:24:48 PDT 2022


syzaara added a comment.

In D123408#3560846 <https://reviews.llvm.org/D123408#3560846>, @bmahjour wrote:

> In D123408#3489233 <https://reviews.llvm.org/D123408#3489233>, @lebedev.ri wrote:
>
>> In D123408#3489191 <https://reviews.llvm.org/D123408#3489191>, @syzaara wrote:
>>
>>> In D123408#3443538 <https://reviews.llvm.org/D123408#3443538>, @bmahjour wrote:
>>>
>>>> The cleanest way I can think of to teach LoopVectorizer about this would be to introduce a whole new set of composite reduction operations of the form `<op>-then-<lop>` (eg `RecurKind::AddThenAnd`, `RecurKind::MulThenAnd`, `RecurKind::OrThenAnd`, and so on)...and that's just for combining logical `and` with the known integer reduction ops, so if we want to support e.g. `or` we'd need to double the number of additional recurrence kinds (and the extra logic that comes with it) again. The identity value would be determined from the `<op>`, and the `<lop>` has to be applied when reducing the final vector into a single scalar upon loop exit.
>>>>
>>>> @lebedev.ri is this what you had in mind or is there a better way to do it?
>>>
>>> @lebedev.ri Can you please advise if the above described way is how we would implement this within the LoopVectorizer?
>>
>> Sorry, lost track here. I'm not familiar enough with LV to recommend the solution,
>> but it sounds vaguely reasonable to me. But, do you need the whole `<op>-then-<lop>` generality?
>> The only reason why `<op>-then-and` is useful, is because that `and` specifies
>> the effective bitwidth of the reduction, but if the high bits aren't demanded arithmetic/logic ops can be losslessly performed in narrower bit widths:
>> `i32 65535 + i32 65535 = i32 131070 = 0x1FFFE`, `(trunc(i32 65535) to i8) + (trunc(i32 65535) to i8) = i8 510 = 0xFE`, note how low 8 bits are the same.
>> Perhaps the solution should be around tracking the demanded bit width?
>
> If we think about this problem as an issue with determination of demanded bitwidth, I'm not sure why it would be the job of LV to understand it and treat it in a special way, where InstCombine could have determined it and generated code without the extra widening/truncating instructions. After all, InstCombine is meant to simplify the IR for downstream passes (not LV or other non-canonicalization passes). The one complication is the effect of `nsw` on that `add`. Not sure if InstCombine aims at keeping that flag and knowingly treats it as more beneficial than making other simplifications. I suspect it's not, but would be good to verify.

I agree, it seems more natural to me that InstCombine would do the simplification rather than LV to trace through the demanded bits. I have verified that InstCombine isn't aiming to skip the transformation in preference on keeping the nsw flag. It just happens that the phi is simplified first and we miss the opportunity to simplify the add into a smaller width add.

@spatel Could you please provide some feedback about this change to InstCombine.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123408/new/

https://reviews.llvm.org/D123408



More information about the llvm-commits mailing list