[PATCH] D140493: [SROA] Support promotion in presence of variably-indexed loads

Thu Dec 22 10:01:37 PST 2022

lebedev.ri added a comment.

@craig.topper thank you!

In D140493#4013450 <https://reviews.llvm.org/D140493#4013450>, @craig.topper wrote:

> In D140493#4013142 <https://reviews.llvm.org/D140493#4013142>, @lebedev.ri wrote:
>
>> @craig.topper @reames Hi! Can you comment on the profitability heuristic here? Does it appear that we need a TTI hook?
>>
>> In D140493#4012533 <https://reviews.llvm.org/D140493#4012533>, @nikic wrote:
>>
>>> I think this is moving in the right direction in terms of how the transform should be implemented. However, I have two high-level concerns:
>>>
>>> The first one is basically the same as for the "whole alloca to vector promotion": If we perform this transform and then inline and it turns out that the offset is now known, it's likely that we're now going to generate much worse code -- we're likely not going to be able to get rid of the freeze and vector manipulation. I think it is very likely that the "second-order effects" you refer to are codegen regressions rather than improvements (will need to be investigated in either case).
>>>
>>> The second is that even if we ignore that case, the profitability of this transform is very unclear to me. https://llvm.godbolt.org/z/W6e7K5W9d has a representative case (load i32 from 8 byte buffer after sroa+instcombine, with two kinds of init because some targets really hate the vector init). I'm not even sure that this is better on x86_64, but there are some targets where it's clearly worse, e.g. riscv32: https://llvm.godbolt.org/z/fzK8sbaEv If we want to do this transform, we'll probably not be able to avoid TTI-based cost modelling. It's also possible that this transform just isn't profitable in isolation (i.e. without the larger context of your examples).
>>>
>>> Regarding the rewrite itself, are you possibly looking for the EmitGEPOffset() helper? It should be possible to sum the results of EmitGEPOffset() on all the GEPs in the chain to obtain the desired offset.
>
> From a quick glance I'm seeing a couple issues. The first is that the basic RISC-V base ISA doesn't support vectors and SelectionDAG is fully scalarizing the vector load/store.

Right, i'm looking into that right now. We should be getting rid of vector loads in these cases in SDAG.

> The other issue is that the test contains a variable shift of an i64 which isn't directly supported on a 32-bit target. RISC-V doesn't have an instructions like X86's SHLD/SHRD so we have to do it in multiple instructions.

Err, this is actually expected, see profitability reasoning in description/diff.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140493/new/

https://reviews.llvm.org/D140493