[PATCH] D140493: [SROA] Support promotion in presence of variably-indexed loads

Thu Dec 22 01:36:20 PST 2022

nikic added a comment.

I think this is moving in the right direction in terms of how the transform should be implemented. However, I have two high-level concerns:

The first one is basically the same as for the "whole alloca to vector promotion": if we perform this transform, then inline, and the offset turns out to be known after inlining, we will likely generate much worse code, because we probably won't be able to get rid of the freeze and the vector manipulation. I think it is very likely that the "second-order effects" you refer to are codegen regressions rather than improvements (this will need to be investigated in either case).

The second is that even if we ignore that case, the profitability of this transform is very unclear to me. https://llvm.godbolt.org/z/W6e7K5W9d has a representative case (loading an i32 from an 8-byte buffer after sroa+instcombine, with two kinds of init because some targets really hate the vector init). I'm not even sure that this is better on x86_64, but there are some targets where it's clearly worse, e.g. riscv32 (https://llvm.godbolt.org/z/fzK8sbaEv). If we want to do this transform, we'll probably not be able to avoid TTI-based cost modelling. It's also possible that this transform just isn't profitable in isolation (i.e. without the larger context of your examples).

Regarding the rewrite itself, are you possibly looking for the EmitGEPOffset() helper? It should be possible to sum the results of EmitGEPOffset() on all the GEPs in the chain to obtain the desired offset.
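
To illustrate the arithmetic behind that suggestion, here is a rough, self-contained sketch. It is not the LLVM API: GepIndex, Gep, gepOffset() and chainOffset() are hypothetical names made up for this example, and it works on concrete integers, whereas the real helper emits IR that computes the offset as a Value. It also ignores wrapping at the index width. The only point is that the byte offset of a GEP chain is the sum of the per-GEP byte offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one GEP operand: an index (possibly only known at
// run time in real IR) scaled by the size in bytes of the indexed element.
// A struct field could be modelled as index 1 with strideBytes set to the
// field's byte offset.
struct GepIndex {
    int64_t index;
    int64_t strideBytes;
};

// Hypothetical model of a single GEP: a list of scaled indices.
struct Gep {
    std::vector<GepIndex> indices;
};

// Byte offset contributed by one GEP relative to its own base pointer.
int64_t gepOffset(const Gep &gep) {
    int64_t offset = 0;
    for (const GepIndex &idx : gep.indices)
        offset += idx.index * idx.strideBytes;
    return offset;
}

// Byte offset of the last GEP in the chain relative to the original base
// (e.g. the alloca): the sum of the individual GEP offsets.
int64_t chainOffset(const std::vector<Gep> &chain) {
    int64_t total = 0;
    for (const Gep &gep : chain)
        total += gepOffset(gep);
    return total;
}
```

In the actual rewrite, each per-GEP offset would be an IR value and the sum a sequence of add instructions rather than a host-side loop.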

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D140493