[PATCH] D128582: Enable SeparateConstOffsetFromGEPPass() at -O3 and -O2
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Jul 9 04:08:20 PDT 2022
dmgreen added a comment.
In D128582#3637881 <https://reviews.llvm.org/D128582#3637881>, @gsocshubham wrote:
> In D128582#3628937 <https://reviews.llvm.org/D128582#3628937>, @dmgreen wrote:
>
>>> Passing `-aarch64-enable-gep-opt=true -O3` would do the job - https://clang.godbolt.org/z/hcjsh1vex what this patch was proposing. WDYT?
>>>
>>> ..
>>>
>>> Sure. For #50528, enabling `EnableGEPOpt` reduces GEP instructions by half but is done by splitting GEP into ptr2int and int2ptr.
>>
>> Oh yeah I see it does. I must have missed the -O3 off the time I tried it. That's good. I wonder why it was never enabled in the past.
>>
>> From the look of it, it runs after LSR, which I think it would need to run before. Otherwise it is likely to mess up what LSR has tried to do. That would be before the call to TargetPassConfig::addIRPasses(). I'm not sure if the LICM run is necessary either, but I see it is used in other backends. We would need to gather some benchmark to see how it behaves. Like do the issues in x264 still occur, and what happens across more benchmark cases.
>
> Will running it before LSR make any difference compared to current location?
>
> The issue in x264 was occuring due to pass registeration way early at IR level. It is irrelevant now since it is already enabled in AArch64 if passed relevant flags. As we have been given suggestion to move GEP pass just before Selection DAG, I have registered/moved GEP pass from `addIRPasses()` to at the end of `AArch64PassConfig::addCodeGenPrepare()` but there is no regression in x264 benchmark result as compared to master.
Everything in `AArch64PassConfig::addIRPasses` counts as the "backend" from the point of view of LLVM. They are still llvm-ir passes, but ran as part of the backend prior to ISel lowering to perform certain optimizations and help with lowering that are target-specific, but easier to perform on IR than MIR. I think the pass really needs to run before the call to LSR, as LSR will be very opinionated about the geps in loops and we don't want to mess that up. LSR is added by `TargetPassConfig::addIRPasses()`, so moving the SeparateConstOffsetFromGEP passes anywhere before that call in `AArch64PassConfig::addIRPasses` should be OK.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128582/new/
https://reviews.llvm.org/D128582
More information about the llvm-commits
mailing list