[PATCH] D155600: [AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0)
Hubert Tong via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 22 21:18:19 PDT 2023
hubert.reinterpretcast added inline comments.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7625-7630
+ if (RequiresMod4Offset) {
+ if (isa<GlobalAddressSDNode>(BaseOp1) && HasAIXSmallLocalExecTLS) {
+ const auto *GV = dyn_cast<GlobalAddressSDNode>(BaseOp1)->getGlobal();
+ Align Alignment = GV->getPointerAlignment(CurDAG->getDataLayout());
+ if (Alignment < 4)
+ continue;
----------------
amyk wrote:
> hubert.reinterpretcast wrote:
> > hubert.reinterpretcast wrote:
> > > If we are to keep the condition, this needs a comment for why `HasAIXSmallLocalExecTLS` is a sufficient guard to limit the behaviour to just the new AIX "small TLS" accesses. The logic would be that the option is only valid for AIX and, on AIX, a `ADDI`/`ADDI8` node with a global address for the immediate operand would necessarily be for "small TLS" (although possibly for the `local-dynamic` TLS model with future developments).
> > >
> > > However: The optimization (with the checks) is generally sound, so maybe there should be no such check at all (except to limit the scope of the current change because some tests need updating). Yes, there can be downstream code that may have issues with generating the global address as an immediate operand for the load/store instruction, but changing the downstream code seems to be the solution for those cases.
> > >
> > > If we keep the check, there should be a TODO comment to remove the artificial limitation.
> > Aside from the checks that limit the global address (as opposed to constant) immediate operand handling to "small TLS", this code does the same thing as the code for the toc-data cases (except that which operand is the immediate operand is different between the two cases).
> >
> > The code can be made common (using `ImmOpnd`) in the non-`ReplaceFlags` path at line 7717.
> >
> > Note: `isa` followed by `cast`/`dyn_cast` is an anti-pattern.
> - I think there are possibilities that `HasAIXSmallLocalExecTLS` may not be necessary here. Just wanted to check, when you say "artificial limitation", do you mean that `HasAIXSmallLocalExecTLS` check?
> - Are you suggesting that this `else if` should be removed as we have `ImmOpnd` on 7677, and that it goes into the the code in 7717? Or am I misunderstanding? Just figured I'd double check because as you mentioned, this is checking the operand of the `ADDI`, whereas I believe the below accounts for the cases where we're dealing with the offset from the load/store and combined immediate operand.
> - Are you also suggesting that the initial condition should not be `if (isa<GlobalAddressSDNode>(BaseOp1)` but rather a `dyn_cast` of a `GlobalAddressSDNode`?
> - I think there are possibilities that `HasAIXSmallLocalExecTLS` may not be necessary here. Just wanted to check, when you say "artificial limitation", do you mean that `HasAIXSmallLocalExecTLS` check?
Yes. Removing it does enable the optimization in more cases (e.g., TLS on Linux) though. It may make sense to pre-commit this change first.
> - Are you suggesting that this `else if` should be removed as we have `ImmOpnd` on 7677, and that it goes into the the code in 7717? Or am I misunderstanding? Just figured I'd double check because as you mentioned, this is checking the operand of the `ADDI`, whereas I believe the below accounts for the cases where we're dealing with the offset from the load/store and combined immediate operand.
No. This comment is about the non-`else if` part: That code can be removed (some adjustment would be needed to the `else if` part here, but there are multiple choices for how much to do) and the substantially similar code near 7649 can be moved into the code at 7717 (with usage of `ImmOpnd`). That way, we have one less copy of the logic.
> - Are you also suggesting that the initial condition should not be `if (isa<GlobalAddressSDNode>(BaseOp1)` but rather a `dyn_cast` of a `GlobalAddressSDNode`?
Yes, and that is what the code at 7649 does.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7633
+ (Base.getConstantOperandVal(1) % 4 != 0))
+ continue;
+ }
----------------
hubert.reinterpretcast wrote:
> Please add a FIXME comment that this does not account for cases where the offset from the load/store causes the combined immediate operand to be sufficiently aligned of the instruction encoding.
For the `else if` part here, you may want to remove it in favour of the logic at line 7723.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D155600/new/
https://reviews.llvm.org/D155600
More information about the llvm-commits
mailing list