[PATCH] D155600: [AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0)

Hubert Tong via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Aug 13 21:09:27 PDT 2023


hubert.reinterpretcast added inline comments.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7622-7624
       // For these cases, the immediate may not be divisible by 4, in
       // which case the fold is illegal for DS-form instructions.  (The
       // other cases provide aligned addresses and are always safe.)
----------------
The comment is already wrong because the non-`ReplaceFlags` path can handle non-zero load/store offsets in combination with an original constant immediate operand that is not divisible by 4. Additionally, the other cases now include the toc-data cases (which does not necessarily provide aligned addresses).

I would rather have no comment than a comment that is actively wrong. Further below, I recommend a FIXME comment.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7625-7630
+      if (RequiresMod4Offset) {
+        if (isa<GlobalAddressSDNode>(BaseOp1) && HasAIXSmallLocalExecTLS) {
+          const auto *GV = dyn_cast<GlobalAddressSDNode>(BaseOp1)->getGlobal();
+          Align Alignment = GV->getPointerAlignment(CurDAG->getDataLayout());
+          if (Alignment < 4)
+            continue;
----------------
If we are to keep the condition, this needs a comment for why `HasAIXSmallLocalExecTLS` is a sufficient guard to limit the behaviour to just the new AIX "small TLS" accesses. The logic would be that the option is only valid for AIX and, on AIX, a `ADDI`/`ADDI8` node with a global address for the immediate operand would necessarily be for "small TLS" (although possibly for the `local-dynamic` TLS model with future developments).

However: The optimization (with the checks) is generally sound, so maybe there should be no such check at all (except to limit the scope of the current change because some tests need updating). Yes, there can be downstream code that may have issues with generating the global address as an immediate operand for the load/store instruction, but changing the downstream code seems to be the solution for those cases.

If we keep the check, there should be a TODO comment to remove the artificial limitation.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7625-7630
+      if (RequiresMod4Offset) {
+        if (isa<GlobalAddressSDNode>(BaseOp1) && HasAIXSmallLocalExecTLS) {
+          const auto *GV = dyn_cast<GlobalAddressSDNode>(BaseOp1)->getGlobal();
+          Align Alignment = GV->getPointerAlignment(CurDAG->getDataLayout());
+          if (Alignment < 4)
+            continue;
----------------
hubert.reinterpretcast wrote:
> If we are to keep the condition, this needs a comment for why `HasAIXSmallLocalExecTLS` is a sufficient guard to limit the behaviour to just the new AIX "small TLS" accesses. The logic would be that the option is only valid for AIX and, on AIX, a `ADDI`/`ADDI8` node with a global address for the immediate operand would necessarily be for "small TLS" (although possibly for the `local-dynamic` TLS model with future developments).
> 
> However: The optimization (with the checks) is generally sound, so maybe there should be no such check at all (except to limit the scope of the current change because some tests need updating). Yes, there can be downstream code that may have issues with generating the global address as an immediate operand for the load/store instruction, but changing the downstream code seems to be the solution for those cases.
> 
> If we keep the check, there should be a TODO comment to remove the artificial limitation.
Aside from the checks that limit the global address (as opposed to constant) immediate operand handling to "small TLS", this code does the same thing as the code for the toc-data cases (except that which operand is the immediate operand is different between the two cases).

The code can be made common (using `ImmOpnd`) in the non-`ReplaceFlags` path at line 7717.

Note: `isa` followed by `cast`/`dyn_cast` is an anti-pattern.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:7633
+                   (Base.getConstantOperandVal(1) % 4 != 0))
+          continue;
+      }
----------------
Please add a FIXME comment that this does not account for cases where the offset from the load/store causes the combined immediate operand to be sufficiently aligned of the instruction encoding.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:3360
+        if (GVType->isSized() && !GVType->isEmptyTy() &&
+            GV->getParent()->getDataLayout().getTypeAllocSize(GVType) <
+                AIXSmallTlsPolicySizeLimit)
----------------
Sorry for the extra churn: with the new name, we should use `<=` here (and adjust the value to `32751`).


================
Comment at: llvm/lib/Target/PowerPC/PPCMCInstLower.cpp:97
+  else if (MO.getTargetFlags() == PPCII::MO_TPREL_FLAG) {
+    assert(MO.isGlobal() && "Only expecting a global MachineOperand here!\n");
+    TLSModel::Model Model = TM.getTLSModel(MO.getGlobal());
----------------
`assert`s print the spelling of the expression, so the `\n` is not needed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155600/new/

https://reviews.llvm.org/D155600



More information about the llvm-commits mailing list