[llvm] [AIX][TLS] Optimize the small local-exec access sequence for non-zero offsets (PR #71485)

Amy Kwan via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 18 14:53:21 PST 2024


================
@@ -7566,6 +7566,100 @@ static void reduceVSXSwap(SDNode *N, SelectionDAG *DAG) {
   DAG->ReplaceAllUsesOfValueWith(SDValue(N, 0), N->getOperand(0));
 }
 
+// Is an ADDI eligible for folding for non-TOC-based local-exec accesses?
+static bool isEligibleToFoldADDIForLocalExecAccesses(SDNode *N,
+                                                     SelectionDAG *DAG,
+                                                     SDValue ADDIToFold) {
+  const PPCSubtarget &Subtarget =
+      DAG->getMachineFunction().getSubtarget<PPCSubtarget>();
+  // This optimization is only performed for non-TOC-based local-exec accesses.
+  if (!Subtarget.hasAIXSmallLocalExecTLS())
+    return false;
+
+  // Check if ADDIToFold (the ADDI that we want to fold into local-exec
+  // accesses), is truly an ADDI.
+  if (!ADDIToFold.isMachineOpcode() ||
+      (ADDIToFold.getMachineOpcode() != PPC::ADDI8))
+    return false;
+
+  // The first operand of the ADDIToFold should be the thread pointer.
+  // This transformation is only performed if the first operand of the
+  // addi is the thread pointer.
+  SDValue TPRegNode = ADDIToFold.getOperand(0);
+  RegisterSDNode *TPReg = dyn_cast_or_null<RegisterSDNode>(TPRegNode.getNode());
+  if (!TPReg || (TPReg->getReg() != Subtarget.getThreadPointerRegister()))
+    return false;
+
+  // The second operand of the ADDIToFold should be the global TLS address
+  // (the local-exec TLS variable). We only perform the folding if the TLS
+  // variable is the second operand.
+  SDValue TLSVarNode = ADDIToFold.getOperand(1);
+  GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(TLSVarNode);
+  if (!GA)
+    return false;
+
+  // The local-exec TLS variable should only have the MO_TPREL_FLAG target flag,
+  // so this optimization is not performed otherwise if the flag is not set.
+  unsigned TargetFlags = GA->getTargetFlags();
+  if (TargetFlags != PPCII::MO_TPREL_FLAG)
+    return false;
+
+  // If all conditions are satisfied, the ADDI is valid for folding.
+  return true;
+}
+
+// For non-TOC-based local-exec access where an addi is feeding into another
+// addi, fold this sequence into a single addi if possible.
+// Before this optimization, the sequence appears as:
+//    addi rN, r13, sym at le
+//    addi rM, rN, imm
+// After this optimization, we can fold the two addi into a single one:
+//    addi rM, r13, sym at le + imm
+static void foldADDIForLocalExecAccesses(SDNode *N, SelectionDAG *DAG) {
+  if (N->getMachineOpcode() != PPC::ADDI8)
----------------
amy-kwan wrote:

Fair enough. To avoid the function call overhead, I can move the check for the `PPC::ADDI8` opcode outside of the function.

Perhaps it may make sense then, to add an `assert()` inside `foldADDIForLocalExecAccesses()` to make sure we have an `ADDI8` opcode, because the intention is for this function to be used for addis only. Do you have any thoughts about this idea (move the opcode check outside the function, but inside this function, add an assert)?

https://github.com/llvm/llvm-project/pull/71485


More information about the llvm-commits mailing list