[llvm] [AIX][TLS] Optimize the small local-exec access sequence for non-zero offsets (PR #71485)
Amy Kwan via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 31 12:18:18 PST 2024
================
@@ -1523,30 +1590,73 @@ void PPCAsmPrinter::emitInstruction(const MachineInstr *MI) {
EmitToStreamer(*OutStreamer, MCInstBuilder(PPC::EnforceIEIO));
return;
}
- case PPC::ADDI8: {
- // The faster non-TOC-based local-exec sequence is represented by `addi`
- // with an immediate operand having the MO_TPREL_FLAG. Such an instruction
- // does not otherwise arise.
- unsigned Flag = MI->getOperand(2).getTargetFlags();
- if (Flag == PPCII::MO_TPREL_FLAG ||
- Flag == PPCII::MO_GOT_TPREL_PCREL_FLAG ||
- Flag == PPCII::MO_TPREL_PCREL_FLAG) {
- assert(
- Subtarget->hasAIXSmallLocalExecTLS() &&
- "addi with thread-pointer only expected with local-exec small TLS");
- LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
- TmpInst.setOpcode(PPC::LA8);
- EmitToStreamer(*OutStreamer, TmpInst);
- return;
- }
- break;
- }
}
LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
EmitToStreamer(*OutStreamer, TmpInst);
}
+// For non-TOC-based local-exec variables that have a non-zero offset,
+// we need to create a new MCExpr that adds the non-zero offset to the address
+// of the local-exec variable that will be used in either an addi, load or
+// store. However, the final displacement for these instructions must be
+// between [-32768, 32768), so if the TLS address + its non-zero offset is
+// 32KB or greater, a new MCExpr is produced to accommodate this situation.
+const MCExpr *PPCAsmPrinter::getAdjustedLocalExecExpr(const MachineOperand &MO,
+ int64_t Offset) {
+ // Non-zero offsets (for loads, stores or `addi`) require additional handling.
+ // When the offset is zero, there is no need to create an adjusted MCExpr.
+ if (!Offset)
+ return nullptr;
+
+ assert(MO.isGlobal() && "Only expecting a global MachineOperand here!");
+ const GlobalValue *GValue = MO.getGlobal();
+ assert(TM.getTLSModel(GValue) == TLSModel::LocalExec &&
+ "Only local-exec accesses are handled!");
+
+ bool IsGlobalADeclaration = GValue->isDeclarationForLinker();
+ // Find the GlobalVariable that corresponds to the particular TLS variable
+ // in the TLS variable-to-address mapping. All TLS variables should exist
+ // within this map, with the exception of TLS variables marked as extern.
+ const auto TLSVarsMapEntryIter = TLSVarsToAddressMapping.find(GValue);
+ if (TLSVarsMapEntryIter == TLSVarsToAddressMapping.end())
+ assert(IsGlobalADeclaration &&
+ "Only expecting to find extern TLS variables not present in the TLS "
+ "variable-to-address map!");
+
+ unsigned TLSVarAddress =
+ IsGlobalADeclaration ? 0 : TLSVarsMapEntryIter->second;
+ ptrdiff_t FinalAddress = (TLSVarAddress + Offset);
+ // If the address of the TLS variable + the offset is less than 32KB,
+ // or if the TLS variable is extern, we simply produce an MCExpr to add the
+ // non-zero offset to the TLS variable address.
+ // When the TLS variable is extern, this is safe to do because we can
+ // assume that the address of an extern TLS variable is zero.
+ const MCExpr *Expr = MCSymbolRefExpr::create(
+ getSymbol(GValue), MCSymbolRefExpr::VK_PPC_AIX_TLSLE, OutContext);
+ Expr = MCBinaryExpr::createAdd(
+ Expr, MCConstantExpr::create(Offset, OutContext), OutContext);
+ if (FinalAddress >= 32768) {
+ // Handle the written offset for cases where:
+ // TLS variable address + Offset >= 32KB.
+
+ // The assembly that is printed will look like:
+ // TLSVar at le + Offset - Delta
+ // where Delta is a multiple of 64KB: ((FinalAddress + 32768) & ~0xFFFF).
+ ptrdiff_t Delta = ((FinalAddress + 32768) & ~0xFFFF);
+ // Check that the total instruction displacement fits within [-32768,32768).
+ ptrdiff_t InstDisp = TLSVarAddress + Offset - Delta;
+ assert(((InstDisp < 32768) &&
+ (InstDisp >= -32768)) &&
+ "Expecting the instruction displacement for local-exec TLS "
+ "variables to be between [-32768, 32768)!");
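To make the `Delta` arithmetic concrete, here is a minimal standalone sketch of the `FinalAddress >= 32768` case, assuming plain `ptrdiff_t` arithmetic as in the patch. `computeDelta`, `computeInstDisp`, `fitsInSigned16` and `assertDispInRange` are hypothetical helpers for illustration, not LLVM APIs. Rounding `FinalAddress + 32768` down to a 64KB boundary leaves a remainder in [0, 65536), so `FinalAddress - Delta` equals that remainder minus 32768 and always lands in [-32768, 32768). The range check has to require both bounds (joined with `&&`); either bound alone is not enough.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not part of the patch) mirroring the arithmetic in
// PPCAsmPrinter::getAdjustedLocalExecExpr when FinalAddress >= 32768.

// Delta: FinalAddress + 32768 rounded down to a multiple of 64KB.
std::ptrdiff_t computeDelta(std::ptrdiff_t FinalAddress) {
  return (FinalAddress + 32768) & ~static_cast<std::ptrdiff_t>(0xFFFF);
}

// Displacement encoded in the addi/load/store:
// TLSVarAddress + Offset - Delta.
std::ptrdiff_t computeInstDisp(std::ptrdiff_t TLSVarAddress,
                               std::ptrdiff_t Offset) {
  std::ptrdiff_t FinalAddress = TLSVarAddress + Offset;
  return FinalAddress - computeDelta(FinalAddress);
}

// True only when the displacement fits a signed 16-bit immediate.
bool fitsInSigned16(std::ptrdiff_t Disp) {
  return Disp >= -32768 && Disp < 32768;
}

// Both bounds are combined with && before the message string is attached,
// so the assertion can actually fail for out-of-range values.
void assertDispInRange(std::ptrdiff_t Disp) {
  assert(fitsInSigned16(Disp) &&
         "Expecting the instruction displacement for local-exec TLS "
         "variables to be between [-32768, 32768)!");
}
```

For example, with a TLS variable at address 32000 and an offset of 8000, `FinalAddress` is 40000, `computeDelta` returns 65536, and the encoded displacement is -25536, which the assembly would express as `TLSVar@le + Offset - Delta`.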
----------------
amy-kwan wrote:
For larger variables, the assert will not be triggered. This is because in [the initial patch](https://github.com/llvm/llvm-project/commit/3f46e5453d9310b15d974e876f6132e3cf50c4b1#diff-909a72141a3ecd6bfde54a634[…]4c3eb79622c5845f6c7c119145af6f) that introduced this feature, I restricted this non-TOC-based access sequence in `PPCISelLowering.cpp` to TLS variables whose size is at most 32751 bytes.
```
constexpr uint64_t AIXSmallTlsPolicySizeLimit = 32751;
....
// With the -maix-small-local-exec-tls option, produce a faster access
// sequence for local-exec TLS variables where the offset from the TLS
// base is encoded as an immediate operand.
//
// We only utilize the faster local-exec access sequence when the TLS
// variable has a size within the policy limit. We treat types that are
// not sized or are empty as being over the policy size limit.
if (HasAIXSmallLocalExecTLS && IsTLSLocalExecModel) {
Type *GVType = GV->getValueType();
if (GVType->isSized() && !GVType->isEmptyTy() &&
GV->getParent()->getDataLayout().getTypeAllocSize(GVType) <=
AIXSmallTlsPolicySizeLimit)
return DAG.getNode(PPCISD::Lo, dl, PtrVT, VariableOffsetTGA, TLSReg);
}
```
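The size policy quoted above boils down to a simple predicate. Below is a minimal sketch, assuming the IR type queries (`isSized()`, `isEmptyTy()`, and `DataLayout::getTypeAllocSize()`) have already been evaluated into a plain struct; `TLSVarInfo` and `usesSmallLocalExecSequence` are hypothetical names, not LLVM APIs. It shows that the boundary is inclusive: a 32751-byte variable still qualifies, while a 32752-byte one falls back to the TOC-based sequence.

```cpp
#include <cassert>
#include <cstdint>

// Same limit as in PPCISelLowering.cpp.
constexpr std::uint64_t AIXSmallTlsPolicySizeLimit = 32751;

// Hypothetical summary of the IR type queries made by the real code.
struct TLSVarInfo {
  bool IsSized;            // GVType->isSized()
  bool IsEmpty;            // GVType->isEmptyTy()
  std::uint64_t AllocSize; // getTypeAllocSize(GVType); valid only if IsSized
};

// Unsized or empty types are treated as over the policy limit; otherwise
// the faster non-TOC-based sequence is used when AllocSize <= the limit.
bool usesSmallLocalExecSequence(const TLSVarInfo &V) {
  return V.IsSized && !V.IsEmpty && V.AllocSize <= AIXSmallTlsPolicySizeLimit;
}
```

With these definitions, `usesSmallLocalExecSequence({true, false, 8187})` and `usesSmallLocalExecSequence({true, false, 32751})` return true, while a 32752-byte variable or any unsized or empty type returns false.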
So when the size is `8187`, the large-variable test case in `aix-small-local-exec-tls-largeaccess.ll` produces:
```
stw r3, mySmallLocalExecTLSv1[TL]@le(r13)
```
If the size is increased past `AIXSmallTlsPolicySizeLimit`, the compiler instead loads the variable's offset from the TOC and emits the regular local-exec sequence, as expected:
```
ld r4, L..C0(r2) # target-flags(ppc-tprel) @mySmallLocalExecTLSv1
li r3, 1
add r4, r13, r4
stw r3, 0(r4)
. . .
```
Hope the above answers your question.
https://github.com/llvm/llvm-project/pull/71485