[PATCH] D155600: [AIX][TLS] Produce a faster local-exec access sequence with -maix-small-local-exec-tls (And optimize when load/store offsets are 0)

Wed Aug 2 10:01:33 PDT 2023

hubert.reinterpretcast added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:3366
+      if (HasAIXSmallLocalExecTLS &&
+          (GVTypeSize < (AIXTLSUpperDisplacement - 8)))
+        return DAG.getNode(PPCISD::Lo, dl, PtrVT, VariableOffsetTGA, TLSReg);
----------------
DiggerLin wrote:
> nemanjai wrote:
> > hubert.reinterpretcast wrote:
> > > What does `AIXTLSUpperDisplacement` represent? It is already given a value of 32K - 8. Should the hard coded subtraction be here, should the value of `AIXTLSUpperDisplacement` be further adjusted instead, or is there unintentional double adjustment happening?
> > > 
> > > For reference, I encountered no issues linking the result of assembly that performs `la` (using the subject access pattern) for the past-the-end address of a 32K - 1 local-exec TLS variable.
> > I agree. We shouldn't be doing any arithmetic here on `AIXTLSUpperDisplacement`. We should set it to the value we want and only perform comparisons.
> According to https://www.ibm.com/docs/en/aix/7.2?topic=program-using-thread-local-storage
> 
> `The local-dynamic and local-exec access methods have a faster code sequence that can be used if the total size of thread-local variables is smaller than 62 KB (I think there is a typo here). If the total size of the region is too large, the link-editor will patch the code by generating extra instructions, negating the benefit of using the faster code sequence.`
> 
> We need to calculate the sum of `the total size of thread-local variables` somewhere and check whether the value is less than 32K.
> Is `AIXTLSUpperDisplacement` meant to express the sum of `the total size of thread-local variables`? If so, the calculation is not correct.
There is no intent to calculate the "total size of thread-local variables" here (there is no real point in doing so at a translation-unit level anyway). The documentation you quote is fairly loose about its use of "can". It is also incorrect in stating that the benefits of the faster code sequence are necessarily negated: an additional benefit of the faster code sequence is less use of the TOC.

It is entirely possible to have a variable larger than 32K allocated into the area accessible with the small TLS access sequence; however, not all bytes of such a variable can be accessed directly. This does not mean that we can't generate small TLS access sequences for such a variable, but the IBM XL compiler went with the limit, and following suit is the less adventurous choice for us.
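
To make the two points above concrete, here is a minimal standalone sketch; it is not the code in the patch. It assumes a single named size limit (`AIXSmallTlsSizeLimit`, a hypothetical name, set to 32K - 8 purely for illustration) that is only ever compared against, and a simplified model in which a variable's bytes are reached with one signed 16-bit displacement from the base register. The helper names and the offset model are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical constant for illustration only: the variable-size limit below
// which the small local-exec sequence is used. The patch's value may differ.
constexpr uint64_t AIXSmallTlsSizeLimit = 32 * 1024 - 8;

// Assumed model: the access uses a signed 16-bit displacement field.
constexpr int64_t MaxDisplacement = INT16_MAX; // 32767

// Decide per variable, by direct comparison against the limit.
// No further arithmetic is applied to the limit at the use site.
constexpr bool useSmallLocalExecSequence(bool HasAIXSmallLocalExecTLS,
                                         uint64_t GVTypeSize) {
  return HasAIXSmallLocalExecTLS && GVTypeSize < AIXSmallTlsSizeLimit;
}

// Under the assumed model, report whether the byte at ByteIndex of a variable
// placed at VarOffset from the base can be reached with one displacement.
constexpr bool byteDirectlyAddressable(int64_t VarOffset, uint64_t ByteIndex) {
  return VarOffset + static_cast<int64_t>(ByteIndex) <= MaxDisplacement;
}

// A variable larger than 32K at offset 0: its start is reachable,
// but its byte at index 32768 is not.
static_assert(byteDirectlyAddressable(0, 0), "start is reachable");
static_assert(!byteDirectlyAddressable(0, 32768), "tail is not reachable");
```

With a single constant and a plain comparison, the question of whether an adjustment is applied once or twice does not arise at the call site.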

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155600/new/

https://reviews.llvm.org/D155600