[llvm-commits] [PATCH, RFC] Support for global-dynamic TLS mode in 64-bit PowerPC target -- preliminary

Bill Schmidt wschmidt at linux.vnet.ibm.com
Tue Dec 11 08:00:47 PST 2012



On Mon, 2012-12-10 at 19:13 -0600, Bill Schmidt wrote:
> This WIP patch implements the global-dynamic TLS model for 64-bit PowerPC.
> However, it is not ideal, and I need some help figuring out how I can improve
> it.
> 
> Given a thread-local symbol x with global-dynamic access, the code sequence
> to be generated to obtain x's address is:
> 
>   Instruction                     Relocation               Symbol
>   addis ra,r2,x@got@tlsgd@ha      R_PPC64_GOT_TLSGD16_HA   x
>   addi  r3,ra,x@got@tlsgd@l       R_PPC64_GOT_TLSGD16_L    x
>   bl    __tls_get_addr            R_PPC64_TLSGD            x
>                                   R_PPC64_REL24            __tls_get_addr
>   nop
>   <use address in r3>
> 
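As an aside on the @ha/@l pair above: addi sign-extends its 16-bit immediate, so the high half has to be pre-adjusted by 0x8000 for the two instructions to add up to the intended offset. Below is a minimal sketch of that arithmetic in plain C++. The helper names (ha16, lo16, sext16, gotEntryAddress) are made up for illustration and are not LLVM or linker APIs, and the numeric offsets are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// @ha: high 16 bits, adjusted by +0x8000 so that adding the sign-extended
// low half afterwards lands on the intended value.
constexpr std::uint16_t ha16(std::int32_t v) {
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(v) + 0x8000u) >> 16);
}

// @l: low 16 bits.
constexpr std::uint16_t lo16(std::int32_t v) {
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(v) & 0xffffu);
}

// Sign-extend a 16-bit immediate the way addi and addis do.
constexpr std::int64_t sext16(std::uint16_t imm) {
    return imm >= 0x8000u ? static_cast<std::int64_t>(imm) - 0x10000
                          : static_cast<std::int64_t>(imm);
}

// addis adds the immediate shifted left by 16; addi adds it unshifted.
constexpr std::int64_t addis(std::int64_t base, std::uint16_t imm) {
    return base + sext16(imm) * 0x10000;
}
constexpr std::int64_t addi(std::int64_t base, std::uint16_t imm) {
    return base + sext16(imm);
}

// Hypothetical helper: models "addis ra,r2,off@ha ; addi r3,ra,off@l".
constexpr std::int64_t gotEntryAddress(std::int64_t r2, std::int32_t offset) {
    return addi(addis(r2, ha16(offset)), lo16(offset));
}

// Arbitrary example values, chosen so the low half is negative in the first case.
static_assert(gotEntryAddress(0x10000000, 0x18000) == 0x10018000, "low half sign-extends");
static_assert(gotEntryAddress(0x10000000, -4) == 0x0ffffffc, "negative offset");
```

The round trip only holds for offsets that fit the signed 32-bit range reachable by the pair; values very close to INT32_MAX fall outside it.
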
> The way I've approached this is to have LowerGlobalTLSAddress convert
> the TargetGlobalAddress node into a DAG of three nodes:
> 
>   GET_TLS_ADDR(
>     ADDI_TLSGD_L(
>       ADDIS_TLSGD_HA(X2, x),
>       x),
>     x)
> 
> The problem is that straightforward assembly of this DAG structure gives
> the following inferior assembly code:
> 
>   Instruction                     Relocation               Symbol
>   addis ra,r2,x@got@tlsgd@ha      R_PPC64_GOT_TLSGD16_HA   x
>   addi  rb,ra,x@got@tlsgd@l       R_PPC64_GOT_TLSGD16_L    x
>   addi  r3,rb,0
>   bl    __tls_get_addr            R_PPC64_TLSGD            x
>                                   R_PPC64_REL24            __tls_get_addr
>   nop
>   addi  rc,r3,0
>   <use address in rc>
> 
> This is because the call to __tls_get_addr requires its argument and its
> return value to use register X3, so copies are generated to move between
> the logical registers and the physical register X3.
> 
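To make the goal concrete, here is a toy model of what coalescing would buy in this case. It is only a sketch: the Inst struct and applyAssignment function are hypothetical and have nothing to do with LLVM's actual register allocator. "copy" stands for "addi dst,src,0". If rb and rc are both assigned to r3, the two copies become identity moves and can be dropped, leaving the desired four-instruction sequence.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy instruction: one destination and one source register (hypothetical type).
struct Inst {
    std::string op;   // mnemonic; "copy" models "addi dst,src,0"
    std::string dst;
    std::string src;
};

// Rename registers according to the given assignment, then drop any copy
// that has become an identity move (dst == src).
static std::vector<Inst> applyAssignment(const std::vector<Inst>& in,
                                         const std::map<std::string, std::string>& assign) {
    auto rename = [&](const std::string& r) {
        auto it = assign.find(r);
        return it == assign.end() ? r : it->second;
    };
    std::vector<Inst> out;
    for (const Inst& i : in) {
        Inst r{i.op, rename(i.dst), rename(i.src)};
        if (r.op == "copy" && r.dst == r.src) continue;  // coalesced away
        out.push_back(r);
    }
    return out;
}

// The inferior sequence from above, with relocations omitted.
static std::vector<Inst> inferiorSequence() {
    return {
        {"addis", "ra", "r2"},
        {"addi",  "rb", "ra"},
        {"copy",  "r3", "rb"},
        {"bl",    "r3", "r3"},   // the call reads and writes r3
        {"copy",  "rc", "r3"},
        {"use",   "",   "rc"},
    };
}
```

Calling applyAssignment(inferiorSequence(), {{"rb", "r3"}, {"rc", "r3"}}) leaves addis, addi, bl and the use, with the addi writing r3 directly.
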
> There are two approaches that I thought of to rectify this, but so far I
> don't see how to make either of them work.  Both would be done in
> LowerGlobalTLSAddress instead of generating the GET_TLS_ADDR node.
> Therefore the copies would be generated early enough that register
> assignment could coalesce them away.
> 
>  (1) Use LowerCallTo() to create a call sequence with one argument:
>      the result of ADDI_TLSGD_L(...), analogously to what's done for
>      LowerINIT_TRAMPOLINE.  This seems like the obvious thing to do,
>      until you realize that you need a token chain SDNode to generate
>      a call, and LowerGlobalTLSAddress doesn't provide one.

I think what I can do here is use the function entry node as the chain.
Since the call has no side effects and is tied into place by its
argument and return value, this seems sufficient at first glance.  I
cobbled something up and it looks like it will do what I want on a
simple test; but of course simple tests usually miss corner cases.
Anyone see a problem with this approach?
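
For anyone who wants to picture the shape I have in mind, here is a toy model in plain C++. The Node struct and schedule function are hypothetical stand-ins, not LLVM's SDNode or SelectionDAG, and glue operands are left out entirely. The copy into X3 hangs off the entry token as its chain, and everything else is ordered purely by operand edges.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy DAG node: operands are indices of nodes it depends on, whether the
// edge is a data value or a chain (hypothetical type, not LLVM's SDNode).
struct Node {
    std::string name;
    std::vector<std::size_t> operands;
};

// Depth-first post-order walk: a node is emitted only after all its operands.
static void visit(const std::vector<Node>& dag, std::size_t n,
                  std::vector<bool>& done, std::vector<std::string>& out) {
    if (done[n]) return;
    done[n] = true;
    for (std::size_t op : dag[n].operands) visit(dag, op, done, out);
    out.push_back(dag[n].name);
}

// Produce one legal ordering of the nodes reachable from root.
static std::vector<std::string> schedule(const std::vector<Node>& dag, std::size_t root) {
    std::vector<bool> done(dag.size(), false);
    std::vector<std::string> out;
    visit(dag, root, done, out);
    return out;
}

// The TLS sequence with the entry token used as the chain input.
static std::vector<Node> buildTlsDag() {
    return {
        {"EntryToken",     {}},      // 0: chain available at function entry
        {"ADDIS_TLSGD_HA", {}},      // 1
        {"ADDI_TLSGD_L",   {1}},     // 2
        {"CopyToReg X3",   {0, 2}},  // 3: chain = entry token, value = node 2
        {"GET_TLS_ADDR",   {3}},     // 4: ordered after the copy into X3
        {"CopyFromReg X3", {4}},     // 5: ordered after the call
    };
}
```

Scheduling from node 5 yields the entry token first and then the five nodes in program order. In this model nothing but the operand edges pins the call in place, which is the property I am relying on.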

Thanks,
Bill

> 
>  (2) Use getCopyToReg and getCopyFromReg to generate the copies
>      directly around GET_TLS_ADDR, which then just expands into the
>      "bl" and the "nop".  Unfortunately, these routines also require
>      a token chain node.
> 
> So, part of my problem is knowing the rules about token chains.  I'm 
> pretty new to LLVM and I'm not sure exactly what purposes may be served
> by them, other than tying nodes together that otherwise would not be.
> Do I really need a token chain node here, or is it OK to have a NULL chain
> node and use one of the above solutions?  (Seems unlikely, but it would
> be convenient if so...)
> 
> Assuming I'm not so fortunate, what would be the best way to approach
> this problem?
> 
> 
> There's another aspect of this patch that doesn't make me happy:  the
> hackery I added in PPCMCCodeEmitter.cpp:getDirectBrEncoding().  Each
> of these get...Encoding() routines is supposed to be called on behalf
> of one operand at a time.  When using integrated assembly, I couldn't
> get the second operand of the BL8_NOP_ELF_TLSGD to produce a relocation
> the "correct" way, so for now I put in a bloody hack to handle both
> operands at once.  Perhaps somebody can see what I'm doing wrong.
> 
> The routine I want to call is PPCMCCodeEmitter.cpp:getTLSGDEncoding().
> In PPCInstr64Bit.td, I defined the "tlsgd" operand class to use this
> method, and specified it as the second input operand for BL8_NOP_ELF_TLSGD.
> However, the encoding code generated by TableGen treated this exactly as
> BL8_NOP_ELF, which has no second operand, and thus my encoder was never
> called.  I didn't see anything particular about IForm_and_DForm_4_zero
> that would explain this.  I'm currently at a loss to explain it.
> 
> Thanks for any help with my issues!
> 
> Bill
> 
