[llvm-commits] [PATCH, RFC] Support for global-dynamic TLS mode in 64-bit PowerPC target -- preliminary

Bill Schmidt wschmidt at linux.vnet.ibm.com
Tue Dec 11 12:37:27 PST 2012


OK, I sorted through my issues and have committed the revised version as
r169910.

Bill

On Tue, 2012-12-11 at 10:00 -0600, Bill Schmidt wrote:
> 
> On Mon, 2012-12-10 at 19:13 -0600, Bill Schmidt wrote:
> > This WIP patch implements the global-dynamic TLS model for 64-bit PowerPC.
> > However, it is not ideal, and I need some help figuring out how I can improve
> > it.
> > 
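For context, here is a minimal sketch of source code that requests the access model discussed in this thread. It assumes a GCC- or Clang-compatible compiler, since `__thread` and the `tls_model` attribute are GNU extensions; the function name `addressOfX` is made up for illustration and is not part of the patch. Taking the address of `x` is the operation that the instruction sequence in the message computes, although a linker may still relax the access to a cheaper model when it can prove that is safe.

```cpp
// Illustrative only: a thread-local variable that asks the compiler for the
// global-dynamic TLS model. `__thread` and the tls_model attribute are GNU
// extensions accepted by GCC and Clang.
__thread int x __attribute__((tls_model("global-dynamic")));

// Hypothetical helper: returns the address of the calling thread's copy of x.
// Under the global-dynamic model this address is obtained at run time
// through a call to __tls_get_addr.
int *addressOfX() { return &x; }
```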
> > Given a thread-local symbol x with global-dynamic access, the code sequence
> > to be generated to obtain x's address is:
> > 
> >   Instruction                      Relocation               Symbol
> >   addis ra,r2,x@got@tlsgd@ha       R_PPC64_GOT_TLSGD16_HA   x
> >   addi  r3,ra,x@got@tlsgd@l        R_PPC64_GOT_TLSGD16_L    x
> >   bl    __tls_get_addr             R_PPC64_TLSGD            x
> >                                    R_PPC64_REL24            __tls_get_addr
> >   nop
> >   <use address in r3>
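The addis/addi pair splits one 32-bit offset into a "high-adjusted" half (the ha suffix) and a low half (the l suffix). The sketch below shows the usual arithmetic behind that split; the function names are invented for illustration, and the offset is treated as a plain integer rather than a real GOT offset.

```cpp
#include <cstdint>

// High-adjusted upper 16 bits: adding 0x8000 first compensates for the
// sign extension that addi later applies to the low half.
constexpr std::uint16_t ha16(std::int32_t offset) {
    return static_cast<std::uint16_t>(
        (static_cast<std::uint32_t>(offset) + 0x8000u) >> 16);
}

// Low 16 bits of the offset.
constexpr std::uint16_t lo16(std::int32_t offset) {
    return static_cast<std::uint16_t>(static_cast<std::uint32_t>(offset) & 0xffffu);
}

// Sign-extend a 16-bit immediate, as both addis and addi do.
constexpr std::int64_t sext16(std::uint16_t imm) {
    return imm >= 0x8000u ? static_cast<std::int64_t>(imm) - 0x10000
                          : static_cast<std::int64_t>(imm);
}

// Models "addis ra,r2,ha ; addi r3,ra,lo". addis adds its immediate shifted
// left by 16 bits. The result equals r2 + offset for offsets in the range
// -0x80000000 .. 0x7FFF7FFF.
constexpr std::int64_t addisAddi(std::int64_t r2, std::int32_t offset) {
    return r2 + sext16(ha16(offset)) * 65536 + sext16(lo16(offset));
}
```

For example, an offset of 0x18000 splits into ha = 2 and l = 0x8000; the low half sign-extends to -32768, and 2 * 65536 - 32768 gives back 0x18000.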
> > 
> > The way I've approached this is to have LowerGlobalTLSAddress convert
> > the TargetGlobalAddress node into a DAG of three nodes:
> > 
> >   GET_TLS_ADDR(
> >     ADDI_TLSGD_L(
> >       ADDIS_TLSGD_HA(X2, x),
> >       x),
> >     x)
> > 
> > The problem is that straightforward assembly of this DAG structure gives
> > the following inferior assembly code:
> > 
> >   Instruction                      Relocation               Symbol
> >   addis ra,r2,x@got@tlsgd@ha       R_PPC64_GOT_TLSGD16_HA   x
> >   addi  rb,ra,x@got@tlsgd@l        R_PPC64_GOT_TLSGD16_L    x
> >   addi  r3,rb,0
> >   bl    __tls_get_addr             R_PPC64_TLSGD            x
> >                                    R_PPC64_REL24            __tls_get_addr
> >   nop
> >   addi  rc,r3,0
> >   <use address in rc>
> > 
> > This is because the call to __tls_get_addr requires its argument and its
> > return value to use register X3, so copies are generated to move between
> > the logical registers and the physical register X3.
> > 
> > There are two approaches that I thought of to rectify this, but so far I
> > don't see how to make either of them work.  Both would be done in
> > LowerGlobalTLSAddress instead of generating the GET_TLS_ADDR node.
> > Therefore the copies would be generated early enough that register
> > assignment could coalesce them away.
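To illustrate what "coalescing the copies away" means here, the following toy model may help. It is not LLVM code: the `Inst` structure, the `%` prefix for virtual registers, and both function names are assumptions made for this sketch, and it ignores the interference checks a real register coalescer has to perform.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction of the form "dst = op src". Not an LLVM data structure.
struct Inst {
    std::string op;   // mnemonic, or "copy" for a register-to-register move
    std::string dst;  // register written
    std::string src;  // register read
};

// In this toy, virtual registers start with '%'; anything else is physical.
static bool isVirtual(const std::string &reg) {
    return !reg.empty() && reg[0] == '%';
}

// Naive coalescing: for each copy between one virtual and one physical
// register, rename the virtual register to the physical one everywhere,
// then delete copies that have become "rX = copy rX". A real coalescer
// would first prove that the two registers do not interfere.
std::vector<Inst> coalesceCopies(std::vector<Inst> code) {
    for (std::size_t k = 0; k < code.size(); ++k) {
        if (code[k].op != "copy") continue;
        const std::string dst = code[k].dst;
        const std::string src = code[k].src;
        if (isVirtual(dst) == isVirtual(src)) continue;  // need one of each
        const std::string virt = isVirtual(dst) ? dst : src;
        const std::string phys = isVirtual(dst) ? src : dst;
        for (Inst &inst : code) {
            if (inst.dst == virt) inst.dst = phys;
            if (inst.src == virt) inst.src = phys;
        }
    }
    code.erase(std::remove_if(code.begin(), code.end(),
                              [](const Inst &i) {
                                  return i.op == "copy" && i.dst == i.src;
                              }),
               code.end());
    return code;
}

// The five-instruction sequence from the message, with the nop and the
// relocations left out. The call reads and writes r3.
std::vector<Inst> tlsSequenceWithCopies() {
    return {
        {"addis", "%ra", "r2"},
        {"addi", "%rb", "%ra"},
        {"copy", "r3", "%rb"},
        {"bl", "r3", "r3"},
        {"copy", "%rc", "r3"},
    };
}
```

Running `coalesceCopies(tlsSequenceWithCopies())` leaves addis, addi, and bl, with the addi now writing r3 directly, which matches the shape of the desired sequence.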
> > 
> >  (1) Use LowerCallTo() to create a call sequence with one argument:
> >      the result of ADDI_TLSGD_L(...), analogously to what's done for
> >      LowerINIT_TRAMPOLINE.  This seems like the obvious thing to do,
> >      until you realize that you need a token chain SDNode to generate
> >      a call, and LowerGlobalTLSAddress doesn't provide one.
> 
> I think what I can do here is use the function entry node as the chain.
> Since the call has no side effects and is tied into place by its
> argument and return value, this seems sufficient at first glance.  I
> cobbled something up and it looks like it will do what I want on a
> simple test; but of course simple tests usually miss corner cases.
> Anyone see a problem with this approach?
> 
> Thanks,
> Bill
> 
> > 
> >  (2) Use getCopyToReg and getCopyFromReg to generate the copies
> >      directly around GET_TLS_ADDR, which then just expands into the
> >      "bl" and the "nop".  Unfortunately, these routines also require
> >      a token chain node.
> > 
> > So, part of my problem is knowing the rules about token chains.  I'm 
> > pretty new to LLVM and I'm not sure exactly what purposes may be served
> > by them, other than tying nodes together that otherwise would not be.
> > Do I really need a token chain node here, or is it OK to have a NULL chain
> > node and use one of the above solutions?  (Seems unlikely, but it would
> > be convenient if so...)
> > 
> > Assuming I'm not so fortunate, what would be the best way to approach
> > this problem?
> > 
> > 
> > There's another aspect of this patch that doesn't make me happy:  the
> > hackery I added in PPCMCCodeEmitter.cpp:getDirectBrEncoding().  Each
> > of these get...Encoding() routines is supposed to be called on behalf
> > of one operand at a time.  When using the integrated assembler, I couldn't
> > get the second operand of the BL8_NOP_ELF_TLSGD to produce a relocation
> > the "correct" way, so for now I put in a bloody hack to handle both
> > operands at once.  Perhaps somebody can see what I'm doing wrong.
> > 
> > The routine I want to call is PPCMCCodeEmitter.cpp:getTLSGDEncoding().
> > In PPCInstr64Bit.td, I defined the "tlsgd" operand class to use this
> > method, and specified it as the second input operand for BL8_NOP_ELF_TLSGD.
> > However, the encoding code generated by TableGen treated this exactly as
> > BL8_NOP_ELF, which has no second operand, and thus my encoder was never
> > called.  I didn't see anything particular about IForm_and_DForm_4_zero
> > that would explain this.  I'm currently at a loss to explain it.
> > 
> > Thanks for any help with my issues!
> > 
> > Bill
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 



