[llvm-commits] [PATCH, RFC] Medium code model support for 64-bit PowerPC

Thu Nov 15 08:56:26 PST 2012

Heh, the anti-spam filter causes some problems with examples in my
description.  Please mentally substitute "@" for " at " throughout...

Bill

On Thu, 2012-11-15 at 10:51 -0600, Bill Schmidt wrote:
> Hello,
> 
> This patch implements medium code model support for 64-bit PowerPC.
> 
> The default for 64-bit PowerPC is small code model, in which TOC entries
> must be addressable using a 16-bit offset from the TOC pointer.
> Additionally, only TOC entries are addressed via the TOC pointer.
> 
> With medium code model, TOC entries and data sections can all be
> addressed via the TOC pointer using a 32-bit offset.  Cooperation with
> the linker allows 16-bit offsets to be used when these are sufficient,
> reducing the number of extra instructions that need to be executed.
> Medium code model also does not generate explicit TOC entries in
> ".section .toc" for variables that are wholly internal to the compilation
> unit.
> 
> Consider a load of an external 4-byte integer.  With small code model,
> the compiler generates:
> 
> 	ld 3, .LC1@toc(2)
> 	lwz 4, 0(3)
> 
> 	.section	.toc,"aw",@progbits
> .LC1:
> 	.tc ei[TC],ei
> 
> With medium model, it instead generates:
> 
> 	addis 3, 2, .LC1@toc@ha
> 	ld 3, .LC1@toc@l(3)
> 	lwz 4, 0(3)
> 
> 	.section	.toc,"aw",@progbits
> .LC1:
> 	.tc ei[TC],ei
> 
> Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
> 32-bit offset of ei's TOC entry from the TOC base pointer (adjusted to
> account for sign extension of the lower half).  Similarly, .LC1@toc@l
> is a relocation requesting the lower 16 bits.  Note
> that if the linker determines that ei's TOC entry is within a 16-bit
> offset of the TOC base pointer, it will replace the "addis" with a
> "nop", and replace the "ld" with the identical "ld" instruction from the
> small code model example.
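
For anyone unfamiliar with the @ha/@l split, the arithmetic can be sketched in a few lines of Python.  This assumes the usual PowerPC ELF meaning of the two operators (@l is the low 16 bits, which ld/addi treat as signed; @ha is the high 16 bits adjusted by 0x8000 to compensate) and an offset that fits in the 32-bit range.  The function names are made up for illustration and do not appear in the patch.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def toc_lo(offset):
    """@l: low 16 bits of the offset, as the signed displacement used by ld/addi."""
    return sign_extend16(offset)


def toc_ha(offset):
    """@ha: high 16 bits, adjusted so that (ha << 16) + lo == offset."""
    return sign_extend16((offset + 0x8000) >> 16)


def addis_then_displacement(toc_base, offset):
    """Address formed by 'addis rT, 2, sym@toc@ha' plus a sym@toc@l displacement."""
    return toc_base + (toc_ha(offset) << 16) + toc_lo(offset)


def addis_is_redundant(offset):
    """True when the offset fits in 16 signed bits, so the addis adds zero."""
    return toc_ha(offset) == 0
```

When addis_is_redundant() holds, the high part is zero and the low part alone reaches the symbol, which is the case where the linker can turn the "addis" into a "nop".
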
> 
> Consider next a load of a function-scope static integer.  For small code
> model, the compiler generates:
> 
> 	ld 3, .LC1@toc(2)
> 	lwz 4, 0(3)
> 
> 	.section	.toc,"aw",@progbits
> .LC1:
> 	.tc test_fn_static.si[TC],test_fn_static.si
> 	.type	test_fn_static.si,@object
> 	.local	test_fn_static.si
> 	.comm	test_fn_static.si,4,4
> 
> For medium code model, the compiler generates:
> 
> 	addis 3, 2, test_fn_static.si@toc@ha
> 	addi 3, 3, test_fn_static.si@toc@l
> 	lwz 4, 0(3)
> 
> 	.type	test_fn_static.si,@object
> 	.local	test_fn_static.si
> 	.comm	test_fn_static.si,4,4
> 
> Again, the linker may replace the "addis" with a "nop", calculating only
> a 16-bit offset when this is sufficient.
> 
> Note that it would be more efficient for the compiler to generate:
> 
> 	addis 3, 2, test_fn_static.si@toc@ha
> 	lwz 4, test_fn_static.si@toc@l(3)
> 
> The current patch does not perform this optimization yet.  This will be
> addressed as a peephole optimization in a later patch.
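
To make the idea concrete, here is one way such a peephole might look, written as a toy Python pass over instruction tuples.  This is only a sketch and not the planned implementation: the tuple layout (opcode, dest, base, displacement), the LOADS set, and the dead_after parameter (standing in for real liveness information) are all assumptions of mine.

```python
# Word/halfword/byte loads only; "ld" is left out to avoid its
# extra displacement-alignment requirement in this toy version.
LOADS = {"lwz", "lhz", "lbz"}


def fold_addi_into_load(insns, dead_after):
    """Fold 'addi rX, rA, imm' + 'load rT, 0(rX)' into 'load rT, imm(rA)'.

    insns: list of (opcode, dest, base, displacement) tuples.
    dead_after: registers the caller knows are not read after the load
    (an assumption standing in for real liveness analysis).
    """
    out = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (
            nxt is not None
            and cur[0] == "addi"
            and nxt[0] in LOADS
            and nxt[2] == cur[1]   # load uses the addi result as its base
            and nxt[3] == 0        # with a zero displacement
            and (nxt[1] == cur[1] or cur[1] in dead_after)
        ):
            # Reuse the addi's base register and immediate in the load.
            out.append((nxt[0], nxt[1], cur[2], cur[3]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```

The fold is only safe when nothing else reads the full address left in the addi's destination register, which is why the sketch refuses to fold unless that register is overwritten by the load or listed in dead_after.
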
> 
> For the moment, the default code model for 64-bit PowerPC will remain
> the small code model.  We plan to eventually change the default to
> medium code model, which matches current upstream GCC behavior.  Note
> that the different code models are ABI-compatible, so objects compiled
> with different models can be linked together and will execute correctly.
> 
> I've tested the regression suite and the application/benchmark test
> suite in two ways:  Once with the patch as submitted here, and once with
> additional logic to force medium code model as the default.  The tests
> all compile cleanly, with one exception.  The mandel-2 application test
> fails due to an unrelated ABI incompatibility in passing complex
> numbers.  It just so happens that small code model was incredibly lucky,
> in that temporary values in floating-point registers held the expected
> values needed by the external library routine that was called
> incorrectly.  My current thought is to correct the ABI problems with
> _Complex before making medium code model the default, to avoid
> introducing this "regression."
> 
> Here are a few comments on how the patch works, since the selection code
> can be difficult to follow:
> 
> The existing logic for small code model defines three
> pseudo-instructions:  LDtoc for most uses, LDtocJTI for jump table
> addresses, and LDtocCPT for constant pool addresses.  These are expanded
> by SelectCodeCommon().  The pseudo-instruction approach doesn't work for
> medium code model, because we need to generate two instructions when we
> match the same pattern.  Instead, new logic in PPCDAGToDAGISel::Select()
> intercepts the TOC_ENTRY node for medium code model, and generates an
> ADDIStocHA followed by either a LDtocL or an ADDItocL.  These new node
> types correspond naturally to the sequences described above.
> 
> The addis/ld sequence is generated for the following cases:
>  * Jump table addresses
>  * Function addresses
>  * External global variables
>  * Tentative definitions of global variables (common linkage)
> 
> The addis/addi sequence is generated for the following cases:
>  * Constant pool entries
>  * File-scope static global variables
>  * Function-scope static variables
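
The two lists above amount to a simple classification, which can be sketched as a lookup in Python.  The category strings are hypothetical labels of mine; the real patch inspects the selection DAG operand rather than a string.

```python
# Hypothetical category names; the real patch inspects LLVM nodes instead.
VIA_TOC_ENTRY = {   # address is loaded from a TOC entry: addis + ld
    "jump_table",
    "function",
    "external_global",
    "common_global",
}
TOC_RELATIVE = {    # address is computed directly from the TOC base: addis + addi
    "constant_pool",
    "file_static",
    "function_static",
}


def medium_model_sequence(kind):
    """Return the (first, second) node names selected for a symbol kind."""
    if kind in VIA_TOC_ENTRY:
        return ("ADDIStocHA", "LDtocL")
    if kind in TOC_RELATIVE:
        return ("ADDIStocHA", "ADDItocL")
    raise ValueError(f"unknown symbol kind: {kind!r}")
```
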
> 
> Expanding to the two-instruction sequences at select time exposes the
> instructions to subsequent optimization, particularly scheduling.
> 
> The rest of the processing occurs at assembly time, in
> PPCAsmPrinter::EmitInstruction.  Each of the instructions is converted
> to a "real" PowerPC instruction.  When a TOC entry needs to be created,
> this is done here in the same manner as for the existing LDtoc,
> LDtocJTI, and LDtocCPT pseudo-instructions (I factored out a new routine
> to handle this).
> 
> I had originally thought that if a TOC entry was needed for LDtocL or
> ADDItocL, it would already have been generated for the previous
> ADDIStocHA.  However, at higher optimization levels, the ADDIStocHA may
> appear in a different block, which may be assembled textually following
> the block containing the LDtocL or ADDItocL.  So it is necessary to
> include the possibility of creating a new TOC entry for those two
> instructions.
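
The order-independence described above boils down to a "look up or create" table.  A minimal Python sketch, assuming a hypothetical TocTable class and .LC<n> label numbering of my own choosing (the actual routine in the patch may differ):

```python
class TocTable:
    """Hands out one .LC<n> label per symbol, creating it on first request."""

    def __init__(self):
        self._labels = {}

    def lookup_or_create(self, symbol):
        # Whichever instruction is printed first creates the entry;
        # later requests for the same symbol reuse the same label.
        if symbol not in self._labels:
            self._labels[symbol] = f".LC{len(self._labels)}"
        return self._labels[symbol]

    def emit_section(self):
        """Render the .toc section for all entries created so far."""
        lines = ['\t.section\t.toc,"aw",@progbits']
        for symbol, label in self._labels.items():
            lines += [f"{label}:", f"\t.tc {symbol}[TC],{symbol}"]
        return "\n".join(lines)
```

With this shape it does not matter whether the ADDIStocHA or the LDtocL/ADDItocL for a symbol is printed first: both get the same label, and the entry is emitted once.
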
> 
> Note that for LDtocL, we generate a new form of LD called LDrs.  This
> allows specifying the @toc@l relocation for the offset field of the LD
> instruction.  When the peephole optimization described above is added,
> we will need to do similar things for all immediate-form load and store
> operations.
> 
> This was a long-winded description, but I hope it helps to make the
> patch more understandable.  I would very much appreciate comments and
> suggestions before I commit this!
> 
> Thanks,
> Bill
>