[llvm-commits] [PATCH, RFC] Medium code model support for 64-bit PowerPC
Bill Schmidt
wschmidt at linux.vnet.ibm.com
Thu Nov 15 08:56:26 PST 2012
Heh, the anti-spam filter causes some problems with examples in my
description. Please mentally substitute "@" for " at " throughout...
Bill
On Thu, 2012-11-15 at 10:51 -0600, Bill Schmidt wrote:
> Hello,
>
> This patch implements medium code model support for 64-bit PowerPC.
>
> The default for 64-bit PowerPC is small code model, in which TOC entries
> must be addressable using a 16-bit offset from the TOC pointer.
> Additionally, only TOC entries are addressed via the TOC pointer.
>
> With medium code model, TOC entries and data sections can all be
> addressed via the TOC pointer using a 32-bit offset. Cooperation with
> the linker allows 16-bit offsets to be used when these are sufficient,
> reducing the number of extra instructions that need to be executed.
> Medium code model also does not generate explicit TOC entries in
> ".section .toc" for variables that are wholly internal to the compilation
> unit.
>
> Consider a load of an external 4-byte integer. With small code model,
> the compiler generates:
>
> ld 3, .LC1@toc(2)
> lwz 4, 0(3)
>
> .section .toc,"aw",@progbits
> .LC1:
> .tc ei[TC],ei
>
> With medium model, it instead generates:
>
> addis 3, 2, .LC1@toc@ha
> ld 3, .LC1@toc@l(3)
> lwz 4, 0(3)
>
> .section .toc,"aw",@progbits
> .LC1:
> .tc ei[TC],ei
>
> Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
> 32-bit offset of ei's TOC entry from the TOC base pointer.
> Similarly, .LC1@toc@l is a relocation requesting the lower 16 bits. Note
> that if the linker determines that ei's TOC entry is within a 16-bit
> offset of the TOC base pointer, it will replace the "addis" with a
> "nop", and replace the "ld" with the identical "ld" instruction from the
> small code model example.
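
As a worked illustration of the @ha/@l split described above, here is a small Python sketch. It is not taken from the patch, and the helper names are hypothetical. It assumes the usual PowerPC ELF definitions: @l is the low 16 bits of the offset, and @ha is the "high adjusted" upper 16 bits, biased by 0x8000 so that the sign extension applied to the low half by ld/addi is cancelled.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def toc_lo(offset):
    """@l: low 16 bits of the offset (sign-extended by ld/addi when used)."""
    return offset & 0xFFFF


def toc_ha(offset):
    """@ha: upper 16 bits, adjusted by 0x8000 to cancel @l's sign extension."""
    return ((offset + 0x8000) >> 16) & 0xFFFF


def rebuild(offset):
    """Value produced by addis (@ha shifted left 16) plus the @l displacement."""
    return (sign_extend16(toc_ha(offset)) << 16) + sign_extend16(toc_lo(offset))


def fits_in_16_bits(offset):
    """True when a signed 16-bit displacement alone can reach the offset."""
    return -0x8000 <= offset <= 0x7FFF
```

Whenever fits_in_16_bits(offset) holds, toc_ha(offset) is 0, so the addis contributes nothing. That is the situation in which the linker can substitute a nop as described above.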
>
> Consider next a load of a function-scope static integer. For small code
> model, the compiler generates:
>
> ld 3, .LC1@toc(2)
> lwz 4, 0(3)
>
> .section .toc,"aw",@progbits
> .LC1:
> .tc test_fn_static.si[TC],test_fn_static.si
> .type test_fn_static.si,@object
> .local test_fn_static.si
> .comm test_fn_static.si,4,4
>
> For medium code model, the compiler generates:
>
> addis 3, 2, test_fn_static.si@toc@ha
> addi 3, 3, test_fn_static.si@toc@l
> lwz 4, 0(3)
>
> .type test_fn_static.si,@object
> .local test_fn_static.si
> .comm test_fn_static.si,4,4
>
> Again, the linker may replace the "addis" with a "nop", calculating only
> a 16-bit offset when this is sufficient.
>
> Note that it would be more efficient for the compiler to generate:
>
> addis 3, 2, test_fn_static.si@toc@ha
> lwz 4, test_fn_static.si@toc@l(3)
>
> The current patch does not perform this optimization yet. This will be
> addressed as a peephole optimization in a later patch.
>
> For the moment, the default code model for 64-bit PowerPC will remain
> the small code model. We plan to eventually change the default to
> medium code model, which matches current upstream GCC behavior. Note
> that the different code models are ABI-compatible, so code compiled with
> different models will link and execute correctly.
>
> I've tested the regression suite and the application/benchmark test
> suite in two ways: Once with the patch as submitted here, and once with
> additional logic to force medium code model as the default. The tests
> all compile cleanly, with one exception. The mandel-2 application test
> fails due to an unrelated ABI incompatibility in passing complex
> numbers. It just so happens that small code model was incredibly lucky,
> in that temporary values in floating-point registers held the expected
> values needed by the external library routine that was called
> incorrectly. My current thought is to correct the ABI problems with
> _Complex before making medium code model the default, to avoid
> introducing this "regression."
>
> Here are a few comments on how the patch works, since the selection code
> can be difficult to follow:
>
> The existing logic for small code model defines three
> pseudo-instructions: LDtoc for most uses, LDtocJTI for jump table
> addresses, and LDtocCPT for constant pool addresses. These are expanded
> by SelectCodeCommon(). The pseudo-instruction approach doesn't work for
> medium code model, because we need to generate two instructions when we
> match the same pattern. Instead, new logic in PPCDAGToDAGISel::Select()
> intercepts the TOC_ENTRY node for medium code model, and generates an
> ADDIStocHA followed by either an LDtocL or an ADDItocL. These new node
> types correspond naturally to the sequences described above.
>
> The addis/ld sequence is generated for the following cases:
> * Jump table addresses
> * Function addresses
> * External global variables
> * Tentative definitions of global variables (common linkage)
>
> The addis/addi sequence is generated for the following cases:
> * Constant pool entries
> * File-scope static global variables
> * Function-scope static variables
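
The two lists above amount to a lookup from the kind of symbol to the pair of nodes selected. A minimal Python sketch of that mapping follows. The symbol-kind labels and the function name are hypothetical and only mirror the lists; the real decision is made in C++ inside PPCDAGToDAGISel::Select() and is not reproduced here.

```python
# Hypothetical labels for the cases listed above; not identifiers from the patch.
VIA_TOC_ENTRY = {"jump_table", "function", "external_global", "common_global"}
DIRECT_ADDRESS = {"constant_pool", "file_static", "function_static"}


def medium_model_sequence(kind):
    """Return the pair of node types chosen for a TOC_ENTRY of this kind."""
    if kind in VIA_TOC_ENTRY:
        # addis/ld: the address is loaded from the symbol's TOC entry.
        return ("ADDIStocHA", "LDtocL")
    if kind in DIRECT_ADDRESS:
        # addis/addi: the address is computed directly from the TOC pointer.
        return ("ADDIStocHA", "ADDItocL")
    raise ValueError(f"unknown symbol kind: {kind}")
```

For example, medium_model_sequence("function_static") yields the addis/addi pair shown earlier for test_fn_static.si.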
>
> Expanding to the two-instruction sequences at select time exposes the
> instructions to subsequent optimization, particularly scheduling.
>
> The rest of the processing occurs at assembly time, in
> PPCAsmPrinter::EmitInstruction. Each of the instructions is converted
> to a "real" PowerPC instruction. When a TOC entry needs to be created,
> this is done here in the same manner as for the existing LDtoc,
> LDtocJTI, and LDtocCPT pseudo-instructions (I factored out a new routine
> to handle this).
>
> I had originally thought that if a TOC entry was needed for LDtocL or
> ADDItocL, it would already have been generated for the previous
> ADDIStocHA. However, at higher optimization levels, the ADDIStocHA may
> appear in a different block, which may be assembled textually following
> the block containing the LDtocL or ADDItocL. So it is necessary to
> include the possibility of creating a new TOC entry for those two
> instructions.
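
The ordering issue above is why each of the three instructions has to be able to create the TOC entry itself. A toy Python model of a lookup-or-create table shows the idea. The class, its method, and the label numbering are hypothetical and are not the routine factored out in the patch.

```python
class TocTable:
    """Toy symbol -> TOC label map; whichever instruction asks first creates the entry."""

    def __init__(self):
        self._labels = {}

    def lookup_or_create(self, symbol):
        """Return the label for symbol, allocating a new one on first use."""
        if symbol not in self._labels:
            self._labels[symbol] = f".LC{len(self._labels)}"
        return self._labels[symbol]


toc = TocTable()
# The block holding the LDtocL happens to be printed first...
label_from_ld = toc.lookup_or_create("ei")
# ...and the block holding the matching ADDIStocHA is printed later.
label_from_addis = toc.lookup_or_create("ei")
```

Both requests resolve to the same label regardless of which block is emitted first, so neither instruction needs to assume the other has already been printed.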
>
> Note that for LDtocL, we generate a new form of LD called LDrs. This
> allows specifying the @toc@l relocation for the offset field of the LD
> instruction. When the peephole optimization described above is added,
> we will need to do similar things for all immediate-form load and store
> operations.
>
> This was a long-winded description, but I hope it helps to make the
> patch more understandable. I would very much appreciate comments and
> suggestions before I commit this!
>
> Thanks,
> Bill
>