[llvm-commits] [PATCH, RFC] Medium code model support for 64-bit PowerPC

Thu Nov 15 08:51:49 PST 2012

Hello,

This patch implements medium code model support for 64-bit PowerPC.

The default for 64-bit PowerPC is small code model, in which TOC entries
must be addressable using a 16-bit offset from the TOC pointer.
Additionally, only TOC entries are addressed via the TOC pointer.

With medium code model, TOC entries and data sections can all be
addressed via the TOC pointer using a 32-bit offset.  Cooperation with
the linker allows 16-bit offsets to be used when these are sufficient,
reducing the number of extra instructions that need to be executed.
Medium code model also does not generate explicit TOC entries in
".section toc" for variables that are wholly internal to the compilation
unit.

Consider a load of an external 4-byte integer.  With small code model,
the compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

With medium model, it instead generates:

	addis 3, 2, .LC1@toc@ha
	ld 3, .LC1@toc@l(3)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc ei[TC],ei

Here .LC1@toc@ha is a relocation requesting the upper 16 bits of the
32-bit offset of ei's TOC entry from the TOC base pointer, adjusted to
compensate for sign extension of the lower half.
Similarly, .LC1@toc@l is a relocation requesting the lower 16 bits. Note
that if the linker determines that ei's TOC entry is within a 16-bit
offset of the TOC base pointer, it will replace the "addis" with a
"nop", and replace the "ld" with the identical "ld" instruction from the
small code model example.
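
As a rough illustration (not part of the patch), the arithmetic behind
the two relocations can be sketched in Python.  The function names are
made up for this example; the only facts relied on are that "addis" adds
its immediate shifted left by 16 bits and that the displacement of the
following instruction is sign-extended, which is why the high part has
to be adjusted by 0x8000.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def toc_ha(offset):
    """High-adjusted upper 16 bits of a 32-bit TOC offset (@toc@ha)."""
    return sign_extend16((offset + 0x8000) >> 16)


def toc_lo(offset):
    """Lower 16 bits of the offset as a signed displacement (@toc@l)."""
    return sign_extend16(offset)


def rebuild(offset):
    """Mimic addis (adds ha << 16) followed by a displacement of lo."""
    return (toc_ha(offset) << 16) + toc_lo(offset)
```

For any offset that fits in a signed 16-bit field, toc_ha() returns 0,
which is the case in which the "addis" contributes nothing and can
become a "nop".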

Consider next a load of a function-scope static integer.  For small code
model, the compiler generates:

	ld 3, .LC1@toc(2)
	lwz 4, 0(3)

	.section	.toc,"aw",@progbits
.LC1:
	.tc test_fn_static.si[TC],test_fn_static.si
	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

For medium code model, the compiler generates:

	addis 3, 2, test_fn_static.si@toc@ha
	addi 3, 3, test_fn_static.si@toc@l
	lwz 4, 0(3)

	.type	test_fn_static.si,@object
	.local	test_fn_static.si
	.comm	test_fn_static.si,4,4

Again, the linker may replace the "addis" with a "nop", calculating only
a 16-bit offset when this is sufficient.
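
A minimal sketch of the decision the linker makes here, written in
Python purely for illustration.  The function names are hypothetical,
and the exact form of the rewritten "addi" (using r2 directly as the
base register) is my assumption by analogy with the "ld" case above; the
actual rewriting is up to the linker.

```python
def fits_signed16(offset):
    """True when offset is representable as a signed 16-bit displacement."""
    return -0x8000 <= offset <= 0x7FFF


def relax_addis_addi(symbol, offset):
    """Return the instruction pair a relaxing linker could leave behind.

    Assumption: when the offset fits in 16 bits, the high-adjusted part
    is zero, so r2 itself can serve as the base register.
    """
    if fits_signed16(offset):
        return ["nop", f"addi 3, 2, {symbol}@toc"]
    return [
        f"addis 3, 2, {symbol}@toc@ha",
        f"addi 3, 3, {symbol}@toc@l",
    ]
```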

Note that it would be more efficient for the compiler to generate:

	addis 3, 2, test_fn_static.si@toc@ha
	lwz 4, test_fn_static.si@toc@l(3)

The current patch does not perform this optimization yet.  This will be
addressed as a peephole optimization in a later patch.

For the moment, the default code model for 64-bit PowerPC will remain
the small code model.  We plan to eventually change the default to
medium code model, which matches current upstream GCC behavior.  Note
that the different code models are ABI-compatible, so code compiled with
different models can be linked together and will execute correctly.

I've tested the regression suite and the application/benchmark test
suite in two ways:  Once with the patch as submitted here, and once with
additional logic to force medium code model as the default.  The tests
all compile cleanly, with one exception.  The mandel-2 application test
fails due to an unrelated ABI incompatibility in passing complex
numbers.  It just so happens that small code model was incredibly lucky,
in that temporary values in floating-point registers held the expected
values needed by the external library routine that was called
incorrectly.  My current thought is to correct the ABI problems with
_Complex before making medium code model the default, to avoid
introducing this "regression."

Here are a few comments on how the patch works, since the selection code
can be difficult to follow:

The existing logic for small code model defines three
pseudo-instructions:  LDtoc for most uses, LDtocJTI for jump table
addresses, and LDtocCPT for constant pool addresses.  These are expanded
by SelectCodeCommon().  The pseudo-instruction approach doesn't work for
medium code model, because we need to generate two instructions when we
match the same pattern.  Instead, new logic in PPCDAGToDAGISel::Select()
intercepts the TOC_ENTRY node for medium code model, and generates an
ADDIStocHA followed by either a LDtocL or an ADDItocL.  These new node
types correspond naturally to the sequences described above.

The addis/ld sequence is generated for the following cases:
 * Jump table addresses
 * Function addresses
 * External global variables
 * Tentative definitions of global variables (common linkage)

The addis/addi sequence is generated for the following cases:
 * Constant pool entries
 * File-scope static global variables
 * Function-scope static variables
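
The two lists above amount to a simple lookup, sketched below in Python.
This is only an illustration of the mapping, not the C++ in the patch:
the kind labels are names I invented for this example, while ADDIStocHA,
LDtocL, and ADDItocL are the node names used in the patch.

```python
# Two-instruction sequences, named after the nodes described above.
ADDIS_LD = ("ADDIStocHA", "LDtocL")      # address is loaded from a TOC entry
ADDIS_ADDI = ("ADDIStocHA", "ADDItocL")  # address is computed from the TOC base

# Hypothetical labels for the kinds of symbols listed above.
SEQUENCE_BY_KIND = {
    "jump_table": ADDIS_LD,
    "function": ADDIS_LD,
    "external_global": ADDIS_LD,
    "common_global": ADDIS_LD,
    "constant_pool": ADDIS_ADDI,
    "file_static": ADDIS_ADDI,
    "function_static": ADDIS_ADDI,
}


def select_toc_sequence(kind):
    """Return the pair of nodes to generate for a symbol of this kind."""
    return SEQUENCE_BY_KIND[kind]
```

Roughly speaking, the addis/addi cases are the ones where the data is
known to be defined in the current compilation unit, so no TOC entry is
needed to reach it.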

Expanding to the two-instruction sequences at select time exposes the
instructions to subsequent optimization, particularly scheduling.

The rest of the processing occurs at assembly time, in
PPCAsmPrinter::EmitInstruction.  Each of the instructions is converted
to a "real" PowerPC instruction.  When a TOC entry needs to be created,
this is done here in the same manner as for the existing LDtoc,
LDtocJTI, and LDtocCPT pseudo-instructions (I factored out a new routine
to handle this).

I had originally thought that if a TOC entry was needed for LDtocL or
ADDItocL, it would already have been generated for the previous
ADDIStocHA.  However, at higher optimization levels, the ADDIStocHA may
appear in a different block, which may be assembled textually following
the block containing the LDtocL or ADDItocL.  So it is necessary to
include the possibility of creating a new TOC entry for those two
instructions.
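
The idea is a lookup-or-create step that gives the same answer no matter
which of the paired instructions is printed first.  Here is a minimal
Python sketch of that idea; the class and method names are hypothetical
and do not correspond to the routine factored out in the patch.

```python
class TocTable:
    """Toy model of TOC entry bookkeeping during assembly printing."""

    def __init__(self):
        self._labels = {}  # symbol name -> TOC label, in creation order

    def lookup_or_create(self, symbol):
        """Return the label for symbol, creating the entry on first use."""
        if symbol not in self._labels:
            self._labels[symbol] = f".LC{len(self._labels)}"
        return self._labels[symbol]

    def emit(self):
        """Return the lines of the .toc section for all entries so far."""
        lines = ['\t.section\t.toc,"aw",@progbits']
        for symbol, label in self._labels.items():
            lines.append(f"{label}:")
            lines.append(f"\t.tc {symbol}[TC],{symbol}")
        return lines
```

With this shape, it does not matter whether the ADDIStocHA or the
LDtocL/ADDItocL is encountered first: whichever comes first creates the
entry, and the other finds it.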

Note that for LDtocL, we generate a new form of LD called LDrs.  This
allows specifying the @toc@l relocation for the offset field of the LD
instruction.  When the peephole optimization described above is added,
we will need to do similar things for all immediate-form load and store
operations.

This was a long-winded description, but I hope it helps to make the
patch more understandable.  I would very much appreciate comments and
suggestions before I commit this!

Thanks,
Bill

-- 
Bill Schmidt, Ph.D.
IBM Advance Toolchain for PowerLinux
IBM Linux Technology Center
wschmidt at linux.vnet.ibm.com
wschmidt at us.ibm.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mcm-2012-11-15.patch
Type: text/x-patch
Size: 24096 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121115/146aa978/attachment.bin>