[PATCH] Hexagon Register Cleanup
Krzysztof Parzyszek
kparzysz at codeaurora.org
Mon May 13 09:07:21 PDT 2013
Hi,
This is the Hexagon pass that was meant to address the complications
related to implicit uses and defs of super- and sub-registers.
To clarify the situation for everybody:
Hexagon has 32 registers, R0..R31, each 32 bits wide. Certain
instructions can do 64-bit calculations, and their operands are 64-bit
register pairs (even-odd). These pairs are usually written as D0..D15,
but they are in fact the pairs R1:0, R3:2, R5:4, etc. (Hexagon is
little-endian, hence the "reversed" notation). It is not unusual to
have the individual registers in a register pair be defined separately,
and then used as a pair in another instruction, for example:
R0 = ...
R1 = ...
... = D0
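To make the pair numbering concrete, here is a small Python sketch of the Dn to R(2n+1):R(2n) mapping described above. The helper names `pair_subregs` and `pair_name` are made up for illustration and are not part of LLVM or of the patch.

```python
def pair_subregs(n):
    """Return the (low, high) 32-bit register names that make up pair Dn."""
    if not 0 <= n <= 15:
        raise ValueError("Hexagon register pairs are D0..D15")
    return (f"R{2 * n}", f"R{2 * n + 1}")


def pair_name(n):
    """Return the 'reversed' high:low spelling of pair Dn, e.g. R1:0 for D0."""
    return f"R{2 * n + 1}:{2 * n}"
```

With these, D0 is made of R0 (low) and R1 (high) and is spelled R1:0, and D2 is spelled R5:4.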
This introduces certain complications with the current register
allocation. The problem is that the register rewriter will add implicit
uses and implicit defs of super-registers when a sub-register is used or
defined. For example:
%vreg1:subreg_loreg = COPY %vreg2:subreg_loreg
%vreg1:subreg_hireg = COPY %vreg2:subreg_hireg
assuming that vreg1 becomes D0, and vreg2 becomes D1, would become
%R0<def> = COPY %R2<use>, %D0<imp-def>, %D1<imp-use>
%R1<def> = COPY %R3<use>, %D0<imp-def>, %D1<imp-use>
Hexagon is a VLIW machine, i.e. instructions are grouped into packets,
and then the packets are executed as a unit (i.e. all instructions
within a packet are executed in parallel, subject to certain
limitations). For performance it is much better to pack as many
instructions in a packet as possible (architecture limit is 4), instead
of having more packets with fewer instructions.
One restriction is that there cannot be any dependencies between
instructions in a packet, so for the example above, the packetizer would
be unable to put the two COPY instructions in the same packet, even
though, from the architecture point of view, there are no dependencies
and they can execute in parallel. The reason for that would be that D0
appears to be defined in both instructions (hence they cannot be
parallelized).
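The effect on the packetizer can be illustrated with a deliberately simplified Python model. It is not the actual LLVM dependence analysis: register names are treated as opaque strings (no aliasing between a pair and its halves), every kind of dependence is treated as blocking, and the `conflicts` function and the dictionaries are hypothetical.

```python
def conflicts(first, second):
    """True if the instructions share a def, or one reads what the other defines."""
    return bool(
        first["defs"] & second["defs"]      # output dependence
        or first["defs"] & second["uses"]   # true dependence
        or first["uses"] & second["defs"]   # anti-dependence
    )


# The two COPYs as emitted by the rewriter, including implicit D0/D1 operands.
copy_lo = {"defs": {"R0", "D0"}, "uses": {"R2", "D1"}}
copy_hi = {"defs": {"R1", "D0"}, "uses": {"R3", "D1"}}

# The same two COPYs with the implicit super-register operands dropped.
clean_lo = {"defs": {"R0"}, "uses": {"R2"}}
clean_hi = {"defs": {"R1"}, "uses": {"R3"}}
```

In this model `conflicts(copy_lo, copy_hi)` is true only because of the shared implicit def of D0, while `conflicts(clean_lo, clean_hi)` is false.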
This pass tries to solve this problem (and related issues) by shifting
the liveness tracking from super-registers to sub-registers. It does so
by marking all explicit uses and defs of register pairs as "undef" and
adding implicit uses and defs of the 32-bit components. In addition to
that, it removes the "extra" implicit uses and defs of super-registers
(i.e. register pairs) that were added by the rewriter. So, the above
example would become
%R0<def> = COPY %R2<use>
%R1<def> = COPY %R3<use>
If we had an instruction that actually uses register pairs, such as
%D0<def> = ADD64_rr %D1<use>, %D2<use>
it would be processed to look like this:
%D0<def,undef> = ADD64_rr %D1<use,undef>, %D2<use,undef>,
    %R0<imp-def>, %R1<imp-def> // D0 = ...
    %R2<imp-use>, %R3<imp-use> // ... = D1
    %R4<imp-use>, %R5<imp-use> // ... = D2
The intent here is to mark the pairs as "undef" and thus remove them
from dependence analysis. One small problem was that dependence
analysis still considered those registers, so when this transformation
is enabled, it also forces the dependence analysis to ignore "undef"
registers. This is done using debug flags so that other
targets are unaffected.
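As a rough sketch of the operand rewriting described above (in Python, not the code from the patch), operands can be modeled as dictionaries. The names `operand`, `subregs` and `cleanup` are hypothetical, and the model assumes that any register whose name starts with "D" is a pair.

```python
def operand(reg, kind, implicit=False):
    """Build a toy operand; kind is 'def' or 'use'."""
    return {"reg": reg, "kind": kind, "implicit": implicit, "undef": False}


def subregs(pair):
    """Dn -> [R(2n), R(2n+1)]."""
    n = int(pair[1:])
    return [f"R{2 * n}", f"R{2 * n + 1}"]


def cleanup(operands):
    """Shift tracking from pairs to their 32-bit halves for one instruction."""
    kept, added = [], []
    for op in operands:
        if not op["reg"].startswith("D"):
            kept.append(dict(op))               # 32-bit operand: unchanged
        elif op["implicit"]:
            continue                            # implicit pair operand: dropped
        else:
            kept.append({**op, "undef": True})  # explicit pair: mark "undef"
            for sub in subregs(op["reg"]):      # and track both halves instead
                added.append(operand(sub, op["kind"], implicit=True))
    return kept + added


add64 = [operand("D0", "def"), operand("D1", "use"), operand("D2", "use")]
copy = [operand("R0", "def"), operand("R2", "use"),
        operand("D0", "def", implicit=True), operand("D1", "use", implicit=True)]
```

Running `cleanup(add64)` keeps the three pair operands with "undef" set and appends implicit operands for R0..R5, while `cleanup(copy)` leaves only the explicit R0 and R2 operands.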
Since after this transformation, a former anti-dependence on a single
register (register pair) now becomes an anti-dependence on two 32-bit
registers, the existing anti-dependence breaking algorithm will no
longer work in such cases. The problem is that both sub-registers would
need to be rewritten in such a way as to remain in a "pair"
relationship, e.g. R1:0 could become R5:4, but not just some two random
32-bit registers. To address this problem, there is an
"anti-dependence" part in the HRC pass.
The whole transformation is divided into 3 stages:
1. "Finalize RA", where corrective actions are taken to address some
undesirable outputs from the rewriter (see below).
2. "Anti-dep HRC", where the bulk of the work happens, i.e. setting the
"undef" flag, and rewriting anti-dependencies on register pairs.
3. "Finalize", where the hijacking of "undef" ends, and the explicit
register pairs become "legitimate def/use" again.
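For the anti-dependence rewriting in stage 2, the key constraint is that the replacement must again be an even-odd pair. A minimal Python sketch of that constraint follows. It is not the algorithm from the patch: the function `pick_free_pair` is hypothetical, and it ignores reserved registers, register classes and live ranges.

```python
def pick_free_pair(busy, num_pairs=16):
    """Return the first n such that both halves of Dn are free, else None.

    `busy` is a set of 32-bit register names that must not be clobbered.
    """
    for n in range(num_pairs):
        lo, hi = f"R{2 * n}", f"R{2 * n + 1}"
        if lo not in busy and hi not in busy:
            return n
    return None
```

For example, with R0, R1, R2 and R5 busy, pairs D0, D1 and D2 are all rejected (each has at least one busy half), and the first usable candidate is D3, i.e. R7:6.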
The issue with the rewriter mentioned above is that it will spill an
entire 64-bit register, even when only a part of it was explicitly
defined. Normally, the whole 64-bit register would be "implicitly
defined", as per the usual rewriter treatment, but since we are trying
to track the sub-registers, we may end up with a store of R1:0, where
only R0 was actually defined. To address this, we simply add a
definition of R1 to "complete" the definition of R1:0, so that it can be
spilled as a whole. This injects a bit of inefficiency, since we
actually add an extra instruction, but overall this is still profitable
for us.
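The check behind this fix-up can be sketched as follows, in Python and purely as an illustration. The function `missing_halves` is hypothetical; it only reports which halves of a pair still need a definition before the pair can be stored as a whole, and says nothing about what kind of definition the pass inserts.

```python
def missing_halves(pair_index, defined):
    """Return the halves of D<pair_index> that are not in the `defined` set."""
    lo, hi = f"R{2 * pair_index}", f"R{2 * pair_index + 1}"
    return [reg for reg in (lo, hi) if reg not in defined]
```

For the case described above, `missing_halves(0, {"R0"})` reports that R1 is the half that has to be defined before R1:0 is spilled.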
This pass is written to be transparent to any other targets. The only
globally-visible change would be printing of the "undef" flag on
MachineInstr operands. The ignoring of the "undef" registers in
dependence analysis should only happen on Hexagon, and only when HRC is
enabled.
Please let me know if you have any comments.
Thanks,
-Krzysztof
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Hexagon-Register-Cleanup.patch
Type: text/x-patch
Size: 78972 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130513/fc50e793/attachment.bin>