[PATCH] [X86] New pass that moves immediate operands to registers.

Serge Pavlov sepavloff at gmail.com
Mon Oct 6 04:18:35 PDT 2014


Hi Quentin,

Thank you very much for your detailed review.

2014-10-04 4:34 GMT+07:00 Quentin Colombet <qcolombet at apple.com>:

> Hi Serge,
>
> A couple of general comments:
> - None of the comment uses the doxygen style: /// instead of //. Please
> fix it.
> - Comments start with a capital letter and end with a period (or any
> appropriate punctuation :)).
> - As I said in one of my email, we should do that just for Os and Oz
> functions (i.e., look for function attribute: OptimizeForSize and MinSize).
>

OK, will fix these.


>
> The overall approach seems more complicated than I would expect for such a
> pass. In particular, I think that all the InReg, Scheduled, and Waiting
> states could be avoided, as well as all the history thing with the
> expiration date.
>

History was introduced to track only "fresh" immediate values. A value that
is not used for a long time would occupy memory and would increase search
time, with long basic block these might be substantial. On the other hand,
only values used in close vicinity of the current instruction should be
taken into account. History solves the problem of maintaining compact and
actual pool of values. It does not require memory allocation and has
predictable search time. This pass was made as a kind of digital filter
that processes instruction flow, terms 'history' and 'time' were borrowed
from there. Using value history is attractive due to guaranty of limited
expenses in memory and execution time.

As for InReg, Scheduled and Waiting states, they exist to postpone
decision, what register class is to be used to store immediate. If we see
that some 32 bit value is profitable to materialize but then see use of 64
bit values, lower half of which is that 32 bit value, we could use part of
64-bit value, if we inserted appropriate load instruction. It looks like
this code indeed may be simplified.


> My understanding is that you try to limit the impact of this
> transformation on the register pressure. That seems fragile to me (there
> are a lot of magic numbers involved here for the different limits) and
> harder than it should be to understand.
>
> That I would recommend is having more faith in the register allocator. The
> live-ranges, you are creating/extending are rematerializable. Therefore, if
> the register pressure becomes too high, the register allocator can
> theoritically rematerialize the values instead of spilling.
> The main problem then, is that you may end up with code like this:
> reg = mov imm
> reg2 = op reg3, reg
> instead of
> reg2 = op reg3, imm
>
> That said, we could probably teach the register allocator to “fold” the
> operand as we do for memory load, so I wouldn’t worry too much about that.
>

Yes, this is good idea, to teach register allocator about spilling
registers loaded from immediates. If register allocator could handle such
register enough flexibly, it would minimize impact of this pass on register
pressure and some magic constants in this pass could be avoided.


>
> The bottom line is that I would expect the pass to be a relatively simple
> scan that creates appropriate constants when the whole basic block has been
> traversed.
> I.e., something like:
> ConstantsUsage // Sort of dictionary. Map a constant to its uses.
> for each instr in bb
>   if instr can use register variant
>     record instr, instr.imm in ConstantsUsage // This is could be the
> trick part. See below for some thoughts.
>
> for each constant in ConstantsUsage
>   if constant is profitable
>     materialize constant // This creates one constant and update all its
> uses using the appropriate subreg, removing the redundant load imm, etc.
>
>
In this approach we have full information about immediate use at the end of
basic block, this simplifies decision about materialization. The only
drawback is higher resource consumption, both memory (need to keep all
values in BB) and execution time (make two passes instead one). Maybe
enhancements in register allocator can alleviate this problem. For
instance, all values that could be profitable are loaded into virtual
registers and the register allocator decides which values should be
converted into immediates, it anyway use heuristics to map physical
registers.

Regarding ‘record instr, imm, ConstantsUsage’.
> The hard part, per say, is to enable reuse of wide constant, e.g., 0x0fff
> is used for 0xff.


It is expected in memset expansion. In other cases probably this is rare
case.


> I believe that you wouldn’t have that many constants to look through per
> basic block and a linear search may be sufficient.
> Now, if we want something faster, we could do some trick like having a
> mapping per size, sorted by profitability, etc.
>
> What do you think?
>

For me using history is attractive due to predictable resource consumption,
but probably in most cases a map is enough, this needs investigation.


>
> BTW, you could also have some function level scope if you really are
> interested at saving as much size as possible.
>
>
Looks like a job for RegisterCoalescer?


> Cheers,
> -Quentin
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:10
> @@ +9,3 @@
> +//
> +// This file defines the pass which will move immediate operand into a
> resister
> +// if it is used in several adjacent instructions. This transformation
> tries to
> ----------------
> s/resister/register
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:32
> @@ +31,3 @@
> +
> +#define DEBUG_TYPE "x86-imm2reg"
> +#include "X86.h"
> ----------------
> I believe this should come after the includes. In other words, usually
> right after using namespace llvm.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:50
> @@ +49,3 @@
> +// may be set smaller by supplying proper command line option.
> +const unsigned MaxHistoryDepth = 8;
> +
> ----------------
> I believe most users won’t mess with the related option and I rather have
> the number I asked on the command line than a hidden maximal value.
> The bottom line is, please do not use a hard-coded maximal value here.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:54
> @@ +53,3 @@
> +// may be set smaller by supplying proper command line option.
> +const unsigned MaxHistoryWidth = 4;
> +
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:58
> @@ +57,3 @@
> +// immediate value before it is dropped from history.
> +const unsigned DefMaxSeparatingInstrs = MaxHistoryWidth;
> +
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:62
> @@ +61,3 @@
> +// values.
> +const unsigned DefMaxRegisters = 2;
> +
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:66
> @@ +65,3 @@
> +// register.
> +const unsigned DefMinProfit = 1;
> +
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:72
> @@ +71,3 @@
> +    cl::desc("Max number of immediate values tracked in history."),
> +    cl::init(MaxHistoryWidth), cl::Hidden);
> +
> ----------------
> Init this with the maximal value.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:79
> @@ +78,3 @@
> +    cl::desc("Max number of uses stored for a tracked immediate value."),
> +    cl::init(MaxHistoryDepth), cl::Hidden);
> +
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:88
> @@ +87,3 @@
> +    cl::desc("Max number of instructions separating two immediate uses."),
> +    cl::init(DefMaxSeparatingInstrs), cl::Hidden);
> +
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:93
> @@ +92,3 @@
> +                 cl::desc("Max number of registers used for
> materialization."),
> +                 cl::init(DefMaxRegisters), cl::Hidden);
> +
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:98
> @@ +97,3 @@
> +    cl::desc("Minimal number of bytes that materialization must save."),
> +    cl::init(DefMinProfit), cl::Hidden);
> +
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:112
> @@ +111,3 @@
> +  // Must be 1 << DATA_N == size in bytes
> +  DATA_8,
> +  DATA_16,
> ----------------
> I would explicitly set the value, to be sure anyone updating this code
> matches the intent.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:130
> @@ +129,3 @@
> +  int Opcode;
> +  int NewOpCode;      // Opcode if imm is replaced by reg
> +  ImmediateSize Size; // of immediate data
> ----------------
> Period at the end of the comments.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:134
> @@ +133,3 @@
> +  int Profit;         // Gain in bytes if imm is changed to reg
> +  bool IsLoad;        // is this an instruction like 'mov imm, reg'?
> +};
> ----------------
> IsLoadImm maybe?
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:135
> @@ +134,3 @@
> +  bool IsLoad;        // is this an instruction like 'mov imm, reg'?
> +};
> +
> ----------------
> For this patch I think it is fine to have all the information written
> here, but in the future it would be nice if it could be part of tablegen.
> I do not like having yet another place to update when I add instructions.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:241
> @@ +240,3 @@
> +
> +static int GetSizeOfMovImmToRegInstr(int OpCode) {
> +  if (OpCode == X86::MOV16ri)
> ----------------
> Could you add a comment on that function?
> I do not understand its purpose with just its name.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:243
> @@ +242,3 @@
> +  if (OpCode == X86::MOV16ri)
> +    return 4; // 66 B8 iw
> +  if (OpCode == X86::MOV32ri)
> ----------------
> What do these comments mean?
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:253
> @@ +252,3 @@
> +
> +static unsigned GetSubreg(ImmediateSize Entire, ImmediateSize Part) {
> +  if (Entire == Part)
> ----------------
> Shouldn’t we assert Entire is bigger than Part?
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:269
> @@ +268,3 @@
> +static unsigned GetSubregByBytes(unsigned Entire, unsigned Part) {
> +  if (Entire == Part)
> +    return X86::NoSubRegister;
> ----------------
> Ditto.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:795
> @@ +794,3 @@
> +  InstrInfo *Rec = OpCodeTable::Find(Instr.getOpcode());
> +  if (Rec == 0)
> +    return false;
> ----------------
> nullptr instead of 0.
>
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:947
> @@ +946,3 @@
> +
> +  // If use set of the value contains load instruction, expences for load
> +  // instruction can be decreased.
> ----------------
> expences -> expenses.
>
> http://reviews.llvm.org/D5544
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141006/efca1243/attachment.html>


More information about the llvm-commits mailing list