[PATCH] [X86] New pass that moves immediate operands to registers.

Mon Oct 6 09:19:56 PDT 2014

Hi Serge,

> On Oct 6, 2014, at 4:18 AM, Serge Pavlov <sepavloff at gmail.com> wrote:
> 
> Hi Quentin,
> 
> Thank you very much for your detailed review.
> 
> 2014-10-04 4:34 GMT+07:00 Quentin Colombet <qcolombet at apple.com <mailto:qcolombet at apple.com>>:
> Hi Serge,
> 
> A couple of general comments:
> - None of the comment uses the doxygen style: /// instead of //. Please fix it.
> - Comments start with a capital letter and end with a period (or any appropriate punctuation :)).
> - As I said in one of my email, we should do that just for Os and Oz functions (i.e., look for function attribute: OptimizeForSize and MinSize).
> 
> OK, will fix these.
>  
> 
> The overall approach seems more complicated than I would expect for such a pass. In particular, I think that all the InReg, Scheduled, and Waiting states could be avoided, as well as all the history thing with the expiration date.
> 
> History was introduced to track only "fresh" immediate values. A value that is not used for a long time would occupy memory and would increase search time, with long basic block these might be substantial. On the other hand, only values used in close vicinity of the current instruction should be taken into account. History solves the problem of maintaining compact and actual pool of values. It does not require memory allocation and has predictable search time. This pass was made as a kind of digital filter that processes instruction flow, terms 'history' and 'time' were borrowed from there. Using value history is attractive due to guaranty of limited expenses in memory and execution time.
> 
> As for InReg, Scheduled and Waiting states, they exist to postpone decision, what register class is to be used to store immediate. If we see that some 32 bit value is profitable to materialize but then see use of 64 bit values, lower half of which is that 32 bit value, we could use part of 64-bit value, if we inserted appropriate load instruction. It looks like this code indeed may be simplified.
>  
> My understanding is that you try to limit the impact of this transformation on the register pressure. That seems fragile to me (there are a lot of magic numbers involved here for the different limits) and harder than it should be to understand.
> 
> That I would recommend is having more faith in the register allocator. The live-ranges, you are creating/extending are rematerializable. Therefore, if the register pressure becomes too high, the register allocator can theoritically rematerialize the values instead of spilling.
> The main problem then, is that you may end up with code like this:
> reg = mov imm
> reg2 = op reg3, reg
> instead of
> reg2 = op reg3, imm
> 
> That said, we could probably teach the register allocator to “fold” the operand as we do for memory load, so I wouldn’t worry too much about that.
> 
> Yes, this is good idea, to teach register allocator about spilling registers loaded from immediates. If register allocator could handle such register enough flexibly, it would minimize impact of this pass on register pressure and some magic constants in this pass could be avoided.

The register allocator should rematerialized when it can instead of spilling. The folding could be relatively straight forward to implement as we already have some target hook for the memory folding.

>  
> 
> The bottom line is that I would expect the pass to be a relatively simple scan that creates appropriate constants when the whole basic block has been traversed.
> I.e., something like:
> ConstantsUsage // Sort of dictionary. Map a constant to its uses.
> for each instr in bb
>   if instr can use register variant
>     record instr, instr.imm in ConstantsUsage // This is could be the trick part. See below for some thoughts.
> 
> for each constant in ConstantsUsage
>   if constant is profitable
>     materialize constant // This creates one constant and update all its uses using the appropriate subreg, removing the redundant load imm, etc.
> 
> 
> In this approach we have full information about immediate use at the end of basic block, this simplifies decision about materialization. The only drawback is higher resource consumption, both memory (need to keep all values in BB) and execution time (make two passes instead one).

On the other hand, since we do not have to handle all the stuff about materializing an immediate and updating the history, this is not clear to me that this would be an actual loss. Beside, we could virtually “split” a BB after X instructions if we want to limit the memory usage. Anyway, I believe most BBs have a reasonable size, so I believe the difference will remain marginal, but the code simpler. Of course, I can be wrong and being able to evaluate would be best :).
My second main concern (after code complexity), is the fact that we will miss some opportunities when we reach the size of the history.

> Maybe enhancements in register allocator can alleviate this problem. For instance, all values that could be profitable are loaded into virtual registers and the register allocator decides which values should be converted into immediates, it anyway use heuristics to map physical registers.

Currently, I think the opposite would be supported, i.e., create a big live range for each immediate, then the allocator would rematerialize if it needs to spill.

> 
> Regarding ‘record instr, imm, ConstantsUsage’.
> The hard part, per say, is to enable reuse of wide constant, e.g., 0x0fff is used for 0xff.
> 
> It is expected in memset expansion. In other cases probably this is rare case.
>  
> I believe that you wouldn’t have that many constants to look through per basic block and a linear search may be sufficient.
> Now, if we want something faster, we could do some trick like having a mapping per size, sorted by profitability, etc.
> 
> What do you think?
> 
> For me using history is attractive due to predictable resource consumption, but probably in most cases a map is enough, this needs investigation. 
>  
> 
> BTW, you could also have some function level scope if you really are interested at saving as much size as possible.
> 
> 
> Looks like a job for RegisterCoalescer?

Could be related to the register allocator, yes :).
If an immediate is represented by a big live range, then materializing the constants becomes a spill placement and indeed, this can be handled by the register allocator.

Thanks for all the work on this. And thanks for checking the different approaches, I know this is a lot of work.

At some point, if you want to evaluate something more broadly, we can help with the measurements.

Cheers,
-Quentin
>  
> Cheers,
> -Quentin
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:10
> @@ +9,3 @@
> +//
> +// This file defines the pass which will move immediate operand into a resister
> +// if it is used in several adjacent instructions. This transformation tries to
> ----------------
> s/resister/register
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:32
> @@ +31,3 @@
> +
> +#define DEBUG_TYPE "x86-imm2reg"
> +#include "X86.h"
> ----------------
> I believe this should come after the includes. In other words, usually right after using namespace llvm.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:50
> @@ +49,3 @@
> +// may be set smaller by supplying proper command line option.
> +const unsigned MaxHistoryDepth = 8;
> +
> ----------------
> I believe most users won’t mess with the related option and I rather have the number I asked on the command line than a hidden maximal value.
> The bottom line is, please do not use a hard-coded maximal value here.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:54
> @@ +53,3 @@
> +// may be set smaller by supplying proper command line option.
> +const unsigned MaxHistoryWidth = 4;
> +
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:58
> @@ +57,3 @@
> +// immediate value before it is dropped from history.
> +const unsigned DefMaxSeparatingInstrs = MaxHistoryWidth;
> +
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:62
> @@ +61,3 @@
> +// values.
> +const unsigned DefMaxRegisters = 2;
> +
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:66
> @@ +65,3 @@
> +// register.
> +const unsigned DefMinProfit = 1;
> +
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:72
> @@ +71,3 @@
> +    cl::desc("Max number of immediate values tracked in history."),
> +    cl::init(MaxHistoryWidth), cl::Hidden);
> +
> ----------------
> Init this with the maximal value.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:79
> @@ +78,3 @@
> +    cl::desc("Max number of uses stored for a tracked immediate value."),
> +    cl::init(MaxHistoryDepth), cl::Hidden);
> +
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:88
> @@ +87,3 @@
> +    cl::desc("Max number of instructions separating two immediate uses."),
> +    cl::init(DefMaxSeparatingInstrs), cl::Hidden);
> +
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:93
> @@ +92,3 @@
> +                 cl::desc("Max number of registers used for materialization."),
> +                 cl::init(DefMaxRegisters), cl::Hidden);
> +
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:98
> @@ +97,3 @@
> +    cl::desc("Minimal number of bytes that materialization must save."),
> +    cl::init(DefMinProfit), cl::Hidden);
> +
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:112
> @@ +111,3 @@
> +  // Must be 1 << DATA_N == size in bytes
> +  DATA_8,
> +  DATA_16,
> ----------------
> I would explicitly set the value, to be sure anyone updating this code matches the intent.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:130
> @@ +129,3 @@
> +  int Opcode;
> +  int NewOpCode;      // Opcode if imm is replaced by reg
> +  ImmediateSize Size; // of immediate data
> ----------------
> Period at the end of the comments.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:134
> @@ +133,3 @@
> +  int Profit;         // Gain in bytes if imm is changed to reg
> +  bool IsLoad;        // is this an instruction like 'mov imm, reg'?
> +};
> ----------------
> IsLoadImm maybe?
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:135
> @@ +134,3 @@
> +  bool IsLoad;        // is this an instruction like 'mov imm, reg'?
> +};
> +
> ----------------
> For this patch I think it is fine to have all the information written here, but in the future it would be nice if it could be part of tablegen.
> I do not like having yet another place to update when I add instructions.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:241
> @@ +240,3 @@
> +
> +static int GetSizeOfMovImmToRegInstr(int OpCode) {
> +  if (OpCode == X86::MOV16ri)
> ----------------
> Could you add a comment on that function?
> I do not understand its purpose with just its name.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:243
> @@ +242,3 @@
> +  if (OpCode == X86::MOV16ri)
> +    return 4; // 66 B8 iw
> +  if (OpCode == X86::MOV32ri)
> ----------------
> What do these comments mean?
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:253
> @@ +252,3 @@
> +
> +static unsigned GetSubreg(ImmediateSize Entire, ImmediateSize Part) {
> +  if (Entire == Part)
> ----------------
> Shouldn’t we assert Entire is bigger than Part?
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:269
> @@ +268,3 @@
> +static unsigned GetSubregByBytes(unsigned Entire, unsigned Part) {
> +  if (Entire == Part)
> +    return X86::NoSubRegister;
> ----------------
> Ditto.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:795
> @@ +794,3 @@
> +  InstrInfo *Rec = OpCodeTable::Find(Instr.getOpcode());
> +  if (Rec == 0)
> +    return false;
> ----------------
> nullptr instead of 0.
> 
> ================
> Comment at: lib/Target/X86/X86MaterializeImmediates.cpp:947
> @@ +946,3 @@
> +
> +  // If use set of the value contains load instruction, expences for load
> +  // instruction can be decreased.
> ----------------
> expences -> expenses.
> 
> http://reviews.llvm.org/D5544 <http://reviews.llvm.org/D5544>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141006/cc3e62b5/attachment.html>