[PATCH] [X86] New pass that moves immediate operands to registers.

Wed Oct 1 10:21:57 PDT 2014

Hi Serge,

> On Sep 30, 2014, at 12:04 PM, Serge Pavlov <sepavloff at gmail.com> wrote:
> 
> Hi Pete, Quentin,
> 
> In the specific example here for the constant 0, I’d have expected to see xor used to create the 0. Any idea why that didn’t happen? That would improve the code size, which Quentin mentioned is a plus for this.
> 
> This pass can be modified to load zero this way. It is probably better to improve the instruction selector itself, since loading zero this way might be useful for more than caching immediates in registers.
> 
> 
> Also, I’ve sometimes seen neighboring 0’s of different sizes being written. Would it be possible (perhaps in a future patch) to turn
> 
> movq $0, 0x0(%rsi)
> movl $0, 0x8(%rsi)
> 
> into
> 
> xor %rax, %rax
> movq %rax, 0x0(%rsi)
> movl %eax, 0x8(%rsi)
> 
> 
> Yes, subregisters are taken into account. For instance, the instructions:
> 
> movw $0x5555, 0x0(%rsi)
> movw $0x5555, 0x4(%rsi)
> movb $0x55, 0x8(%rsi)
> 
> are transformed into:
> 
> movw $0x5555, %ax
> movw %ax, 0x0(%rsi)
> movw %ax, 0x4(%rsi)
> movb %al, 0x8(%rsi)
> 
>> The constant hoisting pass does this kind of thing. Should we try to teach it to handle this kind of case?
> 
> That would be interesting. However, this pass is x86-specific and can use processor features (subregister structure, loading a 64-bit value with a 32-bit move). Can these features be used by constant hoisting?

Maybe. The constant hoisting pass has a bunch of target hooks, if I remember correctly. Juergen would know better :).

>> 
>> 
>> Moreover, this may be beneficial for code size, but I guess it is generally not beneficial for performance. Therefore, I believe this should be done only for functions with the Os or Oz attributes.
> 
> Just curious, why? Moves from a register must be faster than moves from memory.

Yes, but those are moves from an immediate, which do not require a memory load at all.
My performance concerns are:
- Register pressure, like Rafael mentioned.
- Additional scheduling dependencies.

Going back to your example:
This yields two independent chains of computation that can be scheduled independently. Moreover, you need just one register to realize this sequence.
   mov $0, 0x4(%esi)
   mov $0, 0x8(%esi)

The two sequences of computation now have to wait for the first mov immediate. Moreover, this sequence requires 2 registers.
   mov $0, %eax
   mov %eax, 0x4(%esi)
   mov %eax, 0x8(%esi)
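
As a rough illustration of the two concerns above, here is a minimal Python sketch of a toy model. It is not how LLVM's scheduler or register allocator works: the names `chain_depths` and `registers_used` are hypothetical, and the model only tracks register dependencies, ignoring memory ordering and instruction latencies.

```python
# Toy model, for illustration only: each instruction is
# (destination_register, source_registers). Stores have no destination
# register, and immediates are not registers.

IMM_STORES = [
    (None, ["esi"]),          # mov $0, 0x4(%esi)
    (None, ["esi"]),          # mov $0, 0x8(%esi)
]

REG_STORES = [
    ("eax", []),              # mov $0, %eax
    (None, ["eax", "esi"]),   # mov %eax, 0x4(%esi)
    (None, ["eax", "esi"]),   # mov %eax, 0x8(%esi)
]


def chain_depths(instrs):
    """Length of the longest register-dependency chain ending at each instruction."""
    writer_depth = {}  # register -> depth of the instruction that last wrote it
    depths = []
    for dest, srcs in instrs:
        # Registers never written inside the sequence (e.g. %esi) count as depth 0.
        depth = 1 + max((writer_depth.get(reg, 0) for reg in srcs), default=0)
        depths.append(depth)
        if dest is not None:
            writer_depth[dest] = depth
    return depths


def registers_used(instrs):
    """Set of registers read or written by the sequence."""
    regs = set()
    for dest, srcs in instrs:
        regs.update(srcs)
        if dest is not None:
            regs.add(dest)
    return regs
```

In this model both immediate stores sit at depth 1 and touch only %esi, while the register version puts both stores at depth 2, behind the `mov $0, %eax`, and touches %eax as well.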

> Both gcc and icc use moves from register when compiling with optimization.  

Sure. What I am saying is that, generally speaking, trading an immediate-to-register copy for a register-to-register copy does not sound beneficial to me.

Apart from code size improvements, what kind of improvements are you seeing?
Also, how big are those improvements?
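
To put rough numbers on the code size side, the sketch below counts encoded bytes for the same two-store example. It assumes 32-bit mode, 32-bit operands, a base register that needs no SIB byte, and an 8-bit displacement. The table and function names are hypothetical, and other addressing modes or operand sizes change the counts.

```python
# Encoded instruction sizes in bytes, under the assumptions stated above.
SIZE = {
    "mov $imm32, disp8(%reg)": 7,  # opcode C7 /0 + ModRM + disp8 + imm32
    "mov %reg, disp8(%reg)": 3,    # opcode 89 /r + ModRM + disp8
    "mov $imm32, %reg": 5,         # opcode B8+r + imm32
    "xor %reg, %reg": 2,           # opcode 31 /r + ModRM
}


def immediate_stores_size(n):
    """Bytes for n stores that each carry their own 32-bit immediate."""
    return n * SIZE["mov $imm32, disp8(%reg)"]


def register_stores_size(n, materialize="mov $imm32, %reg"):
    """Bytes for one instruction that sets up the register plus n register stores."""
    return SIZE[materialize] + n * SIZE["mov %reg, disp8(%reg)"]
```

For two stores this gives 14 bytes with immediates, 11 bytes when the zero is loaded with `mov $0, %eax`, and 8 bytes when it is created with `xor %eax, %eax`.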

Thanks,
-Quentin
> 
> --Serge 
