[llvm-dev] RFC: Insertion of nops for performance stability
Stephen Checkoway via llvm-dev
llvm-dev at lists.llvm.org
Thu Nov 17 08:50:47 PST 2016
Hi Omer,
> On Nov 17, 2016, at 03:55, Paparo Bivas, Omer via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> The two last || clauses of the if will be translated to two conditional jumps to the same target. The generated code will look like so:
>
> foo:
> 0: 48 8b 44 24 28 movq 40(%rsp), %rax
> 5: 8b 00 movl (%rax), %eax
> 7: 01 c8 addl %ecx, %eax
> 9: 44 39 c0 cmpl %r8d, %eax
> c: 75 0f jne 15 <foo+0x1D>
> e: ff 05 00 00 00 00 incl (%rip)
> 14: ff 05 00 00 00 00 incl (%rip)
> 1a: 31 c0 xorl %eax, %eax
> 1c: c3 retq
> 1d: 44 39 c9 cmpl %r9d, %ecx
> 20: 74 ec je -20 <foo+0xE>
> 22: 48 8b 44 24 30 movq 48(%rsp), %rax
> 27: 2b 08 subl (%rax), %ecx
> 29: 39 d1 cmpl %edx, %ecx
> 2b: 7f e1 jg -31 <foo+0xE>
> 2d: 31 c0 xorl %eax, %eax
> 2f: c3 retq
>
> Note: the first if clause jump is the jne instruction at 0x0C, the second if clause jump is the jg instruction at 0x2B and the third if clause jump is the je instruction at 0x20. Also note that the jg and je share a 16-byte window, which is exactly the situation we wish to avoid: if foo is called from inside a loop, this will cause a performance penalty.
Rather than inserting a nop, would it be better to re-encode an existing instruction in a longer form? The JE at offset 0x20 could use the JE rel32 form (0f 84 e8 ff ff ff). Similarly, the MOV at offset 0x22 could use MOV r64, r/m64 with a 32-bit displacement (48 8b 84 24 30 00 00 00). The latter adds only 3 bytes, which is insufficient in this case, but the former adds the required 4 bytes.
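To make the byte counting concrete, here is a rough sketch (not anything LLVM does) that re-encodes the JE from the listing and reports where the JG would land. The helper names `encode_je` and `jg_windows` are made up for illustration, and the rel32 displacement is computed from the target foo+0xE in the listing rather than copied from the text. Which byte of a branch decides its window is not stated in this message, so the sketch just reports the windows of both the first and the last byte of the JG.

```python
import struct

WINDOW = 16  # window size in bytes, as described in the quoted RFC text


def encode_je(offset, target, wide):
    """Encode a JE located at `offset` that jumps to `target`.

    rel8 form:  74 cb      (2 bytes)
    rel32 form: 0f 84 cd   (6 bytes)
    The displacement is relative to the end of the instruction.
    """
    if wide:
        return b"\x0f\x84" + struct.pack("<i", target - (offset + 6))
    return b"\x74" + struct.pack("<b", target - (offset + 2))


def jg_windows(padding):
    """Window indices of the first and last byte of the 2-byte JG
    (originally at 0x2b) once `padding` bytes are added before it."""
    first = 0x2B + padding
    last = first + 1
    return first // WINDOW, last // WINDOW


# The JE at 0x20 targets foo+0xE in the listing.
short_je = encode_je(0x20, 0x0E, wide=False)
long_je = encode_je(0x20, 0x0E, wide=True)
je_growth = len(long_je) - len(short_je)

# The MOV at 0x22, with an 8-bit and a 32-bit displacement.
mov_disp8 = bytes.fromhex("488b442430")
mov_disp32 = bytes.fromhex("488b842430000000")
mov_growth = len(mov_disp32) - len(mov_disp8)

print(short_je.hex(" "), "->", long_je.hex(" "), f"(+{je_growth} bytes)")
print("JG windows with wider MOV:", jg_windows(mov_growth))
print("JG windows with wider JE: ", jg_windows(je_growth))
```

With the 3 extra bytes from the MOV, both bytes of the JG stay in the window that starts at 0x20, next to the JE. With the 4 extra bytes from the JE, the JG starts at 0x2f and its last byte falls at 0x30, in the next window.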
I have no idea if it's better to insert extra instructions rather than increase the length of existing ones, but my intuition is that it's better to decode and retire fewer instructions. I'd assume the same is true when trying to align basic blocks.
--
Stephen Checkoway