[all-commits] [llvm/llvm-project] 840d10: [AVR] Custom lower 32-bit shift instructions

Sun Jan 8 11:06:02 PST 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 840d10a1d2c939016f387a041b0fbb3a9d592b17
      https://github.com/llvm/llvm-project/commit/840d10a1d2c939016f387a041b0fbb3a9d592b17
  Author: Ayke van Laethem <aykevanlaethem at gmail.com>
  Date:   2023-01-08 (Sun, 08 Jan 2023)

  Changed paths:
    M llvm/lib/Target/AVR/AVRISelLowering.cpp
    M llvm/lib/Target/AVR/AVRISelLowering.h
    M llvm/lib/Target/AVR/AVRInstrInfo.td
    A llvm/test/CodeGen/AVR/shift32.ll

  Log Message:
  -----------
  [AVR] Custom lower 32-bit shift instructions

32-bit shift instructions were previously expanded using the default
SelectionDAG expander, which meant it used 16-bit constant shifts and
ORed them together. This works, but is far from optimal.

I've optimized 32-bit shifts on AVR using a custom inserter. This is
done using three new pseudo-instructions that take the upper and lower
16 bits of the value in two separate 16-bit registers and output two
16-bit registers.
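
As a rough illustration of what a 32-bit shift over two 16-bit halves
involves, here is a minimal C++ sketch. It is not code from the patch:
the names Halves, split, join and shl32_by_halves are hypothetical, and
it only models a logical shift left by a constant smaller than 16,
built from 16-bit shifts and an OR as described for the generic
expansion.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit value held in two 16-bit registers.
struct Halves {
  uint16_t lo;  // lower 16 bits
  uint16_t hi;  // upper 16 bits
};

constexpr Halves split(uint32_t x) {
  return {static_cast<uint16_t>(x), static_cast<uint16_t>(x >> 16)};
}

constexpr uint32_t join(Halves v) {
  return (static_cast<uint32_t>(v.hi) << 16) | v.lo;
}

// Logical shift left by n bits, assuming 0 < n < 16.
// Only 16-bit wide results are kept, as on a 16-bit register pair.
constexpr Halves shl32_by_halves(Halves v, unsigned n) {
  // Bits leaving the top of the low half enter the bottom of the high half.
  uint16_t carry = static_cast<uint16_t>(static_cast<uint32_t>(v.lo) >> (16 - n));
  uint16_t hi = static_cast<uint16_t>((static_cast<uint32_t>(v.hi) << n) | carry);
  uint16_t lo = static_cast<uint16_t>(static_cast<uint32_t>(v.lo) << n);
  return {lo, hi};
}

// Compile-time check of the sketch against a plain 32-bit shift.
static_assert(join(shl32_by_halves(split(0x12345678u), 4)) == 0x23456780u,
              "halves-based shift should match a native 32-bit shift");
```

The two inputs and two outputs of shl32_by_halves mirror the shape of
the pseudo-instructions described above; how the patch expands them into
AVR instructions is not shown here.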

This is the first commit in a series. When completed, constant 32-bit
shifts will take around 31% fewer instructions on average, and will in
all cases be equal to or better than the old behavior. It also tends to
match or outperform avr-gcc: the only cases where avr-gcc does better
are when it uses a loop to shift, or when the LLVM register allocator
inserts some unnecessary movs. But it even outperforms avr-gcc in some
cases where avr-gcc does not use a loop.

As a side effect, non-constant 32-bit shifts also become more efficient.

For some real-world differences: the build of compiler-rt I use in
TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9%
smaller. I think picolibc is a better representation of real-world code,
but even a ~1% reduction in code size is really significant.

The current patch just lays the groundwork. The result is actually a
regression in code size. Later patches will use this as a basis to
optimize these shift instructions.

Differential Revision: https://reviews.llvm.org/D140569

  Commit: 8f8afabd32092590a81e10e11e0a2c8b24e09b76
      https://github.com/llvm/llvm-project/commit/8f8afabd32092590a81e10e11e0a2c8b24e09b76
  Author: Ayke van Laethem <aykevanlaethem at gmail.com>
  Date:   2023-01-08 (Sun, 08 Jan 2023)

  Changed paths:
    M llvm/lib/Target/AVR/AVRISelLowering.cpp
    M llvm/test/CodeGen/AVR/shift32.ll

  Log Message:
  -----------
  [AVR] Optimize 32-bit shift: move bytes around

This patch optimizes 32-bit constant shifts by renaming registers. This
is very effective as the compiler would otherwise need to do a lot of
single-bit shift instructions. Instead, the registers are renamed at the
SSA level, which means the register allocator will insert the necessary
mov instructions.
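
To make the idea concrete, here is a minimal C++ sketch of a shift that
is pure byte movement. It assumes the shift amount is a whole number of
bytes (8, 16 or 24 bits), which is my reading of "move bytes around";
the Bytes type and the function names are hypothetical and not taken
from the patch.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit value in four 8-bit registers.
// b[0] is the least significant byte.
struct Bytes {
  uint8_t b[4];
};

constexpr Bytes to_bytes(uint32_t x) {
  return {{static_cast<uint8_t>(x), static_cast<uint8_t>(x >> 8),
           static_cast<uint8_t>(x >> 16), static_cast<uint8_t>(x >> 24)}};
}

constexpr uint32_t from_bytes(Bytes v) {
  return static_cast<uint32_t>(v.b[0]) |
         (static_cast<uint32_t>(v.b[1]) << 8) |
         (static_cast<uint32_t>(v.b[2]) << 16) |
         (static_cast<uint32_t>(v.b[3]) << 24);
}

// Logical shift left by byte_count * 8 bits, assuming byte_count <= 4.
// No bit-level shifting happens: each output byte is either a copy of an
// input byte (a register rename) or zero.
constexpr Bytes shl32_by_bytes(Bytes v, unsigned byte_count) {
  Bytes r = {{0, 0, 0, 0}};
  for (unsigned i = byte_count; i < 4; ++i)
    r.b[i] = v.b[i - byte_count];
  return r;
}

// Compile-time check: moving bytes up by one equals a shift left by 8.
static_assert(from_bytes(shl32_by_bytes(to_bytes(0x11223344u), 1)) == 0x22334400u,
              "byte move should match a shift by 8 bits");
```

In the sketch the copies are explicit assignments; in the compiler the
equivalent step is a rename, and any real mov instructions are left to
the register allocator.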

Unfortunately, the register allocator will insert some unnecessary movs
with the current code. This will be fixed in a later patch.

Differential Revision: https://reviews.llvm.org/D140570

  Commit: 81f5f22f27847b9adc69485aff4a36af205c0549
      https://github.com/llvm/llvm-project/commit/81f5f22f27847b9adc69485aff4a36af205c0549
  Author: Ayke van Laethem <aykevanlaethem at gmail.com>
  Date:   2023-01-08 (Sun, 08 Jan 2023)

  Changed paths:
    M llvm/lib/Target/AVR/AVRISelLowering.cpp
    M llvm/test/CodeGen/AVR/shift32.ll

  Log Message:
  -----------
  [AVR] Optimize 32-bit shifts: shift by 4 bits

This uses a complicated shift sequence that avr-gcc also uses, but
extended to work over any number of bytes and in both directions
(logical shift left and logical shift right). Unfortunately it can't be
used for an arithmetic shift right: I've tried to come up with a
sequence but couldn't.
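
The message does not spell out the sequence. As a hedged illustration,
the C++ sketch below models one nibble-swap-based way to shift left by
4 bits across several bytes, using only per-byte swap, AND and XOR
steps. Treat it as an assumption about the general technique rather
than the exact sequence used by avr-gcc or by this patch; all names are
hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit value in four 8-bit registers.
// b[0] is the least significant byte.
struct Bytes {
  uint8_t b[4];
};

constexpr Bytes to_bytes(uint32_t x) {
  return {{static_cast<uint8_t>(x), static_cast<uint8_t>(x >> 8),
           static_cast<uint8_t>(x >> 16), static_cast<uint8_t>(x >> 24)}};
}

constexpr uint32_t from_bytes(Bytes v) {
  return static_cast<uint32_t>(v.b[0]) |
         (static_cast<uint32_t>(v.b[1]) << 8) |
         (static_cast<uint32_t>(v.b[2]) << 16) |
         (static_cast<uint32_t>(v.b[3]) << 24);
}

// Exchange the high and low nibble of a byte (what AVR's swap does).
constexpr uint8_t swap_nibbles(uint8_t x) {
  return static_cast<uint8_t>((x << 4) | (x >> 4));
}

// Logical shift left by 4 bits using swap, AND and XOR only.
constexpr Bytes shl4(Bytes v) {
  // After swapping, each byte holds its old low nibble on top and its
  // old high nibble at the bottom.
  for (int i = 0; i < 4; ++i)
    v.b[i] = swap_nibbles(v.b[i]);
  // Work from the most significant byte down.
  for (int i = 3; i > 0; --i) {
    v.b[i] &= 0xF0;        // keep the nibble that stays in this byte
    v.b[i] ^= v.b[i - 1];  // mix in the whole lower byte
    v.b[i - 1] &= 0xF0;    // lower byte keeps only its top nibble
    v.b[i] ^= v.b[i - 1];  // cancel that top nibble out of this byte
  }
  return v;
}

// Compile-time check against a plain 32-bit shift.
static_assert(from_bytes(shl4(to_bytes(0x12345678u))) == 0x23456780u,
              "nibble-swap shift should match a shift by 4 bits");
```

A logical shift right can be modelled the same way by masking with 0x0F
and walking from the least significant byte up. The sketch gives no
sign-extending variant, in line with the limitation noted above.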

Differential Revision: https://reviews.llvm.org/D140571

  Commit: fad5e0cf50f119f083dfc82e08994825cae5001f
      https://github.com/llvm/llvm-project/commit/fad5e0cf50f119f083dfc82e08994825cae5001f
  Author: Ayke van Laethem <aykevanlaethem at gmail.com>
  Date:   2023-01-08 (Sun, 08 Jan 2023)

  Changed paths:
    M llvm/lib/Target/AVR/AVRISelLowering.cpp
    M llvm/test/CodeGen/AVR/shift32.ll

  Log Message:
  -----------
  [AVR] Optimize 32-bit shifts: reverse shift + move

This optimization turns a shift by almost a multiple of 8 bits into a
shift in the opposite direction. Unfortunately it doesn't compose well
with the other optimizations (I've tried), so it's separate from them.
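
As an example of the identity behind this, a shift left by 7 bits can be
modelled as a 1-bit shift right followed by moving every byte up one
position. The C++ sketch below assumes that "almost a multiple of 8"
covers amounts such as 7, one bit short of 8; the function name is
hypothetical and the code is not taken from the patch.

```cpp
#include <cstdint>

// Shift left by 7, expressed as "shift right by 1, then move bytes up".
constexpr uint32_t shl7_via_reverse(uint32_t x) {
  // Step 1: a single-bit shift right. The bit that falls off the bottom
  // is caught in the top bit of an extra, otherwise empty byte.
  uint8_t spill = (x & 1u) ? 0x80 : 0x00;
  uint32_t shifted = x >> 1;
  // Step 2: move every byte up by one position. The old top byte is
  // dropped and the spill byte becomes the new lowest byte.
  return (shifted << 8) | spill;
}

// Compile-time checks against a plain 32-bit shift by 7.
static_assert(shl7_via_reverse(0x12345678u) == (0x12345678u << 7),
              "reverse shift + move should match a shift left by 7");
static_assert(shl7_via_reverse(0x00000001u) == 0x00000080u,
              "the spilled bit should land in bit 7");
```

One single-bit shift plus byte moves replaces seven single-bit shifts,
which is where the saving comes from in this model.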

Differential Revision: https://reviews.llvm.org/D140572

  Commit: 9592920890cf7c13d5a47e54d62284e7bd1418cf
      https://github.com/llvm/llvm-project/commit/9592920890cf7c13d5a47e54d62284e7bd1418cf
  Author: Ayke van Laethem <aykevanlaethem at gmail.com>
  Date:   2023-01-08 (Sun, 08 Jan 2023)

  Changed paths:
    M llvm/lib/Target/AVR/AVRISelLowering.cpp
    M llvm/test/CodeGen/AVR/shift32.ll

  Log Message:
  -----------
  [AVR] Optimize 32-bit shifts: optimize REG_SEQUENCE

This pseudo-instruction stores two small (8-bit) registers into one wide
(16-bit) register. But apparently the order matters a lot to the
register allocator.
This patch changes the order of inserting the registers to optimize for
the best register allocation in the tests of shift32.ll. It might be
detrimental in other cases, but keeping the registers in the same
physical register seems like it would be a common case.

Differential Revision: https://reviews.llvm.org/D140573

Compare: https://github.com/llvm/llvm-project/compare/2cc30c4ee816...9592920890cf