[PATCH] D138529: [AVR] Optimize constant 32-bit shifts

Tue Nov 22 15:50:31 PST 2022

aykevl created this revision.
aykevl added reviewers: dylanmckay, benshi001.
Herald added subscribers: Jim, hiraditya.
Herald added a project: All.
aykevl requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

32-bit shift instructions were previously expanded using the default SelectionDAG expander, which meant it used 16-bit constant shifts and ORed them together. This works, but is far from optimal.

The patch here uses a custom instruction inserter to insert optimized constant 32-bit shifts. This is done using three new pseudo-instructions that take the upper and lower bits of the value in two separate 16-bit registers and outputs two 16-bit registers.

This change results in around 31% less instructions on average for constant 32-bit shifts, and is in all cases equal or better than the old behavior. It also tends to match or outperform avr-gcc: the only cases where avr-gcc does better is when it uses a loop to shift, or when the LLVM register allocator inserts some unnecessary movs. But it even outperforms avr-gcc in some cases where avr-gcc does not use a loop.

As a side effect, non-constant 32-bit shifts also become more efficient.

For some real-world differences: the build of compiler-rt I use in TinyGo becomes 2.7% smaller and the build of picolibc I use becomes 0.9% smaller. I think picolibc is a better representation of real-world code, but even a ~1% reduction in code size is really significant.

Future patches can improve on this:

- A loop can often result in reduced code size at the expense of speed. We should use a loop when the minsize attribute is set and a loop would need fewer instructions.
- For some reason the register allocator inserts some unnecessary moves. I think this can be worked around by fiddling with REG_SEQUENCE, but I'm not sure. A better solution might be to try and split 16-bit instructions into 8-bit instructions before register allocation (something which I've tried a few times but  haven't found a good solution for yet).
- Because the main algorithm is independent of the number of registers, the same code can be used to emit moves for other shift  widths (8, 16, 64, and others if needed). But this can be done in later patches if needed.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D138529

Files:
  llvm/lib/Target/AVR/AVRISelLowering.cpp
  llvm/lib/Target/AVR/AVRISelLowering.h
  llvm/lib/Target/AVR/AVRInstrInfo.td
  llvm/test/CodeGen/AVR/shift32.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D138529.477331.patch
Type: text/x-patch
Size: 30239 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20221122/6717a4f5/attachment.bin>