[PATCH] D51780: ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.

Fri Sep 7 05:27:57 PDT 2018

t.p.northover created this revision.
Herald added subscribers: chrib, hiraditya, kristof.beyls, mcrosier.
Herald added a reviewer: javed.absar.

The Technical Reference Manuals for these two CPUs state that branching to a 32-bit instruction that is not 4-byte aligned incurs an extra pipeline reload penalty. That's bad.

I also enable the optimization at -Os for just these two CPUs. My impression has been that loop alignment is a bit of a gamble on the bigger cores, and it also wastes more space. But for these two we're getting 1 cycle per iteration in return for 1 byte per loop (on average); that seems like it definitely fits into LLVM's quirky definition of -Os.
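To make the "1 byte per loop (on average)" figure concrete, here is a rough standalone sketch of the arithmetic. It is not code from the patch: the helper names paddingToAlign and averagePadding are made up for illustration, and it assumes a loop header is equally likely to land on any 2-byte Thumb instruction boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): bytes of padding needed to
// move `addr` up to the next multiple of `align`. `align` must be a
// power of two.
constexpr std::uint32_t paddingToAlign(std::uint32_t addr,
                                       std::uint32_t align) {
  return (align - (addr & (align - 1))) & (align - 1);
}

// Hypothetical helper (not from the patch): average padding over every
// offset a loop header can start at inside one alignment window,
// assuming (for illustration only) that all offsets are equally likely.
// `granule` is the instruction alignment: 2 bytes for Thumb code.
inline double averagePadding(std::uint32_t align, std::uint32_t granule) {
  assert(align != 0 && (align & (align - 1)) == 0 && "power of two expected");
  assert(granule != 0 && align % granule == 0 && "granule must divide align");
  std::uint32_t total = 0;
  std::uint32_t count = 0;
  for (std::uint32_t off = 0; off < align; off += granule) {
    total += paddingToAlign(off, align);
    ++count;
  }
  return static_cast<double>(total) / count;
}
```

With a 4-byte target and 2-byte instruction granularity, a header sits at offset 0 (no padding) or offset 2 (2 bytes of padding), so averagePadding(4, 2) comes out to 1 byte under that equal-likelihood assumption.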

I'm open to extending it to other processors, but my research indicates Cortex-M0 is too simple to benefit (its manual claims conditional branches are always 3 cycles if taken), and Cortex-M7 has no performance documentation.

Repository:
  rL LLVM

https://reviews.llvm.org/D51780

Files:
  llvm/include/llvm/CodeGen/TargetLowering.h
  llvm/lib/CodeGen/MachineBlockPlacement.cpp
  llvm/lib/Target/ARM/ARM.td
  llvm/lib/Target/ARM/ARMISelLowering.cpp
  llvm/lib/Target/ARM/ARMISelLowering.h
  llvm/lib/Target/ARM/ARMSubtarget.cpp
  llvm/lib/Target/ARM/ARMSubtarget.h
  llvm/test/CodeGen/ARM/loop-align-cortex-m.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D51780.164392.patch
Type: text/x-patch
Size: 6336 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180907/52d0c89e/attachment.bin>