[llvm-dev] Wide load/store optimization question

Tue Jun 13 08:44:03 PDT 2017

Hi,

I'm trying to write an LLVM backend for the Epiphany architecture, and I
wonder if someone can give me some advice on how to implement a load/store
optimization. The CPU itself is 32-bit, but it supports wider 64-bit loads
and stores. So the basic idea is to make use of those by combining pairs of
narrow ones.

I've checked how it is done in AArch64 and Hexagon, and my current code is
very close to the AArch64 one (I used it as a starting point). The problem
lies in the constraints imposed by the platform.

The main constraint is that the two registers used must be consecutive, and
the lower register must be even-numbered (r0 counts as even). The frame
offsets must also be consecutive for the accesses to be merged, and the
offset of the lower register must be dword-aligned.
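
The constraints above can be written down as a small standalone check. This
is only a sketch, not code from my backend and not LLVM API: the MemOp
struct and the canPair function are hypothetical names made up for
illustration. It also assumes that the higher register lives 4 bytes above
the lower one and that fp itself is dword-aligned.

```cpp
#include <cstdint>

// Hypothetical, simplified description of one 32-bit frame access.
struct MemOp {
  unsigned Reg;    // register number, e.g. 1 for r1
  int32_t Offset;  // byte offset from fp (fp assumed dword-aligned)
  bool IsStore;    // true for str, false for ldr
};

// Returns true if A and B could be combined into one 64-bit access
// under the constraints listed above. The order of A and B is irrelevant.
bool canPair(const MemOp &A, const MemOp &B) {
  // A load can only be merged with a load, a store with a store.
  if (A.IsStore != B.IsStore)
    return false;

  // Sort the two accesses by register number.
  const MemOp &Lo = A.Reg < B.Reg ? A : B;
  const MemOp &Hi = A.Reg < B.Reg ? B : A;

  // Registers must be consecutive, and the lower one must be even.
  if (Hi.Reg != Lo.Reg + 1)
    return false;
  if (Lo.Reg % 2 != 0)
    return false;

  // Offsets must be consecutive. Assumption: the higher register is
  // stored in the word directly above the lower register.
  if (Hi.Offset != Lo.Offset + 4)
    return false;

  // The offset of the lower register must be dword-aligned (8 bytes).
  if (Lo.Offset % 8 != 0)
    return false;

  return true;
}
```

Under these assumptions, canPair({0, -8, true}, {1, -4, true}) is accepted,
while a pair whose lower register is odd, or whose lower offset is only
word-aligned, is rejected.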

Because of those constraints I'm currently running this pass at pre-emit,
after RA and frame finalization. But at that point most of the choices (RA,
frame offsets) have already been made, and they are obviously suboptimal
for merging. The most common issue looks something like this:
    str r1, [fp, -4]
    str r2, [fp, -8]
Those two stores can't be merged because the lower reg (r1) is not even. To
merge them, r1 should be changed to r0, and r2 to r1. Sometimes the same
problem happens when the frame offset is misaligned, e.g. the slot for r0
is aligned to a word but not to a dword.

Can someone please point me in the right direction? Also, at which stage
should I apply such a pass? If pre-RA, how do I set register constraints
such as regsequence, as well as frame constraints? If before frame
finalization, how do I set frame constraints? If at pre-emit, as I'm doing
now, how do I optimize and rewrite frame offsets and registers?

Thanks,
Petr