[llvm-dev] Wide load/store optimization question

Wed Jun 28 06:19:59 PDT 2017

Well, that is now a slightly different question.

Once the compiler can do 64-bit loads/stores for a 64-bit integer type
(e.g. C long long), an optimization pass should merge the loads/stores
before register allocation, so that appropriate registers can be chosen.
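Below is a rough sketch of the candidate-finding step such a pre-RA pass would need. It is written as a plain C model, not against LLVM's MachineInstr API. The `MemOp` type, `find_pairs` and word-granular offsets are all hypothetical.

The sketch avoids needing alias analysis by pairing only back-to-back operations. Both must be the same kind, use the same base register, and have offsets exactly one word apart. Each pair would then become one wide load/store on a register-pair virtual register, and the allocator would still make the actual register choice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a word-sized memory op on virtual registers:
 *   store: mem[base + offset] = value_vreg
 *   load:  value_vreg = mem[base + offset]
 * Offsets are counted in words (an assumption for this sketch). */
typedef struct {
    bool is_store;
    int base;       /* base register, e.g. the frame pointer */
    int offset;     /* word offset from base */
    int value_vreg; /* virtual register being stored or loaded */
} MemOp;

/* Greedily pairs back-to-back ops with the same kind, the same base and
 * offsets exactly one word apart. pair_with[i] receives the index of the
 * partner of op i, or -1 if op i stays unpaired. Returns the pair count. */
static size_t find_pairs(const MemOp *ops, size_t n, int *pair_with) {
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++)
        pair_with[i] = -1;
    for (size_t i = 0; i + 1 < n; i++) {
        const MemOp *a = &ops[i], *b = &ops[i + 1];
        int delta = a->offset - b->offset;
        if (a->is_store == b->is_store && a->base == b->base &&
            (delta == 1 || delta == -1)) {
            pair_with[i] = (int)(i + 1);
            pair_with[i + 1] = (int)i;
            pairs++;
            i++; /* both ops are consumed; resume after the pair */
        }
    }
    return pairs;
}
```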

On Wed, Jun 28, 2017 at 5:43 AM, Peter Bel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi,
>
> I've looked through both the AMDGPU and Sparc backends, and it seems they
> also do not do what I want to do. The only backend that does it is
> AArch64, but it doesn't have register-pairing constraints.
> So, here is an example. I have the following C code:
>
> void test(int *z)
> {
>   int a = 1; int b = 2; int c = 3; int d = 4;
>   a++; b++; c++; d++;
> }
>
> Without any frontend optimization, it compiles to the following IR:
>
> define void @test(i32* %z) #0 {
>   %1 = alloca i32*, align 4
>   %a = alloca i32, align 4
>   %b = alloca i32, align 4
>   %c = alloca i32, align 4
>   %d = alloca i32, align 4
>   store i32* %z, i32** %1, align 4
>   store i32 1, i32* %a, align 4
>   store i32 2, i32* %b, align 4
>   store i32 3, i32* %c, align 4
>   store i32 4, i32* %d, align 4
>   %2 = load i32, i32* %a, align 4
>   %3 = add nsw i32 %2, 1
>   store i32 %3, i32* %a, align 4
>   %4 = load i32, i32* %b, align 4
>   %5 = add nsw i32 %4, 1
>   store i32 %5, i32* %b, align 4
>   .....
> }
>
> This produces the following assembly:
>
>         mov     r2, #1
>         str     r2, [fp, #-2]
>         mov     r3, #2
>         mov     r2, #3
>         str     r3, [fp, #-3]
>         str     r2, [fp, #-4]
>         mov     r3, #4
>         ldr     r2, [fp, #-2]
>         str     r3, [fp, #-5]
>         .....
>
> What I want to do is to merge neighboring stores and loads. For example,
>         mov     r3, #2
>         mov     r2, #3
>         str     r3, [fp, #-5]
>         str     r2, [fp, #-4]
> can be converted to
>         mov     r3, #2
>         mov     r2, #3
>         strd    r2, [fp, #-4]
> The main problem, though, is that in the generated code above, r3 was
> stored at offset -3, not -5.
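The register and offset constraints in this example can be written down as a small predicate. This is a sketch under assumptions inferred from the snippet, not a description of any real target:

- `strd Rt, [base, #off]` is taken to write Rt to word `off` and Rt+1 to word `off - 1` (r2 at -4, r3 at -5).
- Rt is assumed to have to be even, as with ARM A32 LDRD/STRD.

The `Store` type, `REG_FP` and `can_merge_strd` are hypothetical.

```c
#include <stdbool.h>

/* Arbitrary, hypothetical register number for the frame pointer. */
enum { REG_FP = 11 };

/* A single word store: str src, [base, #offset], offsets counted in words. */
typedef struct {
    int src;    /* physical source register, e.g. 2 for r2 */
    int base;   /* base register */
    int offset; /* word offset from base */
} Store;

/* Returns true if the two stores could be replaced by
 *   strd first.src, [first.base, #first.offset]
 * under the assumed semantics: Rt goes to offset, Rt+1 to offset - 1,
 * and Rt must be an even register. */
static bool can_merge_strd(Store first, Store second) {
    if (first.base != second.base)
        return false;
    if (first.src % 2 != 0 || second.src != first.src + 1)
        return false; /* not an (even, even+1) register pair */
    return second.offset == first.offset - 1; /* Rt+1 sits one word below Rt */
}
```

Consider the generated code above, where r3 is stored at -3. The check fails there even though r2/r3 form a valid pair, so the stack-slot layout has to cooperate as well as register allocation.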
>
> Currently, I'm doing the following. In a pre-RA pass, I create a
> REG_SEQUENCE with the target register class, assign the vregs in question
> as its subregisters, and create a load/store instruction for the sequence
> with the memory operands merged.
> This solves the register constraint problem, but the frame allocation
> problem remains. I'll probably need to use fixed stack objects and
> manually pre-allocate the frame, which I really don't want to do, as it
> can break some other passes.
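On the frame-layout side, the two stack slots must end up adjacent and in the right order. In LLVM, real offsets are only fixed when frame indices are resolved after register allocation.

The sketch below makes the requirement concrete by treating each merged pair as one two-word block when offsets are assigned. That is essentially the manual pre-allocation the author would rather avoid. `StackObj`, `is_second_half`, `layout_frame` and the strd layout (Rt at `off`, Rt+1 at `off - 1`, inferred from the snippet) are all assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-word stack object. If partner >= 0, this object holds the
 * Rt half of a merged strd/ldrd and objs[partner] holds the Rt+1 half. */
typedef struct {
    int partner; /* index of the Rt+1 object, or -1 */
    int offset;  /* assigned word offset from fp */
} StackObj;

/* True if some other object names idx as its Rt+1 partner. */
static bool is_second_half(const StackObj *objs, size_t n, size_t idx) {
    for (size_t j = 0; j < n; j++)
        if (objs[j].partner == (int)idx)
            return true;
    return false;
}

/* Assigns offsets growing downward from first_offset. A pair is placed as
 * one contiguous block so the Rt+1 half lands one word below the Rt half. */
static void layout_frame(StackObj *objs, size_t n, int first_offset) {
    int next = first_offset;
    for (size_t i = 0; i < n; i++) {
        if (is_second_half(objs, n, i))
            continue; /* placed together with its Rt half */
        objs[i].offset = next--;
        if (objs[i].partner >= 0)
            objs[objs[i].partner].offset = next--;
    }
}
```

Take the four variables of `test()`, with c (held in r2) as the Rt half and b (held in r3) as its partner. Starting at -2, this yields a = -2, c = -3, b = -4, d = -5.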
>
> Petr
>
>
> On Sat, Jun 17, 2017 at 10:31 AM, 陳韋任 <chenwj.cs97g at g2.nctu.edu.tw> wrote:
>
>>> That question makes no sense.
>>> - Every virtual register has a register class assigned.
>>> - You can construct special register classes that represent register
>>> tuples so that when the allocator chooses an entry from that register class
>>> it really has choosen a tuple of machine registers (even though it looks
>>> like a single register with funny aliasing as far as llvm codegen is
>>> concerned).
>>>
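The register-tuple idea quoted above can be illustrated with a toy model. This is not LLVM's TableGen `RegisterTuples` machinery. Each entry of the tuple class aliases two physical registers, so choosing one entry must check that both underlying registers are free. The 16-register target, `tuple_units` and `alloc_tuple` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical target with 16 physical registers r0..r15 and a tuple class
 * whose entry k stands for the pair (r2k, r2k+1). */
enum { NUM_REGS = 16, NUM_TUPLES = NUM_REGS / 2 };

/* Bitmask of both physical registers covered by tuple k. */
static uint32_t tuple_units(int k) {
    return (1u << (2 * k)) | (1u << (2 * k + 1));
}

/* Picks the first tuple whose two registers are both free in used_mask
 * (bit i set means r_i is already taken). Returns the tuple index, or -1. */
static int alloc_tuple(uint32_t used_mask) {
    for (int k = 0; k < NUM_TUPLES; k++)
        if ((used_mask & tuple_units(k)) == 0)
            return k;
    return -1;
}
```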
>>
>> And we still have to lower load i64 to load v2i32, right?
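Here is the value-level part of that lowering. On a target with only 32-bit registers, a 64-bit integer is carried as two 32-bit halves. On a little-endian target the low half is the one at the lower address, so a merged 64-bit load or store moves exactly these two words.

Whether those halves are modelled as a v2i32 or as a register-tuple class is the open question here. This sketch only shows the split, and `split_i64`/`join_i64` are illustrative names.

```c
#include <stdint.h>

/* Splits a 64-bit value into the two 32-bit halves a 32-bit target would
 * hold, e.g. in the two subregisters of a register pair. */
static void split_i64(uint64_t v, uint32_t *lo, uint32_t *hi) {
    *lo = (uint32_t)v;         /* bits 0..31 */
    *hi = (uint32_t)(v >> 32); /* bits 32..63 */
}

/* Reassembles the 64-bit value from its halves. */
static uint64_t join_i64(uint32_t lo, uint32_t hi) {
    return ((uint64_t)hi << 32) | lo;
}
```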
>>
>> --
>> Wei-Ren Chen (陳韋任)
>> Homepage: https://people.cs.nctu.edu.tw/~chenwj
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>