<div dir="ltr">Well, that is now a slightly different question.<div><br></div><div>Once the compiler can do 64-bit loads/stores for a 64-bit integer type (e.g. C long long), an optimization pass should merge the loads/stores before register allocation, so that appropriate registers can be chosen.</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 28, 2017 at 5:43 AM, Peter Bel via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div><div><div>Hi,<br><br></div>I've looked through both the AMDGPU and Sparc backends, and it seems they also do not do what I want to do. The only backend that does it is AArch64, but it has no register-pair constraints.<br></div>So, here is an example. I have the following C code:<br><span style="font-family:monospace,monospace"><br>void test()<br>{<br>  int a = 1; int b = 2; int c = 3; int d = 4;<br>  a++; b++; c++; d++;<br>}<br><br></span></div><span style="font-family:arial,helvetica,sans-serif">Without any frontend optimization, it compiles to the following IR:</span><span style="font-family:monospace,monospace"><br><br>define void @test(i32* %z) #0 {<br>  %1 = alloca i32*, align 4<br>  %a = alloca i32, align 4<br>  %b = alloca i32, align 4<br>  %c = alloca i32, align 4<br>  %d = alloca i32, align 4<br>  store i32* %z, i32** %1, align 4<br>  store i32 1, i32* %a, align 4<br>  store i32 2, i32* %b, align 4<br>  store i32 3, i32* %c, align 4<br>  store i32 4, i32* %d, align 4<br>  %2 = load i32, i32* %a, align 4<br>  %3 = add nsw i32 %2, 1<br>  store i32 %3, i32* %a, align 4<br>  %4 = load i32, i32* %b, align 4<br>  %5 = add nsw i32 %4, 1<br>  store i32 %5, i32* %b, align 4<br>  .....<br>}<br><br></span></div><span 
style="font-family:monospace,monospace"><font face="arial,helvetica,sans-serif">This produces the following assembly:<br><br></font>        mov     r2, #1<br>        str     r2, [fp, #-2]<br>        mov     r3, #2<br>        mov     r2, #3<br>        str     r3, [fp, #-3]<br>        str     r2, [fp, #-4]<br>        mov     r3, #4<br>        ldr     r2, [fp, #-2]<br>        str     r3, [fp, #-5]<br>        .....<br><font face="arial,helvetica,sans-serif"><br></font></span></div>What I want to do is merge neighboring stores and loads. For example,<br><span style="font-family:monospace,monospace">        mov     r3, #2<br>        mov     r2, #3<br>        str     r3, [fp, #-5]<br>        str     r2, [fp, #-4]<br></span></div><span style="font-family:arial,helvetica,sans-serif">could be converted to</span><span style="font-family:monospace,monospace"><br></span><span style="font-family:monospace,monospace">        mov     r3, #2<br>        mov     r2, #3<br>        strd    r2, [fp, #-4]<br></span></div><span style="font-family:arial,helvetica,sans-serif">But the main problem is that the offset for r3 </span><span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif">in the snippet above </span>was -3, not -5</span><span style="font-family:monospace,monospace">.<br></span><span style="font-family:arial,helvetica,sans-serif"></span></div></div><br><div>Currently, I'm doing the following: in a pre-RA pass I create a REG_SEQUENCE with the target register class, assign the vregs in question as its subregisters, and create a load/store instruction for the sequence with the memory operands merged.<br></div><div>This solves the register constraint problem, but the frame allocation problem still exists. 
I'll probably need to use fixed stack objects and pre-allocate the frame manually, which I really don't want to do, as it can break other passes.<br><br></div><div>Petr<br></div><div><div><span style="font-family:monospace,monospace"><span style="font-family:monospace,monospace"><br></span></span></div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Jun 17, 2017 at 10:31 AM, 陳韋任 <span dir="ltr"><<a href="mailto:chenwj.cs97g@g2.nctu.edu.tw" target="_blank">chenwj.cs97g@g2.nctu.edu.tw</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">That question makes no sense.<div>- Every virtual register has a register class assigned.</div><div>- You can construct special register classes that represent register tuples, so that when the allocator chooses an entry from that register class it has really chosen a tuple of machine registers (even though it looks like a single register with funny aliasing as far as LLVM codegen is concerned).</div></div></blockquote><div><br></div></span><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">And we still have to lower load i64 to load v2i32, right?</div></div><span><div><br></div>-- <br><div class="m_5322119900705240042m_-7427286002512788644gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>Wei-Ren Chen (陳韋任)<br>Homepage: <a href="https://people.cs.nctu.edu.tw/~chenwj" target="_blank">https://people.cs.nctu.edu.tw/<wbr>~chenwj</a></div></div></div>

</span></div></div>

</blockquote></div><br></div>

</div></div><br>______________________________<wbr>_________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>

<br></blockquote></div><br></div>