<div dir="ltr"><div><div><div><div><div><div><div>Hmm... found an interesting issue:<br><br></div>Given:<br><br>    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0<br>    store i8 1, i8* %2, align 8<br>    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1<br>    store i8 2, i8* %3, align 1<br>    %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2<br>    store i8 3, i8* %4, align 2<br>    %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3<br>    store i8 4, i8* %5, align 1<br>    ret void<br><br></div>llc generates:<br><br>        movb    $1, (%rdi)<br>        movb    $2, 1(%rdi)<br>        movb    $3, 2(%rdi)<br>        movb    $4, 3(%rdi)<br>        retq<br><br></div>But given:<br><br>    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0<br>    store i8 1, i8* %2, align 1<br>    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1<br>    store i8 2, i8* %3, align 1<br>    %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2<br>    store i8 3, i8* %4, align 1<br>    %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3<br>    store i8 4, i8* %5, align 1<br>    ret void<br><br></div>We get:<br><br>        movl    $67305985, (%rdi)       # imm = 0x4030201<br>        retq<br><br></div><div>Also interesting:<br><br>define void @test(i8*) {<br>    %2 = getelementptr inbounds i8* %0, i32 0<br>    store i8 1, i8* %2, align 1<br>    %3 = getelementptr inbounds i8* %0, i32 1<br>    store i8 2, i8* %3, align 1<br>    %4 = getelementptr inbounds i8* %0, i32 2<br>    store i8 3, i8* %4, align 1<br>    %5 = getelementptr inbounds i8* %0, i32 3<br>    store i8 4, i8* %5, align 1<br>    ret void<br>}<br><br></div><div>This code also results in a single store. However, there is no guarantee that the input pointer is 32-bit aligned. x86_64 tolerates this, but the ABI mandates aligned loads and stores. 
My previous example guaranteed alignment because I started with a pointer to a structure containing a member with 8-byte alignment, so we could infer the alignment of some of the stores.<br><br></div>Checking the code in DAGCombiner.cpp confirms that all of the stores being merged must have the same alignment. Why?<br><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Dec 11, 2015 at 11:37 AM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi David,<br>

<br>

We generally handle this (early) in the backend where we have more information about target capabilities and costs. See MergeConsecutiveStores in lib/CodeGen/SelectionDAG/DAGCombiner.cpp<br>

<br>

 -Hal<br>

<br>

----- Original Message -----<br>

> From: "David Jones via llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>

> To: <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>

> Sent: Friday, December 11, 2015 10:32:50 AM<br>

> Subject: [llvm-dev] Optimization of successive constant stores<br>

><br>

> Consider the following:<br>

><br>

> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>

<span class="">> target triple = "x86_64-unknown-linux-gnu"<br>

><br>

> %UodStructType = type { i8, i8, i8, i8, i32, i8* }<br>

><br>

> define void @test(%UodStructType*) {<br>

> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0<br>

> store i8 1, i8* %2, align 8<br>

> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1<br>

> store i8 2, i8* %3, align 1<br>

> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2<br>

> store i8 3, i8* %4, align 2<br>

> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3<br>

> store i8 4, i8* %5, align 1<br>

> ret void<br>

> }<br>

><br>

> If I run this through opt -O3, it passes through unchanged.<br>

><br>

> However, I would think that it would be profitable to combine the<br>

> stores into a single instruction, e.g.:<br>

><br>

> define void @test(%UodStructType*) {<br>

> %2 = bitcast %UodStructType* %0 to i32*<br>

> store i32 0x04030201, i32* %2, align 8<br>

><br>

><br>

> ret void<br>

> }<br>

><br>

><br>

> I don't see any optimization that would do this.<br>

><br>

> Interestingly, if I store the same 8-bit constant in all four bytes,<br>

> then MemCpyOpt will indeed convert this to a 32-bit store.<br>

><br>

><br>

> Am I doing something wrong, or is there really no optimization pass<br>

> that can clean this up?<br>

><br>

><br>

</span>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</font></span></blockquote></div><br></div>