<div dir="ltr"><div><div><div><div><div><div><div>Hmm... found an interesting issue:<br><br></div>Given:<br><br> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0<br> store i8 1, i8* %2, align 8<br> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1<br> store i8 2, i8* %3, align 1<br> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2<br> store i8 3, i8* %4, align 2<br> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3<br> store i8 4, i8* %5, align 1<br> ret void<br><br></div>llc generates:<br><br> movb $1, (%rdi)<br> movb $2, 1(%rdi)<br> movb $3, 2(%rdi)<br> movb $4, 3(%rdi)<br> retq<br><br></div>But given:<br><br> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0<br> store i8 1, i8* %2, align 1<br> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1<br> store i8 2, i8* %3, align 1<br> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2<br> store i8 3, i8* %4, align 1<br> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3<br> store i8 4, i8* %5, align 1<br> ret void<br><br></div>We get:<br><br> movl $67305985, (%rdi) # imm = 0x4030201<br> retq<br><br></div><div>Also interesting:<br><br>define void @test(i8*) {<br> %2 = getelementptr inbounds i8* %0, i32 0<br> store i8 1, i8* %2, align 1<br> %3 = getelementptr inbounds i8* %0, i32 1<br> store i8 2, i8* %3, align 1<br> %4 = getelementptr inbounds i8* %0, i32 2<br> store i8 3, i8* %4, align 1<br> %5 = getelementptr inbounds i8* %0, i32 3<br> store i8 4, i8* %5, align 1<br> ret void<br>}<br><br></div><div>This code also results in a single store. However, there is no guarantee that the input pointer is 32-bit aligned. x86_64 tolerates this, but the ABI mandates aligned loads and stores. 
My previous example guaranteed alignment because I started with a pointer to a structure containing a member with 8-byte alignment, so the alignment of some of the stores could be inferred.<br><br></div>Checking the code in DAGCombiner.cpp indeed shows that all of the stores being merged must have the same alignment. Why?<br><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Dec 11, 2015 at 11:37 AM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi David,<br>
<br>
We generally handle this (early) in the backend where we have more information about target capabilities and costs. See MergeConsecutiveStores in lib/CodeGen/SelectionDAG/DAGCombiner.cpp<br>
<br>
-Hal<br>
<br>
----- Original Message -----<br>
> From: "David Jones via llvm-dev" <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>><br>
> To: <a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>
> Sent: Friday, December 11, 2015 10:32:50 AM<br>
> Subject: [llvm-dev] Optimization of successive constant stores<br>
><br>
> Consider the following:<br>
><br>
> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>
<span class="">> target triple = "x86_64-unknown-linux-gnu"<br>
><br>
> %UodStructType = type { i8, i8, i8, i8, i32, i8* }<br>
><br>
> define void @test(%UodStructType*) {<br>
> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0<br>
> store i8 1, i8* %2, align 8<br>
> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1<br>
> store i8 2, i8* %3, align 1<br>
> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2<br>
> store i8 3, i8* %4, align 2<br>
> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3<br>
> store i8 4, i8* %5, align 1<br>
> ret void<br>
> }<br>
><br>
> If I run this through opt -O3, it passes through unchanged.<br>
><br>
> However, I would think that it would be profitable to combine the<br>
> stores into a single instruction, e.g.:<br>
><br>
> define void @test(%UodStructType*) {<br>
> %2 = bitcast %UodStructType* %0 to i32*<br>
> store i32 67305985, i32* %2, align 8 ; 0x04030201<br>
><br>
><br>
> ret void<br>
> }<br>
><br>
><br>
> I don't see any optimization that would do this.<br>
><br>
> Interestingly, if I store the same 8-bit constant in all four bytes,<br>
> then MemCpyOpt will indeed convert this to a 32-bit store.<br>
><br>
><br>
> Am I doing something wrong, or is there really no optimization pass<br>
> that can clean this up?<br>
><br>
><br>
</span>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Hal Finkel<br>
Assistant Computational Scientist<br>
Leadership Computing Facility<br>
Argonne National Laboratory<br>
</font></span></blockquote></div><br></div>
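<div>As a sanity check on the immediate in the merged store: the sketch below is Python, not LLVM code, and is not taken from DAGCombiner.cpp. It packs the four byte constants in address order as a little-endian value, which is the byte order the x86_64 target in this thread uses. The helper name merged_immediate is hypothetical and exists only for this illustration.<br><br></div>
<pre>
```python
import struct

# The four constants stored by the IR, in address order (offsets 0..3).
byte_values = [1, 2, 3, 4]


def merged_immediate(values):
    """Return the integer that one little-endian wide store would need
    to write in order to produce the same bytes as the individual stores.
    (Hypothetical helper, for illustration only.)"""
    return int.from_bytes(bytes(values), "little")


imm = merged_immediate(byte_values)
print(hex(imm), imm)  # prints: 0x4030201 67305985

# Round trip: a single little-endian 32-bit store of imm writes
# exactly the bytes 01 02 03 04 at offsets 0..3.
assert struct.pack("<I", imm) == bytes(byte_values)
```
</pre>
<div>This matches the "movl $67305985, (%rdi) # imm = 0x4030201" that llc printed. On a big-endian target the same four stores would correspond to a different immediate, so the sketch only covers the little-endian case.<br></div>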