[llvm-dev] Optimization of successive constant stores

Fri Dec 11 11:07:15 PST 2015

That was a bug, but I fixed it already a while ago. Looks like you're using
an old version of llvm.

On Fri, Dec 11, 2015 at 12:13 PM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> ----- Original Message -----
> > From: "David Jones" <djones at xtreme-eda.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Friday, December 11, 2015 11:07:28 AM
> > Subject: Re: [llvm-dev] Optimization of successive constant stores
> >
> > Hmm... found an interesting issue:
> >
> > Given:
> >
> > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > store i8 1, i8* %2, align 8
> > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > store i8 3, i8* %4, align 2
> > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> >
> > llc generates:
> >
> > movb $1, (%rdi)
> > movb $2, 1(%rdi)
> > movb $3, 2(%rdi)
> > movb $4, 3(%rdi)
> > retq
> >
> > But given:
> >
> > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > store i8 1, i8* %2, align 1
> > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > store i8 3, i8* %4, align 1
> > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> >
> > We get:
> >
> > movl $67305985, (%rdi) # imm = 0x4030201
> > retq
> >
> >
> > Also interesting:
> >
> > define void @test(i8*) {
> > %2 = getelementptr inbounds i8* %0, i32 0
> > store i8 1, i8* %2, align 1
> > %3 = getelementptr inbounds i8* %0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds i8* %0, i32 2
> > store i8 3, i8* %4, align 1
> > %5 = getelementptr inbounds i8* %0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> >
> >
> > This code also results in a single store. However, there is no
> > guarantee that the input pointer is 32-bit aligned. x86_64 tolerates
> > this, but the ABI mandates aligned loads and stores. My previous
> > example guaranteed alignment, because I started with a pointer to a
> > structure containing a member having 8-byte alignment, therefore we
> > could infer the alignment of some of the stores.
> >
> > And checking the code in DAGCombiner.cpp indeed shows that all
> > combined instructions have to have the same alignment. Why?
> >
> >
>
> Seems like a bug. Overalignment should be pretty safe to ignore. Can you
> either submit a patch or file a bug report?
>
> Thanks,
> Hal
>
> >
> >
> > On Fri, Dec 11, 2015 at 11:37 AM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> >
> >
> > Hi David,
> >
> > We generally handle this (early) in the backend where we have more
> > information about target capabilities and costs. See
> > MergeConsecutiveStores in lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >
> > -Hal
> >
> > ----- Original Message -----
> > > From: "David Jones via llvm-dev" < llvm-dev at lists.llvm.org >
> > > To: llvm-dev at lists.llvm.org
> > > Sent: Friday, December 11, 2015 10:32:50 AM
> > > Subject: [llvm-dev] Optimization of successive constant stores
> > >
> > > Consider the following:
> > >
> > > t arget datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > > target triple = "x86_64-unknown-linux-gnu"
> > >
> > > %UodStructType = type { i8, i8, i8, i8, i32, i8* }
> > >
> > > define void @test(%UodStructType*) {
> > > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > > store i8 1, i8* %2, align 8
> > > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > > store i8 2, i8* %3, align 1
> > > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > > store i8 3, i8* %4, align 2
> > > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > > store i8 4, i8* %5, align 1
> > > ret void
> > > }
> > >
> > > If I run this through opt -O3, it passes through unchanged.
> > >
> > > However, I would think that it would be profitable to combine the
> > > stores into a single instruction, e.g.:
> > >
> > > define void @test(%UodStructType*) {
> > > %2 = bitcast %UodStructType* %0 to i32*
> > > store i32 0x04030201, i32* %2, align 8
> > >
> > >
> > > ret void
> > > }
> > >
> > >
> > > I don't see any optimization that would do this.
> > >
> > > Interestingly, if I store the same 8-bit constant in all four
> > > bytes,
> > > then MemCpyOpt will indeed convert this to a 32-bit store.
> > >
> > >
> > > Am I doing something wrong, or is there really no optimization pass
> > > that can clean this up?
> > >
> > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151211/da175445/attachment-0001.html>