[llvm-dev] Optimization of successive constant stores
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Fri Dec 11 09:13:06 PST 2015
----- Original Message -----
> From: "David Jones" <djones at xtreme-eda.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-dev at lists.llvm.org
> Sent: Friday, December 11, 2015 11:07:28 AM
> Subject: Re: [llvm-dev] Optimization of successive constant stores
>
> Hmm... found an interesting issue:
>
> Given:
>
> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> store i8 1, i8* %2, align 8
> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> store i8 3, i8* %4, align 2
> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> store i8 4, i8* %5, align 1
> ret void
>
> llc generates:
>
> movb $1, (%rdi)
> movb $2, 1(%rdi)
> movb $3, 2(%rdi)
> movb $4, 3(%rdi)
> retq
>
> But given:
>
> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> store i8 1, i8* %2, align 1
> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> store i8 3, i8* %4, align 1
> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> store i8 4, i8* %5, align 1
> ret void
>
> We get:
>
> movl $67305985, (%rdi) # imm = 0x4030201
> retq
>
>
> Also interesting:
>
> define void @test(i8*) {
> %2 = getelementptr inbounds i8* %0, i32 0
> store i8 1, i8* %2, align 1
> %3 = getelementptr inbounds i8* %0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds i8* %0, i32 2
> store i8 3, i8* %4, align 1
> %5 = getelementptr inbounds i8* %0, i32 3
> store i8 4, i8* %5, align 1
> ret void
>
>
> This code also results in a single store. However, there is no
> guarantee that the input pointer is 32-bit aligned. x86_64 tolerates
> this, but the ABI mandates aligned loads and stores. My previous
> example guaranteed alignment, because I started with a pointer to a
> structure containing a member having 8-byte alignment, therefore we
> could infer the alignment of some of the stores.
>
> And checking the code in DAGCombiner.cpp indeed shows that all
> combined instructions have to have the same alignment. Why?
>
>
Seems like a bug. Overalignment should be pretty safe to ignore. Can you either submit a patch or file a bug report?
Thanks,
Hal
>
>
> On Fri, Dec 11, 2015 at 11:37 AM, Hal Finkel < hfinkel at anl.gov >
> wrote:
>
>
> Hi David,
>
> We generally handle this (early) in the backend where we have more
> information about target capabilities and costs. See
> MergeConsecutiveStores in lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>
> -Hal
>
> ----- Original Message -----
> > From: "David Jones via llvm-dev" < llvm-dev at lists.llvm.org >
> > To: llvm-dev at lists.llvm.org
> > Sent: Friday, December 11, 2015 10:32:50 AM
> > Subject: [llvm-dev] Optimization of successive constant stores
> >
> > Consider the following:
> >
> > t arget datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > target triple = "x86_64-unknown-linux-gnu"
> >
> > %UodStructType = type { i8, i8, i8, i8, i32, i8* }
> >
> > define void @test(%UodStructType*) {
> > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > store i8 1, i8* %2, align 8
> > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > store i8 3, i8* %4, align 2
> > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> > }
> >
> > If I run this through opt -O3, it passes through unchanged.
> >
> > However, I would think that it would be profitable to combine the
> > stores into a single instruction, e.g.:
> >
> > define void @test(%UodStructType*) {
> > %2 = bitcast %UodStructType* %0 to i32*
> > store i32 0x04030201, i32* %2, align 8
> >
> >
> > ret void
> > }
> >
> >
> > I don't see any optimization that would do this.
> >
> > Interestingly, if I store the same 8-bit constant in all four
> > bytes,
> > then MemCpyOpt will indeed convert this to a 32-bit store.
> >
> >
> > Am I doing something wrong, or is there really no optimization pass
> > that can clean this up?
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-dev
mailing list