[llvm-dev] Optimization of successive constant stores

Fri Dec 11 09:13:06 PST 2015

----- Original Message -----
> From: "David Jones" <djones at xtreme-eda.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-dev at lists.llvm.org
> Sent: Friday, December 11, 2015 11:07:28 AM
> Subject: Re: [llvm-dev] Optimization of successive constant stores
> 
> Hmm... found an interesting issue:
> 
> Given:
> 
> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> store i8 1, i8* %2, align 8
> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> store i8 3, i8* %4, align 2
> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> store i8 4, i8* %5, align 1
> ret void
> 
> llc generates:
> 
> movb $1, (%rdi)
> movb $2, 1(%rdi)
> movb $3, 2(%rdi)
> movb $4, 3(%rdi)
> retq
> 
> But given:
> 
> %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> store i8 1, i8* %2, align 1
> %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> store i8 3, i8* %4, align 1
> %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> store i8 4, i8* %5, align 1
> ret void
> 
> We get:
> 
> movl $67305985, (%rdi) # imm = 0x4030201
> retq
> 
> 
> Also interesting:
> 
> define void @test(i8*) {
> %2 = getelementptr inbounds i8* %0, i32 0
> store i8 1, i8* %2, align 1
> %3 = getelementptr inbounds i8* %0, i32 1
> store i8 2, i8* %3, align 1
> %4 = getelementptr inbounds i8* %0, i32 2
> store i8 3, i8* %4, align 1
> %5 = getelementptr inbounds i8* %0, i32 3
> store i8 4, i8* %5, align 1
> ret void
> 
> 
> This code also results in a single store. However, there is no
> guarantee that the input pointer is 32-bit aligned. x86_64 tolerates
> this, but the ABI mandates aligned loads and stores. My previous
> example guaranteed alignment, because I started with a pointer to a
> structure containing a member having 8-byte alignment, therefore we
> could infer the alignment of some of the stores.
> 
> And checking the code in DAGCombiner.cpp indeed shows that all
> combined instructions have to have the same alignment. Why?
> 
> 

Seems like a bug. Overalignment should be pretty safe to ignore. Can you either submit a patch or file a bug report?

Thanks,
Hal

> 
> 
> On Fri, Dec 11, 2015 at 11:37 AM, Hal Finkel < hfinkel at anl.gov >
> wrote:
> 
> 
> Hi David,
> 
> We generally handle this (early) in the backend where we have more
> information about target capabilities and costs. See
> MergeConsecutiveStores in lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> 
> -Hal
> 
> ----- Original Message -----
> > From: "David Jones via llvm-dev" < llvm-dev at lists.llvm.org >
> > To: llvm-dev at lists.llvm.org
> > Sent: Friday, December 11, 2015 10:32:50 AM
> > Subject: [llvm-dev] Optimization of successive constant stores
> > 
> > Consider the following:
> > 
> > t arget datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> > target triple = "x86_64-unknown-linux-gnu"
> > 
> > %UodStructType = type { i8, i8, i8, i8, i32, i8* }
> > 
> > define void @test(%UodStructType*) {
> > %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
> > store i8 1, i8* %2, align 8
> > %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
> > store i8 2, i8* %3, align 1
> > %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
> > store i8 3, i8* %4, align 2
> > %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
> > store i8 4, i8* %5, align 1
> > ret void
> > }
> > 
> > If I run this through opt -O3, it passes through unchanged.
> > 
> > However, I would think that it would be profitable to combine the
> > stores into a single instruction, e.g.:
> > 
> > define void @test(%UodStructType*) {
> > %2 = bitcast %UodStructType* %0 to i32*
> > store i32 0x04030201, i32* %2, align 8
> > 
> > 
> > ret void
> > }
> > 
> > 
> > I don't see any optimization that would do this.
> > 
> > Interestingly, if I store the same 8-bit constant in all four
> > bytes,
> > then MemCpyOpt will indeed convert this to a 32-bit store.
> > 
> > 
> > Am I doing something wrong, or is there really no optimization pass
> > that can clean this up?
> > 
> > 
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory