[llvm-dev] Optimization of successive constant stores

Fri Dec 11 08:32:50 PST 2015

Consider the following:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%UodStructType = type { i8, i8, i8, i8, i32, i8* }

define void @test(%UodStructType*) {
    %2 = getelementptr inbounds %UodStructType* %0, i32 0, i32 0
    store i8 1, i8* %2, align 8
    %3 = getelementptr inbounds %UodStructType* %0, i32 0, i32 1
    store i8 2, i8* %3, align 1
    %4 = getelementptr inbounds %UodStructType* %0, i32 0, i32 2
    store i8 3, i8* %4, align 2
    %5 = getelementptr inbounds %UodStructType* %0, i32 0, i32 3
    store i8 4, i8* %5, align 1
    ret void
}

If I run this through opt -O3, it passes through unchanged.

However, I would expect it to be profitable to combine the four byte stores
into a single 32-bit store (the target is little-endian), e.g.:

define void @test(%UodStructType*) {
    %2 = bitcast %UodStructType* %0 to i32*
    store i32 67305985, i32* %2, align 8   ; 67305985 == 0x04030201
    ret void
}

I don't see any optimization that would do this.
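
As a sanity check on the combined constant, here is a minimal C sketch of the
equivalence, not anything LLVM itself does. The names struct Uod, pack_le32,
store_bytes, store_word and stores_match are hypothetical, made up for this
illustration, and store_word assumes a little-endian host such as x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C view of %UodStructType = { i8, i8, i8, i8, i32, i8* }. */
struct Uod {
    uint8_t a, b, c, d;
    uint32_t e;
    void *p;
};

/* The four separate byte stores from the original @test. */
void store_bytes(struct Uod *s) {
    s->a = 1;
    s->b = 2;
    s->c = 3;
    s->d = 4;
}

/* Build the 32-bit value whose little-endian byte image is b0 b1 b2 b3. */
uint32_t pack_le32(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
    return (uint32_t)b0 | ((uint32_t)b1 << 8) |
           ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}

/* One 32-bit store covering the same four bytes.
   Assumption: little-endian host; memcpy avoids aliasing problems. */
void store_word(struct Uod *s) {
    uint32_t v = pack_le32(1, 2, 3, 4);
    assert(offsetof(struct Uod, d) == 3); /* byte fields are contiguous */
    memcpy(s, &v, sizeof v);
}

/* Returns 1 if both forms leave the same first four bytes in memory. */
int stores_match(void) {
    struct Uod x, y;
    memset(&x, 0, sizeof x);
    memset(&y, 0, sizeof y);
    store_bytes(&x);
    store_word(&y);
    return memcmp(&x, &y, 4) == 0;
}
```

pack_le32(1, 2, 3, 4) gives 0x04030201, which is the constant in the combined
store above. On a big-endian target the byte order, and so the constant, would
be reversed.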

Interestingly, if I store the same 8-bit constant in all four bytes, then
MemCpyOpt will indeed convert this to a 32-bit store.

Am I doing something wrong, or is there really no optimization pass that
can clean this up?