[LLVMbugs] [Bug 20934] New: Redundant memory access in SIMD initialization of small struct

Sat Sep 13 04:27:34 PDT 2014

http://llvm.org/bugs/show_bug.cgi?id=20934

            Bug ID: 20934
           Summary: Redundant memory access in SIMD initialization of
                    small struct
           Product: clang
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: LLVM Codegen
          Assignee: unassignedclangbugs at nondot.org
          Reporter: bisqwit at iki.fi
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Consider this code, on x86_64.

    extern void test(const void*);

    void func1(void)
    {
        struct { unsigned long Cache; unsigned a,b; } Bits = {0,0,0};
        test(&Bits);
    }

    void func2(void)
    {
        struct { unsigned long Cache; unsigned Count; } Bits = {0,0};
        test(&Bits);
    }

For func1, clang generates this code for the initialization of Bits:

        xorps   %xmm0, %xmm0
        movaps  %xmm0, (%rsp)

That is good. For func2, however, clang generates this instead:

        movups  .Lfunc2.Bits(%rip), %xmm0
        movaps  %xmm0, (%rsp)

With this auxiliary data:

        .align  8
.Lfunc2.Bits:
        .quad   0                       # 0x0
        .long   0                       # 0x0
        .zero   4
        .size   .Lfunc2.Bits, 16

There is no difference between the in-memory representation of the structures
from the two functions. Both are 16 bytes long and every named member is
zero-initialized; the struct in func2 merely ends in 4 bytes of padding.

func2 could clearly be implemented in exactly the same way as func1. This is a
missed optimization opportunity.
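
The layout claim can be checked with a few lines of C. This is only a sketch:
the names Bits1, Bits2, same_size and tail_padding_bits2 are made up for the
illustration (the structs in the test case are anonymous), and the expected
values assume an LP64 x86_64 target where unsigned long is 8 bytes and
unsigned is 4.

```c
#include <stddef.h>

/* Hypothetical named copies of the two anonymous structs from the test case. */
struct Bits1 { unsigned long Cache; unsigned a, b; };
struct Bits2 { unsigned long Cache; unsigned Count; };

/* Returns 1 if both structs occupy the same number of bytes. */
static int same_size(void)
{
    return sizeof(struct Bits1) == sizeof(struct Bits2);
}

/* Number of padding bytes that follow the last member of Bits2. */
static size_t tail_padding_bits2(void)
{
    return sizeof(struct Bits2)
         - (offsetof(struct Bits2, Count) + sizeof(unsigned));
}
```

On the assumed target, same_size() should return 1 and tail_padding_bits2()
should return 4, which matches the ".zero 4" in the auxiliary data above.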

Tested on:

  Debian clang version 3.5.0-+rc1-2 (tags/RELEASE_35/rc1) (based on LLVM 3.5.0)
  Target: x86_64-pc-linux-gnu
  Thread model: posix

With compiler options: -O1 -S and -Ofast -S (both with -m64)

In LLVM IR, the difference between these two functions is:

  %struct.anon = type { i64, i32, i32 }
    ...
  %Bits = alloca %struct.anon, align 8
  %1 = bitcast %struct.anon* %Bits to i8*
  call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 16, i32 8, i1 false)

vs.

  %struct.anon.0 = type { i64, i32 }
  @func2.Bits = private unnamed_addr constant { i64, i32, [4 x i8] } { i64 0, i32 0, [4 x i8] undef }, align 8
    ...
  %Bits = alloca %struct.anon.0, align 8
  %1 = bitcast %struct.anon.0* %Bits to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* bitcast ({ i64, i32, [4 x i8] }* @func2.Bits to i8*), i64 16, i32 8, i1 false)
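
Since the func1 path goes through llvm.memset, one possible source-level
workaround is to zero the struct with an explicit memset instead of an
initializer list. The following is a sketch only: Bits2, zero_bits and
all_bytes_zero are hypothetical names introduced for the illustration, and I
have not verified the resulting assembly with the clang build listed above.

```c
#include <string.h>

/* Hypothetical named copy of the struct used in func2. */
struct Bits2 { unsigned long Cache; unsigned Count; };

/* Zero every byte of the struct, tail padding included, via memset
   rather than an initializer list. */
static void zero_bits(struct Bits2 *b)
{
    memset(b, 0, sizeof *b);
}

/* Helper for checking the result: returns 1 if every byte is zero. */
static int all_bytes_zero(const struct Bits2 *b)
{
    const unsigned char *p = (const unsigned char *)b;
    for (size_t i = 0; i < sizeof *b; ++i)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

In func2 this would replace "= {0,0}" with a call such as zero_bits(&Bits)
before test(&Bits). It only sidesteps the problem; the initializer-list form
should still be optimized.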

For the record, GCC 4.9.1 doesn't seem to use SSE code for the struct
initialization in either case. Go clang!
