<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Redundant memory access in SIMD initialization of small struct"

   href="http://llvm.org/bugs/show_bug.cgi?id=20934">20934</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Redundant memory access in SIMD initialization of small struct

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>clang

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>unspecified

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>LLVM Codegen

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedclangbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>bisqwit@iki.fi

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Consider this code, on x86_64.

    extern void test(const void*);

    void func1(void)

    {

        struct { unsigned long Cache; unsigned a,b; } Bits = {0,0,0};

        test(&Bits);

    }

    void func2(void)

    {

        struct { unsigned long Cache; unsigned Count; } Bits = {0,0};

        test(&Bits);

    }

For func1, clang generates this code for the initialization of Bits:

        xorps   %xmm0, %xmm0

        movaps  %xmm0, (%rsp)

This is good. For func2, however, clang generates this instead:

        movups  .Lfunc2.Bits(%rip), %xmm0

        movaps  %xmm0, (%rsp)

With this auxiliary data:

        .align  8

.Lfunc2.Bits:

        .quad   0                       # 0x0

        .long   0                       # 0x0

        .zero   4

        .size   .Lfunc2.Bits, 16

There is no meaningful difference between the in-memory layout of the two

structures. Both are 16 bytes long and every named member is zero-initialized;

the only distinction is that the last 4 bytes of func2's struct are padding.
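
As a rough illustration of the layout claim, the sketch below measures both
structs. The type and function names (three_fields, two_fields, size_three,
size_two, tail_padding_two) are made up for this example, and the numbers
quoted afterwards assume an LP64 target such as x86_64 Linux.

```c
/* Named copies of the two anonymous structs from func1 and func2. */
struct three_fields { unsigned long Cache; unsigned a, b; };
struct two_fields   { unsigned long Cache; unsigned Count; };

/* Total size of func1's struct in bytes. */
unsigned long size_three(void) { return sizeof(struct three_fields); }

/* Total size of func2's struct in bytes, including tail padding. */
unsigned long size_two(void) { return sizeof(struct two_fields); }

/* Bytes between the end of the last named member and the end of the struct. */
unsigned long tail_padding_two(void)
{
    struct two_fields s;
    /* Only addresses are computed here; no member of s is read. */
    unsigned long used =
        (unsigned long)((const char *)(&s.Count + 1) - (const char *)&s);
    return sizeof s - used;
}
```

On such a target both sizes should come out as 16 and the tail padding as 4,
which matches the ".zero 4" in the auxiliary data and the "[4 x i8] undef"
in the LLVM constant.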

Since the contents of the padding do not matter, func2 could be initialized

exactly the same way as func1. This is a missed optimization opportunity.
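
Until this is fixed, one possible source-level workaround is to name the
padding so that the struct has the same shape as func1's. This is only a
sketch: padded_bits, Unused, func2_padded and the local stand-in for test()
are hypothetical names, and it has not been verified here that clang emits
the xorps/movaps pair for it. That is expected only because the struct then
matches func1's.

```c
/* func2's struct with the 4 trailing padding bytes turned into a member. */
struct padded_bits { unsigned long Cache; unsigned Count; unsigned Unused; };

/* Local stand-in for the report's external test(): it records whether
   every member of the struct it was handed is zero. */
static int last_all_zero = -1;

static void test(const void *p)
{
    const struct padded_bits *b = p;
    last_all_zero = (b->Cache == 0 && b->Count == 0 && b->Unused == 0);
}

/* Same body as func2, but {0,0,0} now covers all 16 bytes explicitly. */
int func2_padded(void)
{
    struct padded_bits Bits = {0, 0, 0};
    test(&Bits);
    return last_all_zero;
}
```

The cost of this approach is a dummy member in the struct definition, so it
only makes sense where the extra name is harmless.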

Tested on:

  Debian clang version 3.5.0-+rc1-2 (tags/RELEASE_35/rc1) (based on LLVM 3.5.0)

  Target: x86_64-pc-linux-gnu

  Thread model: posix

Compiler options: -O1 -S and -Ofast -S (both with -m64)

In the LLVM IR, the difference between these two functions is:

  %struct.anon = type { i64, i32, i32 }

    ...

  %Bits = alloca %struct.anon, align 8

  %1 = bitcast %struct.anon* %Bits to i8*

  call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 16, i32 8, i1 false)

vs.

  %struct.anon.0 = type { i64, i32 }

  @func2.Bits = private unnamed_addr constant { i64, i32, [4 x i8] } { i64 0, i32 0, [4 x i8] undef }, align 8

    ...

  %Bits = alloca %struct.anon.0, align 8

  %1 = bitcast %struct.anon.0* %Bits to i8*

  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %1, i8* bitcast ({ i64, i32, [4 x i8] }* @func2.Bits to i8*), i64 16, i32 8, i1 false)

For the record, GCC 4.9.1 doesn't seem to use SSE code for the struct

initialization in either case. Go clang!</pre>

        </div>

      </p>


    </body>

</html>