<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - [X86] X86AvoidStoreForwardingBlocks creates incomplete copies"

   href="https://bugs.llvm.org/show_bug.cgi?id=39926">39926</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[X86] X86AvoidStoreForwardingBlocks creates incomplete copies

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>nikita.ppv@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Originally reported and diagnosed in

<a href="https://github.com/rust-lang/rust/issues/56618">https://github.com/rust-lang/rust/issues/56618</a>.

The X86AvoidStoreForwardingBlocks pass seems to drop copies for some byte

ranges under some circumstances. The following IR...

define i8 @test_offset(i8* %base) #0 {

entry:

  %z = alloca [128 x i8], align 16

  %gep0 = getelementptr inbounds i8, i8* %base, i64 7

  store volatile i8 0, i8* %gep0

  %gep1 = getelementptr inbounds i8, i8* %base, i64 5

  %bc1 = bitcast i8* %gep1 to i16*

  store volatile i16 0, i16* %bc1

  %gep2 = getelementptr inbounds i8, i8* %base, i64 1

  %bc2 = bitcast i8* %gep2 to i32*

  store volatile i32 0, i32* %bc2

  %y1 = getelementptr inbounds i8, i8* %base, i64 -4

  %y2 = bitcast [128 x i8]* %z to i8*

  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %y2, i8* %y1, i64 16, i1 false)

  %gep4 = getelementptr inbounds [128 x i8], [128 x i8]* %z, i64 0, i64 4

  %ret = load i8, i8* %gep4

  ret i8 %ret

}

; Function Attrs: argmemonly nounwind

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1) #1

attributes #0 = { "target-cpu"="core-avx2" }

attributes #1 = { argmemonly nounwind }

...when run through llc yields...

        pushq   %rax

        .cfi_def_cfa_offset 16

        movb    $0, 7(%rdi)

        movw    $0, 5(%rdi)

        movl    $0, 1(%rdi)

        movzwl  -4(%rdi), %eax

        movw    %ax, -128(%rsp)

        movb    -2(%rdi), %al    # Copies -2..-1

        movb    %al, -126(%rsp)

        movl    1(%rdi), %eax    # Copies 1..5

        movl    %eax, -123(%rsp)

        movzwl  5(%rdi), %eax

        movw    %ax, -119(%rsp)

        movb    7(%rdi), %al

        movb    %al, -117(%rsp)

        movl    8(%rdi), %eax

        movl    %eax, -116(%rsp)

        movb    -124(%rsp), %al

        popq    %rcx

        .cfi_def_cfa_offset 8

        retq

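Notably, a copy of the range -1..1 (source offsets -1 and 0) is missing, so the

final load from -124(%rsp), which corresponds to source offset 0, reads a byte

that was never copied.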
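
The gap can be cross-checked mechanically. The Python sketch below is an
illustration only and not part of the original report: the (displacement, width)
pairs are transcribed by hand from the loads in the llc output above, and the
helper name missing_bytes is made up for this example. It assumes the memcpy
should cover the 16 source bytes starting at displacement -4 from %rdi.

```python
# (source displacement relative to %rdi, width in bytes) for each load in the
# llc output, transcribed by hand: movzwl, movb, movl, movzwl, movb, movl.
copies = [(-4, 2), (-2, 1), (1, 4), (5, 2), (7, 1), (8, 4)]


def missing_bytes(copies, start, length):
    """Return the source offsets in [start, start + length) that no copy covers."""
    covered = set()
    for disp, width in copies:
        covered.update(range(disp, disp + width))
    return [off for off in range(start, start + length) if off not in covered]


# The memcpy reads 16 bytes starting at %base - 4.
gap = missing_bytes(copies, -4, 16)
print(gap)  # [-1, 0]
```

Source offset 0 maps to byte 4 of %z, which is the byte the function returns.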

MIR for the transform:

<a href="https://gist.github.com/nikic/61b5ac3390755b17dd542cd8131b926a">https://gist.github.com/nikic/61b5ac3390755b17dd542cd8131b926a</a>

One possibly relevant aspect visible in the MIR is that the displacements of the

MOVs and the MMO offsets are out of sync:

  %4:gr8 = MOV8rm %0:gr64, 1, $noreg, -2, $noreg :: (load 1 from %ir.y1 + 2)

  MOV8mr %stack.0.z, 1, $noreg, 2, $noreg, killed %4:gr8 :: (store 1 into %ir.y2 + 2, align 16)

  %5:gr32 = MOV32rm %0:gr64, 1, $noreg, 1, $noreg :: (load 4 from %ir.y1 + 3, align 1)

  MOV32mr %stack.0.z, 1, $noreg, 5, $noreg, killed %5:gr32 :: (store 4 into %ir.y2 + 3, align 16)

The MOV displacement goes from -2 to 1 (an increment of 3), while the MMO offset

goes from 2 to 3 (an increment of 1). Both should increase by the same amount.</pre>

        </div>

      </p>


    </body>

</html>