<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Multiple calls to llvm.prefetch.p0i8 on aarch64 can cause apparently-unnecessary register spills"

   href="https://bugs.llvm.org/show_bug.cgi?id=51172">51172</a>

          </td>

        </tr>


        <tr>

          <th>Summary</th>

          <td>Multiple calls to llvm.prefetch.p0i8 on aarch64 can cause apparently-unnecessary register spills

          </td>

        </tr>


        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>


        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>


        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>


        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>


        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>


        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>


        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>


        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>


        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>


        <tr>

          <th>Reporter</th>

          <td>srj@google.com

          </td>

        </tr>


        <tr>

          <th>CC</th>

          <td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Steps to reproduce:


(1) See enclosed file prefetch.ll -- notice that there are several calls to

`@llvm.prefetch.p0i8`, with all but the first commented out


(2) Compile to aarch64 assembly with `llc -march=aarch64 ~/prefetch.ll -o - -O3 -mattr=+dotprod > ~/prefetch.s`


(3) Examine output, search for `prfm`, see block similar to


```

        add     x27, x26, x16

        prfm    pldl1strm, [x27]

        ldr     q28, [x27]

        ldp     q8, q29, [x12, #-32]

        ldr     q31, [x28, x16]

        ldr     q10, [x29, x16]

        ldr     q11, [x30, x16]

        ldr     q12, [x8, x16]

        ldp     q9, q30, [x18, #-32]

        udot    v26.4s, v8.16b, v28.4b[0]

        udot    v24.4s, v8.16b, v31.4b[0]

        udot    v22.4s, v8.16b, v10.4b[0]

        udot    v20.4s, v8.16b, v11.4b[0]

        udot    v18.4s, v8.16b, v12.4b[0]


```


(4) Edit prefetch.ll to uncomment the call to `llvm.prefetch.p0i8` on line 459,

then re-run llc with the same flags


(5) Search for `prfm` again, and note that the start of the block now contains

numerous vector-register spills that appear to be completely unnecessary:


```

        add     x27, x26, x16

        prfm    pldl1strm, [x27]

        ldp     q31, q30, [x18]

        ldr     q0, [x12]

        ldr     q1, [x27]

        str     q30, [sp, #496]                 // 16-byte Folded Spill

        ldr     q30, [x21]

        str     q0, [sp, #400]                  // 16-byte Folded Spill

        ldr     q0, [x12, #16]

        add     x27, x28, x16

        stp     q31, q30, [sp, #416]            // 32-byte Folded Spill

        ldr     q30, [x21, #16]

        ldp     q11, q29, [x12, #-32]

        str     q0, [sp, #512]                  // 16-byte Folded Spill

        ldp     q10, q0, [x18, #-32]

        str     q30, [sp, #480]                 // 16-byte Folded Spill

        ldp     q30, q31, [x14, #-32]

        ldp     q8, q15, [x21, #-32]

        udot    v17.4s, v10.16b, v1.4b[0]

        udot    v17.4s, v0.16b, v1.4b[1]

        str     q31, [sp, #384]                 // 16-byte Folded Spill

        ldr     q31, [x14]

        udot    v16.4s, v30.16b, v1.4b[0]

        udot    v26.4s, v11.16b, v1.4b[0]

        udot    v26.4s, v29.16b, v1.4b[1]

        str     q31, [sp, #448]                 // 16-byte Folded Spill

        ldr     q31, [x14, #16]

        udot    v27.4s, v8.16b, v1.4b[0]

        udot    v27.4s, v15.16b, v1.4b[1]

        subs    x20, x20, #1                    // =1

        str     q31, [sp, #464]                 // 16-byte Folded Spill

        prfm    pldl1strm, [x27]

        ldr     q9, [x27]

        ldr     q31, [x29, x16]

        ldr     q12, [x30, x16]

        ldr     q13, [x8, x16]

        udot    v2.4s, v10.16b, v9.4b[0]

        udot    v3.4s, v10.16b, v31.4b[0]

        udot    v14.4s, v10.16b, v12.4b[0]

        udot    v4.4s, v10.16b, v13.4b[0]

        udot    v2.4s, v0.16b, v9.4b[1]

        udot    v3.4s, v0.16b, v31.4b[1]

        udot    v14.4s, v0.16b, v12.4b[1]

        udot    v4.4s, v0.16b, v13.4b[1]

```
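
To compare the two llc outputs without reading them by eye, a small script
can count the spill annotations. This is only a sketch: the helper name
`count_spills` is hypothetical, and it assumes llc marks every spill store
with a comment of the form `N-byte Folded Spill` (as in the listing above)
or `N-byte Spill`.

```python
import re

# Matches the comment llc attaches to spill stores in the listing above,
# e.g. "// 16-byte Folded Spill". The optional "Folded " is an assumption
# so that plain "N-byte Spill" comments are counted as well.
SPILL_RE = re.compile(r"//\s*(\d+)-byte (?:Folded )?Spill")


def count_spills(asm_text):
    """Return (number of spill stores, total bytes spilled) for one listing."""
    sizes = [int(m.group(1)) for m in SPILL_RE.finditer(asm_text)]
    return len(sizes), sum(sizes)
```

Running `count_spills(open(path).read())` on the output from step (2) and on
the output from step (4) gives a quick before/after comparison.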


These spills don't seem to make any sense: the only instruction that should

have been added here was the second `prfm` instruction, and it doesn't depend

on any of the vector registers being spilled and reloaded. Is something about

`prefetch` affecting this (e.g., confusing the lifetime analysis for registers

loaded from the prefetch location)?</pre>

        </div>

      </p>



    </body>

</html>