<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Multiple calls to llvm.prefetch.p0i8 on aarch64 can cause apparently-unnecessary register spills"
href="https://bugs.llvm.org/show_bug.cgi?id=51172">51172</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Multiple calls to llvm.prefetch.p0i8 on aarch64 can cause apparently-unnecessary register spills
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>srj@google.com
</td>
</tr>
<tr>
<th>CC</th>
<td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Steps to reproduce:
(1) See the enclosed file prefetch.ll. Note that it contains several calls to
`@llvm.prefetch.p0i8`, with all but the first commented out.
(2) Compile to aarch64 assembly with
`llc -march=aarch64 ~/prefetch.ll -o - -O3 -mattr=+dotprod > ~/prefetch.s`
(3) Examine the output and search for `prfm`; there should be a block similar to
```
add x27, x26, x16
prfm pldl1strm, [x27]
ldr q28, [x27]
ldp q8, q29, [x12, #-32]
ldr q31, [x28, x16]
ldr q10, [x29, x16]
ldr q11, [x30, x16]
ldr q12, [x8, x16]
ldp q9, q30, [x18, #-32]
udot v26.4s, v8.16b, v28.4b[0]
udot v24.4s, v8.16b, v31.4b[0]
udot v22.4s, v8.16b, v10.4b[0]
udot v20.4s, v8.16b, v11.4b[0]
udot v18.4s, v8.16b, v12.4b[0]
```
(4) Edit prefetch.ll to uncomment the call to `@llvm.prefetch.p0i8` on line 459,
then re-run llc with the same flags.
(5) Search for `prfm` again, and note that the start of the block now contains
numerous vector-register spills that appear to be completely unnecessary:
```
add x27, x26, x16
prfm pldl1strm, [x27]
ldp q31, q30, [x18]
ldr q0, [x12]
ldr q1, [x27]
str q30, [sp, #496] // 16-byte Folded Spill
ldr q30, [x21]
str q0, [sp, #400] // 16-byte Folded Spill
ldr q0, [x12, #16]
add x27, x28, x16
stp q31, q30, [sp, #416] // 32-byte Folded Spill
ldr q30, [x21, #16]
ldp q11, q29, [x12, #-32]
str q0, [sp, #512] // 16-byte Folded Spill
ldp q10, q0, [x18, #-32]
str q30, [sp, #480] // 16-byte Folded Spill
ldp q30, q31, [x14, #-32]
ldp q8, q15, [x21, #-32]
udot v17.4s, v10.16b, v1.4b[0]
udot v17.4s, v0.16b, v1.4b[1]
str q31, [sp, #384] // 16-byte Folded Spill
ldr q31, [x14]
udot v16.4s, v30.16b, v1.4b[0]
udot v26.4s, v11.16b, v1.4b[0]
udot v26.4s, v29.16b, v1.4b[1]
str q31, [sp, #448] // 16-byte Folded Spill
ldr q31, [x14, #16]
udot v27.4s, v8.16b, v1.4b[0]
udot v27.4s, v15.16b, v1.4b[1]
subs x20, x20, #1 // =1
str q31, [sp, #464] // 16-byte Folded Spill
prfm pldl1strm, [x27]
ldr q9, [x27]
ldr q31, [x29, x16]
ldr q12, [x30, x16]
ldr q13, [x8, x16]
udot v2.4s, v10.16b, v9.4b[0]
udot v3.4s, v10.16b, v31.4b[0]
udot v14.4s, v10.16b, v12.4b[0]
udot v4.4s, v10.16b, v13.4b[0]
udot v2.4s, v0.16b, v9.4b[1]
udot v3.4s, v0.16b, v31.4b[1]
udot v14.4s, v0.16b, v12.4b[1]
udot v4.4s, v0.16b, v13.4b[1]
```
These spills don't seem to make any sense: the only instruction that should
have been added here was the second `prfm` instruction, and it doesn't depend
on any of the vector registers being spilled and reloaded. Is something about
`prefetch` affecting this (e.g., confusing the lifetime analysis for registers
loaded from the prefetch location)?</pre>
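<pre>To put a number on the difference between the two outputs, a small helper
along these lines could count the spill annotations and `prfm` instructions in
each .s file. This is only a sketch: the function name
`count_spills_and_prefetches` is made up for illustration, and it assumes llc
keeps emitting the `// N-byte Folded Spill` comments seen in the listing above.

```python
import re

# Matches the spill annotation that llc prints, e.g. "// 16-byte Folded Spill".
# Assumption: the comment format stays as shown in the listing above.
SPILL = re.compile(r"//\s*\d+-byte Folded Spill")
# Matches a line whose mnemonic is prfm (leading whitespace allowed).
PRFM = re.compile(r"^\s*prfm\b")


def count_spills_and_prefetches(asm_text):
    """Return (spills, prefetches) for AArch64 assembly text (hypothetical helper)."""
    spills = 0
    prefetches = 0
    for line in asm_text.splitlines():
        if SPILL.search(line):
            spills += 1
        if PRFM.match(line):
            prefetches += 1
    return spills, prefetches
```

Calling it on the contents of ~/prefetch.s before and after step (4), for
example via `count_spills_and_prefetches(open(path).read())`, gives two pairs
of counts to compare. Reloads are deliberately not counted here.</pre>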
</div>
</p>
</body>
</html>