<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Auto-vectorization is too aggressive/naive about scatter/gather instructions"
href="https://bugs.llvm.org/show_bug.cgi?id=35170">35170</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Auto-vectorization is too aggressive/naive about scatter/gather instructions
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Loop Optimizer
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>dave@znu.io
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Note: LLVM r317088
clang+llvm ToT built itself, and with -march=knl the auto-vectorizer is far too
aggressive/naive about using vector scatter/gather instructions. For example,
consider llvm::DenseMap&lt;...&gt;::grow(unsigned int) (code gen below).
This generates a large loop that moves many scalars and pointers from the
scalar domain into the vector domain, packs them, and then uses vpscatterqq
to store them as scalars. If the vector scatter/gather instructions were
magically more efficient than the scalar equivalents, then maybe all of the
scalar-to-vector copying and packing would be worth it, but they're not. The
scatter/gather instructions exist to *avoid* moving data between the scalar
and vector domains, which is exactly what the compiler generated here:
1068530: 48 8d 8a 10 fe ff ff lea -0x1f0(%rdx),%rcx
1068537: 48 8d 9a 20 fe ff ff lea -0x1e0(%rdx),%rbx
106853e: 48 8d aa 30 fe ff ff lea -0x1d0(%rdx),%rbp
1068545: 48 8d ba 40 fe ff ff lea -0x1c0(%rdx),%rdi
106854c: 4c 8d 82 50 fe ff ff lea -0x1b0(%rdx),%r8
1068553: 4c 8d b2 60 fe ff ff lea -0x1a0(%rdx),%r14
106855a: 4c 8d 8a 70 fe ff ff lea -0x190(%rdx),%r9
1068561: 4c 8d 92 80 fe ff ff lea -0x180(%rdx),%r10
1068568: c4 c1 f9 6e ca vmovq %r10,%xmm1
106856d: c4 c1 f9 6e d1 vmovq %r9,%xmm2
1068572: c5 e9 6c c9 vpunpcklqdq %xmm1,%xmm2,%xmm1
1068576: c4 c1 f9 6e d6 vmovq %r14,%xmm2
106857b: c4 c1 f9 6e d8 vmovq %r8,%xmm3
1068580: c5 e1 6c d2 vpunpcklqdq %xmm2,%xmm3,%xmm2
1068584: c4 e3 6d 38 c9 01 vinserti128 $0x1,%xmm1,%ymm2,%ymm1
106858a: c4 e1 f9 6e d7 vmovq %rdi,%xmm2
106858f: c4 e1 f9 6e dd vmovq %rbp,%xmm3
1068594: c5 e1 6c d2 vpunpcklqdq %xmm2,%xmm3,%xmm2
1068598: c4 e1 f9 6e db vmovq %rbx,%xmm3
106859d: c4 e1 f9 6e e1 vmovq %rcx,%xmm4
10685a2: c5 d9 6c db vpunpcklqdq %xmm3,%xmm4,%xmm3
10685a6: c4 e3 65 38 d2 01 vinserti128 $0x1,%xmm2,%ymm3,%ymm2
10685ac: 62 f3 ed 48 3a c9 01 vinserti64x4 $0x1,%ymm1,%zmm2,%zmm1
10685b3: 4c 8d 82 90 fe ff ff lea -0x170(%rdx),%r8
10685ba: 4c 8d 8a a0 fe ff ff lea -0x160(%rdx),%r9
10685c1: 4c 8d 92 b0 fe ff ff lea -0x150(%rdx),%r10
10685c8: 4c 8d b2 c0 fe ff ff lea -0x140(%rdx),%r14
10685cf: 48 8d 8a d0 fe ff ff lea -0x130(%rdx),%rcx
10685d6: 48 8d ba e0 fe ff ff lea -0x120(%rdx),%rdi
10685dd: 48 8d aa f0 fe ff ff lea -0x110(%rdx),%rbp
10685e4: 48 8d 9a 00 ff ff ff lea -0x100(%rdx),%rbx
10685eb: c4 e1 f9 6e d3 vmovq %rbx,%xmm2
10685f0: c4 e1 f9 6e dd vmovq %rbp,%xmm3
10685f5: c5 e1 6c d2 vpunpcklqdq %xmm2,%xmm3,%xmm2
10685f9: c4 e1 f9 6e df vmovq %rdi,%xmm3
10685fe: c4 e1 f9 6e e1 vmovq %rcx,%xmm4
1068603: c5 d9 6c db vpunpcklqdq %xmm3,%xmm4,%xmm3
1068607: c4 e3 65 38 d2 01 vinserti128 $0x1,%xmm2,%ymm3,%ymm2
106860d: c4 c1 f9 6e de vmovq %r14,%xmm3
1068612: c4 c1 f9 6e e2 vmovq %r10,%xmm4
1068617: c5 d9 6c db vpunpcklqdq %xmm3,%xmm4,%xmm3
106861b: c4 c1 f9 6e e1 vmovq %r9,%xmm4
1068620: c4 c1 f9 6e e8 vmovq %r8,%xmm5
1068625: c5 d1 6c e4 vpunpcklqdq %xmm4,%xmm5,%xmm4
1068629: c4 e3 5d 38 db 01 vinserti128 $0x1,%xmm3,%ymm4,%ymm3
106862f: 62 f3 e5 48 3a d2 01 vinserti64x4 $0x1,%ymm2,%zmm3,%zmm2
1068636: 4c 8d 82 10 ff ff ff lea -0xf0(%rdx),%r8
106863d: 4c 8d 8a 20 ff ff ff lea -0xe0(%rdx),%r9
1068644: 4c 8d 92 30 ff ff ff lea -0xd0(%rdx),%r10
106864b: 4c 8d b2 40 ff ff ff lea -0xc0(%rdx),%r14
1068652: 48 8d 8a 50 ff ff ff lea -0xb0(%rdx),%rcx
1068659: 48 8d ba 60 ff ff ff lea -0xa0(%rdx),%rdi
1068660: 48 8d aa 70 ff ff ff lea -0x90(%rdx),%rbp
1068667: 48 8d 5a 80 lea -0x80(%rdx),%rbx
106866b: c4 e1 f9 6e db vmovq %rbx,%xmm3
1068670: c4 e1 f9 6e e5 vmovq %rbp,%xmm4
1068675: c5 d9 6c db vpunpcklqdq %xmm3,%xmm4,%xmm3
1068679: c4 e1 f9 6e e7 vmovq %rdi,%xmm4
106867e: c4 e1 f9 6e e9 vmovq %rcx,%xmm5
1068683: c5 d1 6c e4 vpunpcklqdq %xmm4,%xmm5,%xmm4
1068687: c4 e3 5d 38 db 01 vinserti128 $0x1,%xmm3,%ymm4,%ymm3
106868d: c4 c1 f9 6e e6 vmovq %r14,%xmm4
1068692: c4 c1 f9 6e ea vmovq %r10,%xmm5
1068697: c5 d1 6c e4 vpunpcklqdq %xmm4,%xmm5,%xmm4
106869b: c4 c1 f9 6e e9 vmovq %r9,%xmm5
10686a0: c4 c1 f9 6e f0 vmovq %r8,%xmm6
10686a5: c5 c9 6c ed vpunpcklqdq %xmm5,%xmm6,%xmm5
10686a9: c4 e3 55 38 e4 01 vinserti128 $0x1,%xmm4,%ymm5,%ymm4
10686af: 62 f3 dd 48 3a db 01 vinserti64x4 $0x1,%ymm3,%zmm4,%zmm3
10686b6: 4c 8d 42 90 lea -0x70(%rdx),%r8
10686ba: 4c 8d 4a a0 lea -0x60(%rdx),%r9
10686be: 4c 8d 52 b0 lea -0x50(%rdx),%r10
10686c2: 48 8d 5a c0 lea -0x40(%rdx),%rbx
10686c6: 48 8d 4a d0 lea -0x30(%rdx),%rcx
10686ca: 48 8d 7a e0 lea -0x20(%rdx),%rdi
10686ce: 48 8d 6a f0 lea -0x10(%rdx),%rbp
10686d2: c4 e1 f9 6e e2 vmovq %rdx,%xmm4
10686d7: c4 e1 f9 6e ed vmovq %rbp,%xmm5
10686dc: c5 d1 6c e4 vpunpcklqdq %xmm4,%xmm5,%xmm4
10686e0: c4 e1 f9 6e ef vmovq %rdi,%xmm5
10686e5: c4 e1 f9 6e f1 vmovq %rcx,%xmm6
10686ea: c5 c9 6c ed vpunpcklqdq %xmm5,%xmm6,%xmm5
10686ee: c4 e3 55 38 e4 01 vinserti128 $0x1,%xmm4,%ymm5,%ymm4
10686f4: c4 e1 f9 6e eb vmovq %rbx,%xmm5
10686f9: c4 c1 f9 6e f2 vmovq %r10,%xmm6
10686fe: c5 c9 6c ed vpunpcklqdq %xmm5,%xmm6,%xmm5
1068702: c4 c1 f9 6e f1 vmovq %r9,%xmm6
1068707: c4 c1 f9 6e f8 vmovq %r8,%xmm7
106870c: c5 c1 6c f6 vpunpcklqdq %xmm6,%xmm7,%xmm6
1068710: c4 e3 4d 38 ed 01 vinserti128 $0x1,%xmm5,%ymm6,%ymm5
1068716: 62 f3 d5 48 3a e4 01 vinserti64x4 $0x1,%ymm4,%zmm5,%zmm4
106871d: c5 fc 46 c8 kxnorw %k0,%k0,%k1
1068721: 62 f2 fd 49 a1 04 0d vpscatterqq %zmm0,0x0(,%zmm1,1){%k1}
1068728: 00 00 00 00
106872c: c5 fc 46 c8 kxnorw %k0,%k0,%k1
1068730: 62 f2 fd 49 a1 04 15 vpscatterqq %zmm0,0x0(,%zmm2,1){%k1}
1068737: 00 00 00 00
106873b: c5 fc 46 c8 kxnorw %k0,%k0,%k1
106873f: 62 f2 fd 49 a1 04 1d vpscatterqq %zmm0,0x0(,%zmm3,1){%k1}
1068746: 00 00 00 00
106874a: c5 fc 46 c8 kxnorw %k0,%k0,%k1
106874e: 62 f2 fd 49 a1 04 25 vpscatterqq %zmm0,0x0(,%zmm4,1){%k1}
1068755: 00 00 00 00
1068759: 48 81 c2 00 02 00 00 add $0x200,%rdx
1068760: 49 83 c5 e0 add $0xffffffffffffffe0,%r13
1068764: 0f 85 c6 fd ff ff jne 1068530 &lt;_ZN4llvm8DenseMapIPNS_8MCSymbolENS_14PointerIntPairIS2_Lj1EbNS_21PointerLikeTypeTraitsIS2_EENS_18PointerIntPairInfoIS2_Lj1ES5_EEEENS_12DenseMapInfoIS2_EENS_6detail12DenseMapPairIS2_S8_EEE4growEj+0x100&gt;</pre>
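<pre>The lea offsets above step by 0x10 and each vpscatterqq writes 8-byte elements,
which suggests a loop that stores one 8-byte field into each 16-byte bucket.
A minimal sketch of that pattern follows. It is a hypothetical reduction:
Bucket and init_empty are illustrative names, not LLVM's actual DenseMap code,
and it has not been confirmed to reproduce the same code generation.

```cpp
// Hypothetical 16-byte bucket on a 64-bit target: an 8-byte key followed by
// an 8-byte value. Only the key is written by the loop below.
struct Bucket {
  void *key;
  unsigned long value;
};

// Store the same key into every bucket. Consecutive stores are 16 bytes
// apart, so they are strided rather than contiguous, which is the kind of
// access a vectorizer may choose to turn into a scatter.
void init_empty(Bucket *buckets, unsigned long count, void *empty_key) {
  for (unsigned long i = 0; i != count; ++i)
    buckets[i].key = empty_key;
}
```

In scalar form each iteration is a single 8-byte store at a fixed stride, so
no addresses need to be moved into vector registers or packed.</pre>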
</div>
</p>
</body>
</html>