[llvm-bugs] [Bug 38656] New: Unnecessary register spilling
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue Aug 21 01:24:36 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=38656
Bug ID: 38656
Summary: Unnecessary register spilling
Product: new-bugs
Version: 6.0
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: maarten.bosmans at vortech.nl
CC: llvm-bugs at lists.llvm.org
Clang compiles the loop in this function
#include <stddef.h>  /* ptrdiff_t */

void stencil1(int start, int stop, ptrdiff_t stride,
              float *restrict a, float *restrict b,
              float (*restrict c)[stride]) {
    const float *restrict c1 = &c[1][0];
    const float *restrict c2 = &c[2][0];
    const float *restrict c3 = &c[3][0];
    const float *restrict c4 = &c[4][0];
    const float *restrict c5 = &c[5][0];
    const float *restrict c6 = &c[6][0];
    const float *restrict c7 = &c[7][0];
    const float *restrict c8 = &c[8][0];
    for (int i = start; i <= stop; i++) {
        a[i] += b[1] * c1[i] + b[2] * c2[i]
              + b[3] * c3[i] + b[4] * c4[i]
              + b[5] * c5[i] + b[6] * c6[i]
              + b[7] * c7[i] + b[8] * c8[i];
    }
}
as (using AVX2)
.LBB0_6: # =>This Inner Loop Header: Depth=1
lea rbx, [r11 + r13]
vmulps ymm0, ymm8, ymmword ptr [r10 + 4*r11]
mov r14, qword ptr [rsp - 88] # 8-byte Reload
vfmadd231ps ymm0, ymm9, ymmword ptr [r14 + 4*rbx] # ymm0 = (ymm9 * mem) + ymm0
mov rax, qword ptr [rsp - 96] # 8-byte Reload
vfmadd231ps ymm0, ymm10, ymmword ptr [rax + 4*rbx] # ymm0 = (ymm10 * mem) + ymm0
vfmadd231ps ymm0, ymm11, ymmword ptr [r8 + 4*rbx] # ymm0 = (ymm11 * mem) + ymm0
mov rax, qword ptr [rsp - 104] # 8-byte Reload
vfmadd231ps ymm0, ymm12, ymmword ptr [rax + 4*rbx] # ymm0 = (ymm12 * mem) + ymm0
vfmadd231ps ymm0, ymm13, ymmword ptr [r12 + 4*rbx] # ymm0 = (ymm13 * mem) + ymm0
vfmadd231ps ymm0, ymm14, ymmword ptr [r15 + 4*rbx] # ymm0 = (ymm14 * mem) + ymm0
vfmadd231ps ymm0, ymm15, ymmword ptr [rdi + 4*rbx] # ymm0 = (ymm15 * mem) + ymm0
vaddps ymm0, ymm0, ymmword ptr [r9 + 4*r11]
vmovups ymmword ptr [r9 + 4*r11], ymm0
add r11, 8
cmp rbp, r11
jne .LBB0_6
The b values are broadcast to ymm8-ymm15 before the loop, which is nice.
The same is not done for all the addresses of c1..c8. Some of them are kept in
registers, but others are reloaded from the stack on every iteration, into rax
(and, oddly, r14).
I think this harms performance. There should be enough free registers to hoist
the address loads out of the loop, so that the three mov instructions can be
removed from it.
If the c1..c8 variables are pointer arguments to a function instead of coming
out of a VLA calculation, the register spilling does not occur.
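For comparison, the pointer-argument variant mentioned above could look roughly
like the sketch below. The name stencil1_ptrs and the exact parameter list are
assumptions made for illustration; they are not taken from the Godbolt example.

```c
/* Hypothetical variant of stencil1: the eight row pointers are passed in
   as separate arguments instead of being computed from the VLA pointer c. */
void stencil1_ptrs(int start, int stop,
                   float *restrict a, float *restrict b,
                   const float *restrict c1, const float *restrict c2,
                   const float *restrict c3, const float *restrict c4,
                   const float *restrict c5, const float *restrict c6,
                   const float *restrict c7, const float *restrict c8) {
    /* Same loop body as in stencil1. */
    for (int i = start; i <= stop; i++) {
        a[i] += b[1] * c1[i] + b[2] * c2[i]
              + b[3] * c3[i] + b[4] * c4[i]
              + b[5] * c5[i] + b[6] * c6[i]
              + b[7] * c7[i] + b[8] * c8[i];
    }
}
```

Both versions compute the same result; whether the reloads appear in a given
build has to be checked in the generated assembly.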
Godbolt link:
https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAKxAEZSAbAQwDtQB9T5UgZ1QFdiyTCADkAUgBMAZjwtkDflgDU46QGEeBdFgBmAOgRrs4gAwBBM%2BYBuqPOmVbM8vA1oQ5BRwSbECpZU9vVAAHAJCCYnxdXXYvLWJ7TFIrZTT0tN0GVCYvACpiTAS8ZC8mAKyc/MLi0uUAIxSLDIzK3OUIAqLIkq9kAEpxAFYAIWKsYYARftUAdhGrVPS0Fi1lNuruxLrkWlVpSdVJADZkYZHaKfPTK%2BkF5uXUVa8N5S7avsl9w6lT88krqMbkNJmp7uYMis1q93j0dtJvkc/qNpICRsDQXclmkoS9su1Yds%2BgAWRG/M6jYlojFg7HKXHrfGbD70oZkk4UkZDam3cGQp7QplvGpwvrHdnIkbHHkg2kPHECvFVYVbXr02YSzmzGWYvmPZ6M5WEtXIAAcmvOpp1css8vWJA6QTwiK0vgIYMC%2B3UakOWlCHrwUgWkhGM3E8zpGSY50DIKOCwODXOlzjeXpKZGsZ%2BIaTowBqfp%2BczVztLRaQdzI1RBeQ1eLcYr9XOVJrLfroNLZfSjfO3Jrffb8cr0prI/bka7aR7o21Ndng%2BnIytNeX46xdvDHcssy3VldBBKyls9m8zmQrkk9Q8LHiPj8ASCfrCE8yQuNdXKhoJIqJDSaEK7BkYR/E16HpRUv2ZUVC3/ScgLfED4QCeCjUQklYMAiDgNVHYhmQrCEJwsUMLLFDvyI9V8INbCWTNMMIwsOldAda8vGdH1vDdAMvQ4p8AyDIN6L1Mto1GLN4w4ptRhTQ4012GMpiHKSRnzWTCwU2VO0nRdqzU2sNOzEZKxbPS2yzF8u0XPs9IHcTFxHPSx3MrTLJzZTZz0%2Bc7Lcy1FLk1dzPXACpx3RZQosY8HCcFwGEkJhWM4%2B9AhvYIwmUCIojwGI4m8RIsBIloaOgz8it/RoLNeTo0PpQZRnGTApiEulovPWKr1dJKnwCT9ypc8lkx1AJ%2BrzQaLOGqtBqRTkTOuKYCoycbrNmkEho5c4HOWyZ5u7NaZ0m8bl1BIFGptTcrFEfpGDEIZRFIFgxFMW7UDEdRHAEIRMCOaRaFuggHouy6AGsQEkYl9GJIZjgATlNIZiWOU1TFNSQockK7RGJW77tER7SGe0Rbp4EBTFIP6cYu0g4FgJA0AAWxCVxMDICgIDphmGCZlBgGR0hdFcAgmaJiB6n%2B0h6jkXwAE8xB%2B0g6dp5wCAAeRYBhpfJ0gsFp1hgA50X8EKUo8GsIpRcwAAPTBkH4AWZduzxMAYUWelp/7LuYNgQE4dhuAYPB6iJyBLtCA8BTEABaLR0B9ZAoah5Rw4AdSYBgGATpXdCYNZw%2B1oQjAOBAs4Ad0dtPw90FhUHD/gWGIVBU/D7JQh4BOK6ruQ/ZYTBCfe4Q6Hd66sdF/HzdNY5w%2BOUlgGQZBlFNfQvggXBCAdKRvoCdRUHpxniC%2B2gZnUX63cuhBMCYLBiEoIGQekcHTWJWhpGOWgofh2ZTVmWZjnRzHSFd2hTAk2xrjfGhNiakzdpTGAiAUBb3ZkzcglA2Y7xQAoHWcRiA10BrzfmgtKAiw1uLFgUs7Zyy3grG8Ks1b60wNrNgesNYGytgeE2RMNYWytjbEQohZYOydhrF2kCPYcC4Iwf2gcIDBwiHgMOohI7aBjnHBOydU7p0ztnXOyB86TELjwEuqjy6V2rrXeuDBG6oGbq3IxHc5Dd14L3EQ%2B90Y3TusPMQo9x6T3pMIjokQsEzCXvgIgu815gU3tvDmISZDSAPkfcm/QT5nwvlfUgwMTj6CGLMYkccEamEfqDYkaN%2BG/3/oA1xGtQG8HAWTR6l0qYwOQZExBrM4EoPPDPWgpocEMAFsQIWBDcZEJITw268tFZUPVrjLWOsGFTLwIbFhpt2GW2trbEZ5Aby
O2dokV28TGA6y9qIv2Ad4BSNDqsCOGcs5eBzr4LRPpdH6LLjXOuqcfS4z4IIPuTj%2BEuOAU9dxY8J6khKLPTp%2BhTAQo6MvYJe8N6tMiV9IYsSIHxMSefTmkjUkgyGHfaQppTS0CGNIIYUNH6mihj/W6pSgFuIJlUkmNSAbo0kEPCpYg4m1MuibPpMj7rEiAA%3D%3D