[llvm-bugs] [Bug 38656] New: Unnecessary register spilling

via llvm-bugs llvm-bugs at lists.llvm.org
Tue Aug 21 01:24:36 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=38656

            Bug ID: 38656
           Summary: Unnecessary register spilling
           Product: new-bugs
           Version: 6.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: maarten.bosmans at vortech.nl
                CC: llvm-bugs at lists.llvm.org

Clang compiles the loop in this function

#include <stddef.h>  /* ptrdiff_t */

void stencil1(int start, int stop, ptrdiff_t stride,
        float *restrict a, float *restrict b,
        float (*restrict c)[stride]) {

    const float *restrict c1 = &c[1][0];
    const float *restrict c2 = &c[2][0];
    const float *restrict c3 = &c[3][0];
    const float *restrict c4 = &c[4][0];
    const float *restrict c5 = &c[5][0];
    const float *restrict c6 = &c[6][0];
    const float *restrict c7 = &c[7][0];
    const float *restrict c8 = &c[8][0];

    for (int i = start; i <= stop; i++) {
        a[i] += b[1] * c1[i] + b[2] * c2[i]
              + b[3] * c3[i] + b[4] * c4[i]
              + b[5] * c5[i] + b[6] * c6[i]
              + b[7] * c7[i] + b[8] * c8[i];
    }
}

as (using AVX2)

.LBB0_6: # =>This Inner Loop Header: Depth=1
  lea rbx, [r11 + r13]
  vmulps ymm0, ymm8, ymmword ptr [r10 + 4*r11]
  mov r14, qword ptr [rsp - 88] # 8-byte Reload
  vfmadd231ps ymm0, ymm9, ymmword ptr [r14 + 4*rbx] # ymm0 = (ymm9 * mem) + ymm0
  mov rax, qword ptr [rsp - 96] # 8-byte Reload
  vfmadd231ps ymm0, ymm10, ymmword ptr [rax + 4*rbx] # ymm0 = (ymm10 * mem) + ymm0
  vfmadd231ps ymm0, ymm11, ymmword ptr [r8 + 4*rbx] # ymm0 = (ymm11 * mem) + ymm0
  mov rax, qword ptr [rsp - 104] # 8-byte Reload
  vfmadd231ps ymm0, ymm12, ymmword ptr [rax + 4*rbx] # ymm0 = (ymm12 * mem) + ymm0
  vfmadd231ps ymm0, ymm13, ymmword ptr [r12 + 4*rbx] # ymm0 = (ymm13 * mem) + ymm0
  vfmadd231ps ymm0, ymm14, ymmword ptr [r15 + 4*rbx] # ymm0 = (ymm14 * mem) + ymm0
  vfmadd231ps ymm0, ymm15, ymmword ptr [rdi + 4*rbx] # ymm0 = (ymm15 * mem) + ymm0
  vaddps ymm0, ymm0, ymmword ptr [r9 + 4*r11]
  vmovups ymmword ptr [r9 + 4*r11], ymm0
  add r11, 8
  cmp rbp, r11
  jne .LBB0_6

The b values are broadcast to ymm8-ymm15 before the loop, which is nice.
The same is not done for all the addresses of c1..c8. Some of them are kept in
registers, but others are reloaded from the stack inside the loop, into rax
(and, oddly, r14).
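
For reference, each of c1..c8 is just the base address of c plus a multiple of
stride, so all eight addresses are loop-invariant. A minimal sketch of that
calculation follows; the helper name row_ptr is hypothetical and is not part of
the code that was compiled for this report.

```c
#include <stddef.h>

/* row_ptr is a hypothetical helper, for illustration only.
   It returns &c[k][0], the address that stencil1 stores in ck. */
const float *row_ptr(ptrdiff_t stride, const float (*c)[stride], int k) {
    /* Same address as (const float *)c + k * stride. */
    return &c[k][0];
}
```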

I think this harms performance. There should be enough registers free to hoist
the loading of the addresses outside the loop so the three mov instructions can
be removed from the loop.
If the c1..c8 variables are pointer arguments to a function instead of coming
out of a VLA calculation, the register spilling does not occur.
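
A sketch of the pointer-argument variant described above. The name stencil2 and
the exact signature are assumptions made for illustration; the version that was
actually tested may differ in detail.

```c
/* Hypothetical variant of stencil1: the eight row pointers are passed
   in as separate arguments instead of being derived from the VLA
   parameter c inside the function. */
void stencil2(int start, int stop,
        float *restrict a, float *restrict b,
        const float *restrict c1, const float *restrict c2,
        const float *restrict c3, const float *restrict c4,
        const float *restrict c5, const float *restrict c6,
        const float *restrict c7, const float *restrict c8) {

    /* Loop body is identical to the one in stencil1. */
    for (int i = start; i <= stop; i++) {
        a[i] += b[1] * c1[i] + b[2] * c2[i]
              + b[3] * c3[i] + b[4] * c4[i]
              + b[5] * c5[i] + b[6] * c6[i]
              + b[7] * c7[i] + b[8] * c8[i];
    }
}
```

Comparing the inner loop generated for this variant with the listing above
should show whether the reloads from the stack disappear for a given compiler
version and set of flags.
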
Godbolt link:
https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAKxAEZSAbAQwDtQB9T5UgZ1QFdiyTCADkAUgBMAZjwtkDflgDU46QGEeBdFgBmAOgRrs4gAwBBM%2BYBuqPOmVbM8vA1oQ5BRwSbECpZU9vVAAHAJCCYnxdXXYvLWJ7TFIrZTT0tN0GVCYvACpiTAS8ZC8mAKyc/MLi0uUAIxSLDIzK3OUIAqLIkq9kAEpxAFYAIWKsYYARftUAdhGrVPS0Fi1lNuruxLrkWlVpSdVJADZkYZHaKfPTK%2BkF5uXUVa8N5S7avsl9w6lT88krqMbkNJmp7uYMis1q93j0dtJvkc/qNpICRsDQXclmkoS9su1Yds%2BgAWRG/M6jYlojFg7HKXHrfGbD70oZkk4UkZDam3cGQp7QplvGpwvrHdnIkbHHkg2kPHECvFVYVbXr02YSzmzGWYvmPZ6M5WEtXIAAcmvOpp1css8vWJA6QTwiK0vgIYMC%2B3UakOWlCHrwUgWkhGM3E8zpGSY50DIKOCwODXOlzjeXpKZGsZ%2BIaTowBqfp%2BczVztLRaQdzI1RBeQ1eLcYr9XOVJrLfroNLZfSjfO3Jrffb8cr0prI/bka7aR7o21Ndng%2BnIytNeX46xdvDHcssy3VldBBKyls9m8zmQrkk9Q8LHiPj8ASCfrCE8yQuNdXKhoJIqJDSaEK7BkYR/E16HpRUv2ZUVC3/ScgLfED4QCeCjUQklYMAiDgNVHYhmQrCEJwsUMLLFDvyI9V8INbCWTNMMIwsOldAda8vGdH1vDdAMvQ4p8AyDIN6L1Mto1GLN4w4ptRhTQ4012GMpiHKSRnzWTCwU2VO0nRdqzU2sNOzEZKxbPS2yzF8u0XPs9IHcTFxHPSx3MrTLJzZTZz0%2Bc7Lcy1FLk1dzPXACpx3RZQosY8HCcFwGEkJhWM4%2B9AhvYIwmUCIojwGI4m8RIsBIloaOgz8it/RoLNeTo0PpQZRnGTApiEulovPWKr1dJKnwCT9ypc8lkx1AJ%2BrzQaLOGqtBqRTkTOuKYCoycbrNmkEho5c4HOWyZ5u7NaZ0m8bl1BIFGptTcrFEfpGDEIZRFIFgxFMW7UDEdRHAEIRMCOaRaFuggHouy6AGsQEkYl9GJIZjgATlNIZiWOU1TFNSQockK7RGJW77tER7SGe0Rbp4EBTFIP6cYu0g4FgJA0AAWxCVxMDICgIDphmGCZlBgGR0hdFcAgmaJiB6n%2B0h6jkXwAE8xB%2B0g6dp5wCAAeRYBhpfJ0gsFp1hgA50X8EKUo8GsIpRcwAAPTBkH4AWZduzxMAYUWelp/7LuYNgQE4dhuAYPB6iJyBLtCA8BTEABaLR0B9ZAoah5Rw4AdSYBgGATpXdCYNZw%2B1oQjAOBAs4Ad0dtPw90FhUHD/gWGIVBU/D7JQh4BOK6ruQ/ZYTBCfe4Q6Hd66sdF/HzdNY5w%2BOUlgGQZBlFNfQvggXBCAdKRvoCdRUHpxniC%2B2gZnUX63cuhBMCYLBiEoIGQekcHTWJWhpGOWgofh2ZTVmWZjnRzHSFd2hTAk2xrjfGhNiakzdpTGAiAUBb3ZkzcglA2Y7xQAoHWcRiA10BrzfmgtKAiw1uLFgUs7Zyy3grG8Ks1b60wNrNgesNYGytgeE2RMNYWytjbEQohZYOydhrF2kCPYcC4Iwf2gcIDBwiHgMOohI7aBjnHBOydU7p0ztnXOyB86TELjwEuqjy6V2rrXeuDBG6oGbq3IxHc5Dd14L3EQ%2B90Y3TusPMQo9x6T3pMIjokQsEzCXvgIgu815gU3tvDmISZDSAPkfcm/QT5nwvlfUgwMTj6CGLMYkccEamEfqDYkaN%2BG/3/oA1xGtQG8HAWTR6l0qYwOQZExBrM4EoPPDPWgpocEMAFsQIWBDcZEJITw268tFZUPVrjLWOsGFTLwIbFhpt2GW2trbEZ5Aby
O2dokV28TGA6y9qIv2Ad4BSNDqsCOGcs5eBzr4LRPpdH6LLjXOuqcfS4z4IIPuTj%2BEuOAU9dxY8J6khKLPTp%2BhTAQo6MvYJe8N6tMiV9IYsSIHxMSefTmkjUkgyGHfaQppTS0CGNIIYUNH6mihj/W6pSgFuIJlUkmNSAbo0kEPCpYg4m1MuibPpMj7rEiAA%3D%3D
