[llvm-bugs] [Bug 43691] New: InlineSpiller fails to rematerialize zero-idioms/allones-idioms on X86.

Wed Oct 16 07:10:53 PDT 2019

https://bugs.llvm.org/show_bug.cgi?id=43691

            Bug ID: 43691
           Summary: InlineSpiller fails to rematerialize
                    zero-idioms/allones-idioms on X86.
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Register Allocator
          Assignee: unassignedbugs at nondot.org
          Reporter: andrea.dibiagio at gmail.com
                CC: llvm-bugs at lists.llvm.org, quentin.colombet at gmail.com

Example:

```
_m128i all_zeroes(__m128i *Y, __m128 *Z, unsigned B) {
  __m128i X = (__m128i)(__v4si) { 0, 0, 0, 0 };
  __m128i Phi = *Y;
  while (B) {
    __asm volatile (""::: "xmm0", "xmm1",
                          "xmm2", "xmm3",
                          "xmm4", "xmm5",
                          "xmm6", "xmm7",
                          "xmm8", "xmm9",
                          "xmm10", "xmm11",
                          "xmm12", "xmm13",
                          "xmm14", "memory");
    Phi = (__m128i) *Z | _mm_cmpeq_epi32(Phi, X);
    --B;
  }

  return Phi;
}
```

Compile with -mavx on a generic linux x86_64.

This is the loop body (`-mavx` -- on a generic x86_64 linux triple):

```
.LBB0_1:
        vpcmpeqd        .LCPI0_0, %xmm15, %xmm0
        vpor    (%rsi), %xmm0, %xmm15
        addl    $-1, %edx
        jne     .LBB0_1
```

.LCPI0_0 is a constant pool entry for a zero vector:
```
  .zero   16
```

You will see that the compiler emits a needless constant pool entry for a zero
vector. That entry is then loaded in the loop body.

It is not a good idea to generate constant pool entries for zero vectors
because they are cheap to rematerialize. All modern x86 processors are able to
optimize out zero/all-ones idioms at register renaming stage. All those idioms
are also treated as dependency-breaking instructions, which means that they
don't have any input register dependency).

In this particular example, the InlineSpiller fails to rematerialize an
AVX_SET0 pseudo. So we end up with a constant pool entry for a zero vector, and
a folded-load in the vector compare instruction.

A full test case for both zero-idioms and allones-idioms can be found here at
this link: https://godbolt.org/z/h9UrVR

You can see how GCC generates a zero-idiom VPXOR to rematerialize the zero
vector within the loop. No load from constant pool is required. Interestingly
GCC is not smart enough to realize that all-ones idioms can be treated
specially.

```
.L3:
        vpxor   %xmm1, %xmm1, %xmm1
        vpcmpeqd        %xmm1, %xmm15, %xmm15
        vpor    (%rsi), %xmm15, %xmm15
        subl    $1, %edx
        jne     .L3
```

(Some of the) Affected PSEUDO opcodes are:

  V_SET0
  V_SETALLONES

  AVX_SET0
  AVX1_SETALLONES
  AVX2_SETALLONES

  AVX512_128_SET0
  AVX512_256_SET0
  AVX512_512_SET0

Here are the definitions from X86GenInstrInfo.inc:

```
  { 246,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo52, -1 ,nullptr },  // Inst #246 = V_SET0
  { 247,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo52, -1 ,nullptr },  // Inst #247 =
V_SETALLONES
  { 196,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo44, -1 ,nullptr },  // Inst #196 =
AVX_SET0
  { 185,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo44, -1 ,nullptr },  // Inst #185 =
AVX1_SETALLONES
  { 186,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo44, -1 ,nullptr },  // Inst #186 =
AVX2_SETALLONES
  { 187,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo45, -1 ,nullptr },  // Inst #187 =
AVX512_128_SET0
  { 188,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo46, -1 ,nullptr },  // Inst #188 =
AVX512_256_SET0
  { 189,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo47, -1 ,nullptr },  // Inst #189 =
AVX512_512_SET0
  { 190,        1,      1,      0,      2,     
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo47, -1 ,nullptr },  // Inst #190 =
AVX512_512_SETALLONES

  static const MCOperandInfo OperandInfo44[] = { { X86::VR256RegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
  static const MCOperandInfo OperandInfo45[] = { { X86::VR128XRegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
  static const MCOperandInfo OperandInfo46[] = { { X86::VR256XRegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
  static const MCOperandInfo OperandInfo47[] = { { X86::VR512RegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
  static const MCOperandInfo OperandInfo52[] = { { X86::VR128RegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
```

All these opcodes have similar properties. In particular, they are all
`Rematerializable` and `CheapAsMove`.
By the time regalloc is run, these pseudos have not been replaced/expanded yet.
So, Regalloc still sees them as Pseudo.

The problem is in method `InlineSpiller::reMaterializeFor` (see below):

```
/// reMaterializeFor - Attempt to rematerialize before MI instead of reloading.
bool InlineSpiller::reMaterializeFor(LiveInterval &VirtReg, MachineInstr &MI) {
  // Analyze instruction
  SmallVector<std::pair<MachineInstr *, unsigned>, 8> Ops;
  MIBundleOperands::VirtRegInfo RI =
      MIBundleOperands(MI).analyzeVirtReg(VirtReg.reg, &Ops);

  if (!RI.Reads)
    return false;
```

RI.Reads is always false for all those pseudos. That makes sense because
zero-idioms and all-ones idioms don't have register dependencies (i.e. it is
normal not to have register reads).

The check on the presence of register reads is surprisingly restrictive.
It is also unclear to me why it is needed at all. If somebody knows the reason
why we need that check then I'd be very happy to know. We definitely need a
code comment for it! (it would make life easier to people that want to study
that method).

According to `git blame`, that check was added by the following commit:

```
commit 296acbfe4f91fd80ffc64c54be69abec388c7017
Author: Patrik Hagglund <patrik.h.hagglund at ericsson.com>
Date:   Mon Sep 1 11:04:07 2014 +0000

    Fix in InlineSpiller to make the rematerilization loop also consider
    implicit uses of the whole register when a sub register is defined.

    Now the same iterator is used in the rematerilization loop as in the
    spill loop later.

    Patch provided by Mikael Holmen.

    This fix was proposed and reviewed by Quentin Colombet,
    http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/076135.html.

    Unfortunately, this error in the rematerilization code has only been
    seen in a large test case for an out-of-tree target, and is probably
    hard to reproduce on an in-tree target. Therefore, no testcase is
    provided.

    llvm-svn: 216873
```

Note: the link to the llvmdev thread mentioned by that message is probably
wrong. I am assuming that this is the correct reference:
http://lists.llvm.org/pipermail/llvm-dev/2014-August/075794.html

It is easy to workaround this issue. For example, we can treat Pseudo
instructions specially: if a pseudo is marked as rematerializable, then we can
probably ignore the register reads check.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20191016/16e7bf9a/attachment-0001.html>