[llvm-bugs] [Bug 43691] New: InlineSpiller fails to rematerialize zero-idioms/allones-idioms on X86.
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Oct 16 07:10:53 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=43691
Bug ID: 43691
Summary: InlineSpiller fails to rematerialize
zero-idioms/allones-idioms on X86.
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Register Allocator
Assignee: unassignedbugs at nondot.org
Reporter: andrea.dibiagio at gmail.com
CC: llvm-bugs at lists.llvm.org, quentin.colombet at gmail.com
Example:
```
_m128i all_zeroes(__m128i *Y, __m128 *Z, unsigned B) {
__m128i X = (__m128i)(__v4si) { 0, 0, 0, 0 };
__m128i Phi = *Y;
while (B) {
__asm volatile (""::: "xmm0", "xmm1",
"xmm2", "xmm3",
"xmm4", "xmm5",
"xmm6", "xmm7",
"xmm8", "xmm9",
"xmm10", "xmm11",
"xmm12", "xmm13",
"xmm14", "memory");
Phi = (__m128i) *Z | _mm_cmpeq_epi32(Phi, X);
--B;
}
return Phi;
}
```
Compile with -mavx on a generic linux x86_64.
This is the loop body (`-mavx` -- on a generic x86_64 linux triple):
```
.LBB0_1:
vpcmpeqd .LCPI0_0, %xmm15, %xmm0
vpor (%rsi), %xmm0, %xmm15
addl $-1, %edx
jne .LBB0_1
```
.LCPI0_0 is a constant pool entry for a zero vector:
```
.zero 16
```
You will see that the compiler emits a needless constant pool entry for a zero
vector. That entry is then loaded in the loop body.
It is not a good idea to generate constant pool entries for zero vectors
because they are cheap to rematerialize. All modern x86 processors are able to
optimize out zero/all-ones idioms at register renaming stage. All those idioms
are also treated as dependency-breaking instructions, which means that they
don't have any input register dependency).
In this particular example, the InlineSpiller fails to rematerialize an
AVX_SET0 pseudo. So we end up with a constant pool entry for a zero vector, and
a folded-load in the vector compare instruction.
A full test case for both zero-idioms and allones-idioms can be found here at
this link: https://godbolt.org/z/h9UrVR
You can see how GCC generates a zero-idiom VPXOR to rematerialize the zero
vector within the loop. No load from constant pool is required. Interestingly
GCC is not smart enough to realize that all-ones idioms can be treated
specially.
```
.L3:
vpxor %xmm1, %xmm1, %xmm1
vpcmpeqd %xmm1, %xmm15, %xmm15
vpor (%rsi), %xmm15, %xmm15
subl $1, %edx
jne .L3
```
(Some of the) Affected PSEUDO opcodes are:
V_SET0
V_SETALLONES
AVX_SET0
AVX1_SETALLONES
AVX2_SETALLONES
AVX512_128_SET0
AVX512_256_SET0
AVX512_512_SET0
Here are the definitions from X86GenInstrInfo.inc:
```
{ 246, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo52, -1 ,nullptr }, // Inst #246 = V_SET0
{ 247, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo52, -1 ,nullptr }, // Inst #247 =
V_SETALLONES
{ 196, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo44, -1 ,nullptr }, // Inst #196 =
AVX_SET0
{ 185, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo44, -1 ,nullptr }, // Inst #185 =
AVX1_SETALLONES
{ 186, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo44, -1 ,nullptr }, // Inst #186 =
AVX2_SETALLONES
{ 187, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo45, -1 ,nullptr }, // Inst #187 =
AVX512_128_SET0
{ 188, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo46, -1 ,nullptr }, // Inst #188 =
AVX512_256_SET0
{ 189, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo47, -1 ,nullptr }, // Inst #189 =
AVX512_512_SET0
{ 190, 1, 1, 0, 2,
0|(1ULL<<MCID::Pseudo)|(1ULL<<MCID::FoldableAsLoad)|(1ULL<<MCID::Rematerializable)|(1ULL<<MCID::CheapAsAMove),
0x0ULL, nullptr, nullptr, OperandInfo47, -1 ,nullptr }, // Inst #190 =
AVX512_512_SETALLONES
static const MCOperandInfo OperandInfo44[] = { { X86::VR256RegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
static const MCOperandInfo OperandInfo45[] = { { X86::VR128XRegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
static const MCOperandInfo OperandInfo46[] = { { X86::VR256XRegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
static const MCOperandInfo OperandInfo47[] = { { X86::VR512RegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
static const MCOperandInfo OperandInfo52[] = { { X86::VR128RegClassID, 0,
MCOI::OPERAND_REGISTER, 0 }, };
```
All these opcodes have similar properties. In particular, they are all
`Rematerializable` and `CheapAsMove`.
By the time regalloc is run, these pseudos have not been replaced/expanded yet.
So, Regalloc still sees them as Pseudo.
The problem is in method `InlineSpiller::reMaterializeFor` (see below):
```
/// reMaterializeFor - Attempt to rematerialize before MI instead of reloading.
bool InlineSpiller::reMaterializeFor(LiveInterval &VirtReg, MachineInstr &MI) {
// Analyze instruction
SmallVector<std::pair<MachineInstr *, unsigned>, 8> Ops;
MIBundleOperands::VirtRegInfo RI =
MIBundleOperands(MI).analyzeVirtReg(VirtReg.reg, &Ops);
if (!RI.Reads)
return false;
```
RI.Reads is always false for all those pseudos. That makes sense because
zero-idioms and all-ones idioms don't have register dependencies (i.e. it is
normal not to have register reads).
The check on the presence of register reads is surprisingly restrictive.
It is also unclear to me why it is needed at all. If somebody knows the reason
why we need that check then I'd be very happy to know. We definitely need a
code comment for it! (it would make life easier to people that want to study
that method).
According to `git blame`, that check was added by the following commit:
```
commit 296acbfe4f91fd80ffc64c54be69abec388c7017
Author: Patrik Hagglund <patrik.h.hagglund at ericsson.com>
Date: Mon Sep 1 11:04:07 2014 +0000
Fix in InlineSpiller to make the rematerilization loop also consider
implicit uses of the whole register when a sub register is defined.
Now the same iterator is used in the rematerilization loop as in the
spill loop later.
Patch provided by Mikael Holmen.
This fix was proposed and reviewed by Quentin Colombet,
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/076135.html.
Unfortunately, this error in the rematerilization code has only been
seen in a large test case for an out-of-tree target, and is probably
hard to reproduce on an in-tree target. Therefore, no testcase is
provided.
llvm-svn: 216873
```
Note: the link to the llvmdev thread mentioned by that message is probably
wrong. I am assuming that this is the correct reference:
http://lists.llvm.org/pipermail/llvm-dev/2014-August/075794.html
It is easy to workaround this issue. For example, we can treat Pseudo
instructions specially: if a pseudo is marked as rematerializable, then we can
probably ignore the register reads check.
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20191016/16e7bf9a/attachment-0001.html>
More information about the llvm-bugs
mailing list