<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW " title="NEW --- - greedy regalloc -- an unoptimal region split case" href="https://llvm.org/bugs/show_bug.cgi?id=24278">24278</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>greedy regalloc -- an unoptimal region split case
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Register Allocator
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>wmi@google.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvmbugs@cs.uiuc.edu
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=14653" name="attach_14653" title="testcase 1.cxx">attachment 14653</a></span>
testcase 1.cxx
To reproduce, compile the attached testcase 1.cxx:
~/llvm-r240893/build/bin/clang++ -fno-omit-frame-pointer -O2 -S 1.cxx -o 1.s
The kernel loop in 1.s:
.LBB2_23: # %for.body
# =>This Inner Loop Header: Depth=1
movl -92(%rbp), %ebx
movq %r14, %r15 <================
movq (%r15), %r14
leaq (%r14,%rbx,8), %r13
movq %r13, %rdi
callq _ZNK1H5m_fn1Ev
cltq
movq (%r14,%rbx,8), %rcx
movq %r15, %r14 <================
movq (%rcx), %rcx
movl (%rcx,%rax,4), %ebx
movq %r13, %rdi
callq _ZN1H5m_fn2Ev
movl %ebx, 8(%r14)
decl -92(%rbp)
leaq -88(%rbp), %rdi
movq %r12, %rsi
callq _ZNK1A5m_fn1EPj
testl %eax, %eax
jne .LBB2_23
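The two marked movq instructions move the same value (vreg22 in the machine IR
below) back and forth between %r14 and %r15 on every iteration. Both can be
removed from the loop by keeping that value in %r15 for the whole loop and
copying it once before the loop. Better code for the loop looks like this: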
movq %r14, %r15
je .LBB2_24
...
.LBB2_23: # %for.body
# =>This Inner Loop Header: Depth=1
movl -92(%rbp), %ebx
movq (%r15), %r14
leaq (%r14,%rbx,8), %r13
movq %r13, %rdi
callq _ZNK1H5m_fn1Ev
cltq
movq (%r14,%rbx,8), %rcx
movq (%rcx), %rcx
movl (%rcx,%rax,4), %ebx
movq %r13, %rdi
callq _ZN1H5m_fn2Ev
movl %ebx, 8(%r15)
decl -92(%rbp)
leaq -88(%rbp), %rdi
movq %r12, %rsi
callq _ZNK1A5m_fn1EPj
testl %eax, %eax
jne .LBB2_23
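To make the redundancy concrete, the Python sketch below models only the
register moves of the two loops. Assumptions: a dict stands in for the load
movq (%r15), %r14; the elided leaq/callq/cltq/movq instructions write neither
%r14 nor %r15 (both are callee-saved under the x86-64 System V ABI); the
function names are hypothetical. Both versions store through the same base
register value, but the original executes two copies per iteration while the
better code executes one copy in total.

```python
def original_loop(p, memory, iterations):
    """Register moves of the original loop; returns (store bases, copies)."""
    r14, copies, stores = p, 0, []
    for _ in range(iterations):
        r15 = r14                  # movq %r14, %r15   (marked copy)
        copies += 1
        r14 = memory[r15]          # movq (%r15), %r14 (read by the elided leaq/movq)
        # leaq/callq/cltq/movq: read %r14, write neither %r14 nor %r15
        r14 = r15                  # movq %r15, %r14   (marked copy)
        copies += 1
        stores.append(r14)         # movl %ebx, 8(%r14)
    return stores, copies


def improved_loop(p, memory, iterations):
    """Register moves of the better loop; returns (store bases, copies)."""
    r15, copies, stores = p, 1, []  # movq %r14, %r15 once, before the loop
    for _ in range(iterations):
        r14 = memory[r15]          # movq (%r15), %r14 (read by the elided leaq/movq)
        stores.append(r15)         # movl %ebx, 8(%r15)
    return stores, copies


memory = {0x1000: 0x2000}          # hypothetical contents of (%r15)
print(original_loop(0x1000, memory, 3))  # ([4096, 4096, 4096], 6)
print(improved_loop(0x1000, memory, 3))  # ([4096, 4096, 4096], 1)
```

The store base is the same value on every iteration in both versions, which is
why the copies can live outside the loop.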
Here is the machine IR of the loop block (BB#28) before the register allocation pass:
4352B BB#28: derived from LLVM BB %for.body
Predecessors according to CFG: BB#28 BB#22
4368B %vreg75:sub_32bit<def,read-undef> = MOV32rm <fi#7>, 1, %noreg, 0, %noreg; mem:LD4[%i](tbaa=!10) GR64_NOSP:%vreg75
4400B %vreg76<def> = MOV64rm %vreg22, 1, %noreg, 0, %noreg; mem:LD8[%16](tbaa=!3) GR64:%vreg76,%vreg22
4416B %vreg77<def> = LEA64r %vreg76, 8, %vreg75, 0, %noreg; GR64:%vreg77,%vreg76 GR64_NOSP:%vreg75
4432B ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
4448B %RDI<def> = COPY %vreg77; GR64:%vreg77
4464B CALL64pcrel32 <ga:@_ZNK1H5m_fn1Ev>, <regmask>, %RSP<imp-use>, %RDI<imp-use>, %RSP<imp-def>, %EAX<imp-def>
4480B ADJCALLSTACKUP64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
4496B %vreg78<def> = COPY %EAX; GR32:%vreg78
4512B %vreg79<def> = MOVSX64rr32 %vreg78; GR64_NOSP:%vreg79 GR32:%vreg78
4528B %vreg80<def> = MOV64rm %vreg76, 8, %vreg75, 0, %noreg; mem:LD8[%common_.i.i.i.5362](tbaa=!2) GR64:%vreg80,%vreg76 GR64_NOSP:%vreg75
4544B %vreg81<def> = MOV64rm %vreg80, 1, %noreg, 0, %noreg; mem:LD8[%cluster_state.i.i.i.5463](tbaa=!7) GR64:%vreg81,%vreg80
4560B %vreg82<def> = MOV32rm %vreg81, 4, %vreg79, 0, %noreg; mem:LD4[%log_odds_other_children.i.i.55](tbaa=!9) GR32:%vreg82 GR64:%vreg81 GR64_NOSP:%vreg79
4576B ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
4592B %RDI<def> = COPY %vreg77; GR64:%vreg77
4608B CALL64pcrel32 <ga:@_ZN1H5m_fn2Ev>, <regmask>, %RSP<imp-use>, %RDI<imp-use>, %RSP<imp-def>, %EAX<imp-def,dead>
4624B ADJCALLSTACKUP64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
*** dbg trace: ***
assigning %vreg76 to %R14: R14B [4400r,4528r:0) 0@4400r
selectOrSplit GR64:%vreg22 [16r,3872B:0)[4352B,4816B:0) 0@16r w=9.071316e-03
Split for %R14 in 8 bundles, intv 1.
splitAroundRegion with 2 globals.
queuing new interval: %vreg90 [4376r,4536r:0) 0@4376r
queuing new interval: %vreg91 [16r,3872B:0)[4352B,4376r:1)[4536r,4816B:2) 0@16r 1@4352B-phi 2@4536r
******************
The debug trace of the greedy register allocator shows that vreg76 is assigned
to r14 before a register is selected for vreg22. vreg22 then cannot get a
physical register free of interference, so it is split around regions: for the
GlobalSplitCandidate whose PhysReg is r14, its live interval is split into two
new intervals, vreg90 and vreg91.
Each of vreg90 and vreg91 covers only part of the kernel loop, so the region
split points of the original interval lie inside the loop. The copies inserted
at those split points (the values defined at 4376r and 4536r in the trace) are
the two movq instructions inside the loop.
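Whether an interval covers the whole loop block can be checked directly from
the segments printed in the trace. The Python sketch below parses the
[start,end:valno) notation and tests coverage of BB28, taken as [4352B,4816B)
from the trace of vreg22. It assumes that the slot letters order as B, e, r, d
within one instruction index, as LLVM prints SlotIndex values; the helper names
are hypothetical.

```python
import re

# Assumed order of the slot letters within one instruction index.
SLOT_ORDER = {"B": 0, "e": 1, "r": 2, "d": 3}
# One printed segment, e.g. "[4352B,4376r:1)"; the value number is ignored.
SEGMENT = re.compile(r"\[(\d+[Berd]),(\d+[Berd]):\d+\)")


def parse_slot(text):
    """'4376r' -> (4376, 2), a tuple that sorts like the slot index."""
    return int(text[:-1]), SLOT_ORDER[text[-1]]


def parse_segments(text):
    """'[16r,3872B:0)[4352B,4376r:1)' -> list of half-open (start, end) pairs."""
    return [(parse_slot(s), parse_slot(e)) for s, e in SEGMENT.findall(text)]


def covers(segments, start, end):
    """True if the union of the segments covers all of [start, end)."""
    pos = start
    for seg_start, seg_end in sorted(segments):
        if seg_start > pos:
            break                  # gap in coverage at pos
        if seg_end > pos:
            pos = seg_end          # extend the covered prefix
    return pos >= end


BB28 = (parse_slot("4352B"), parse_slot("4816B"))
vreg90 = parse_segments("[4376r,4536r:0)")
vreg91 = parse_segments("[16r,3872B:0)[4352B,4376r:1)[4536r,4816B:2)")
print(covers(vreg90, *BB28), covers(vreg91, *BB28), covers(vreg90 + vreg91, *BB28))
# False False True
```

Only the union of the two new intervals spans BB28, so the boundaries between
them (4376r and 4536r) fall inside the loop block.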
In BB28, the interference of vreg76 on r14, [4400r,4528r:0), lies between the
first and the last use instructions of vreg22, whose live range in the block is
[4352B,4816B:0). Because of the logic in RAGreedy::addSplitConstraints, the
split constraints at both the entry and the exit of BB28 are then set to
SpillPlacement::PrefReg. This is a bad choice: since r14 has interference inside
BB28, preferring a register at the entry and exit of BB28 still requires region
splits inside BB28.
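The entry/exit decision for a live-through block can be modeled as follows. This
is a simplified Python sketch, not LLVM code: the PrefReg outcome for
interference between the first and last use is what this report describes,
while the MustSpill and PrefSpill branches are an assumption based on the LLVM
source of that time. Slot indexes are encoded as integers by the hypothetical
helper slot(). The first use of vreg22 (4400) and the interference
[4400r,4528r) come from the dump and trace above; last_instr and last_split are
made-up placeholders because the dump is truncated after 4624. The flag
spill_on_inner_intf models the alternative proposed in the next paragraph.

```python
from enum import Enum


class Pref(Enum):
    # Subset of SpillPlacement border constraints used in this sketch.
    PREF_REG = "PrefReg"
    PREF_SPILL = "PrefSpill"
    MUST_SPILL = "MustSpill"


SLOTS = "Berd"  # Assumed slot order within one instruction index.


def slot(index, letter):
    """Hypothetical helper: encode a printed slot index such as 4400r as an int."""
    return index * 4 + SLOTS.index(letter)


def block_constraints(block_start, first_instr, last_instr, last_split,
                      intf_first, intf_last, spill_on_inner_intf=False):
    """Entry/exit preferences of a live-through block for one candidate
    physreg whose interference spans [intf_first, intf_last] (simplified)."""
    entry = exit_ = Pref.PREF_REG
    # Live-in side.
    if block_start >= intf_first:
        entry = Pref.MUST_SPILL    # interference already live at block entry
    elif first_instr > intf_first:
        entry = Pref.PREF_SPILL    # interference starts before the first use
    elif spill_on_inner_intf and last_instr > intf_first:
        entry = Pref.PREF_SPILL    # proposed: interference between the uses
    # Live-out side.
    if intf_last >= last_split:
        exit_ = Pref.MUST_SPILL    # interference reaches the last split point
    elif intf_last > last_instr:
        exit_ = Pref.PREF_SPILL    # interference ends after the last use
    elif spill_on_inner_intf and intf_last > first_instr:
        exit_ = Pref.PREF_SPILL    # proposed: interference between the uses
    return entry, exit_


# BB28 with r14 interference from vreg76.
BB28 = dict(
    block_start=slot(4352, "B"),
    first_instr=slot(4400, "B"),   # MOV64rm %vreg22, first use in the dump
    last_instr=slot(4640, "B"),    # placeholder: real index not shown above
    last_split=slot(4800, "B"),    # placeholder: real index not shown above
    intf_first=slot(4400, "r"),    # vreg76 [4400r,4528r)
    intf_last=slot(4528, "r"),
)

entry, exit_ = block_constraints(**BB28)
print(entry.value, exit_.value)    # PrefReg PrefReg
entry, exit_ = block_constraints(**BB28, spill_on_inner_intf=True)
print(entry.value, exit_.value)    # PrefSpill PrefSpill
```

With the interference strictly between the first and last use, neither border
check fires, so both borders keep PrefReg even though r14 cannot be held across
the block.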
If the constraints at the entry and exit of BB28 were set to
SpillPlacement::PrefSpill in this case, the region split points would be placed
outside the loop.</pre>
</div>
</p>
</body>
</html>