<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Yes, I've run into this problem myself: nodes in the Pending queue aren't even checked by the tryCandidate() logic, so the latency gating effectively takes priority over all other scheduling decisions.</div><div class=""><br class=""></div><div class="">I personally would be open to changes in this area. To start the brainstorming, I could imagine that we move nodes whose remaining latency is below a target-specific limit into the Available queue, instead of only when they hit their latency cycle limits, and then let tryCandidate() weigh the remaining cycles against the other scheduling criteria.</div><div class=""><br class=""></div><div class="">Also keep in mind:</div><div class="">- When making those changes, be careful to get the cycle bumping/simulation logic right.</div><div class="">- The Pending queue is also used as a mechanism to keep compile time in check (see ReadyListLimit), as checking every candidate for every instruction we schedule is O(n**2).</div><div class=""><br class=""></div><div class="">- Matthias</div><div class=""><br class=""></div><div><blockquote type="cite" class=""><div class="">On Oct 13, 2017, at 1:09 PM, Stefan Pintilie via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><font size="2" face="sans-serif" class="">Hi, </font><br class=""><br class=""><font size="2" face="sans-serif" class="">I've been looking at the Machine Scheduler

on PowerPC. I am looking only at the pre-RA machine scheduler and

I am running it in the default bi-directional mode (so, both top down and

bottom up queues are considered). I've come across an example where the

scheduler picks a poor ordering for the instructions, which results in very

high register pressure and, in turn, spills. The problem comes from

the fact that the Machine Scheduler uses a maximum latency limit when it

considers instructions to schedule. A high latency instruction will not

be scheduled before all of the available lower latency instructions are

scheduled. This happens regardless of register pressure since the higher

latency instruction is not even added to the "Available" queue

that is used when the heuristics pick an instruction to schedule next.

</font><br class=""><br class=""><font size="2" face="sans-serif" class="">My question is: Why do we have that

latency limit in the first place? If an instruction can be scheduled (i.e.,

all the instructions it depends on are already scheduled), shouldn't it

be at least considered?</font><br class=""><br class=""><br class=""><br class=""><font size="2" face="sans-serif" class="">The example is listed below:</font><br class=""><br class=""><font size="2" face="sans-serif" class="">test.c</font><br class=""><font size="2" face="sans-serif" class="">--</font><br class=""><font size="2" face="sans-serif" class="">long A[100];</font><br class=""><font size="2" face="sans-serif" class="">long func(long* num, long* den) {</font><br class=""><font size="2" face="sans-serif" class="">// This loop is unrolled</font><br class=""><font size="2" face="sans-serif" class="">for (int i=0; i<6; i++) {</font><br class=""><font size="2" face="sans-serif" class="">  A[i] = num[i] / den[i];</font><br class=""><font size="2" face="sans-serif" class="">}</font><br class=""><font size="2" face="sans-serif" class="">return 0;</font><br class=""><font size="2" face="sans-serif" class="">}</font><br class=""><font size="2" face="sans-serif" class="">--</font><br class=""><br class=""><font size="2" face="sans-serif" class="">Compile commands</font><br class=""><font size="2" face="sans-serif" class="">--</font><br class=""><font size="2" face="sans-serif" class="">clang -c -m64 -O3 -target powerpc64le-unknown-linux-gnu

-mcpu=pwr9 -fexperimental-new-pass-manager test.c -S -emit-llvm</font><br class=""><font size="2" face="sans-serif" class="">llc test.ll -O3 -ppc-asm-full-reg-names

-debug-only=machine-scheduler -o test-p9.s > listing.out 2>&1</font><br class=""><font size="2" face="sans-serif" class="">--</font><br class=""><br class=""><font size="2" face="sans-serif" class="">Looking at the listing.out file I've

noticed that all of the loads are grouped together at the start of the

function. Those loads use 12 registers before any of the divides are scheduled.

As a result, we end up with significantly higher register pressure after

all the loads.</font><br class=""><font size="2" face="sans-serif" class="">--</font><br class=""><font size="2" face="sans-serif" class="">0B      BB#0: derived

from LLVM BB %entry</font><br class=""><font size="2" face="sans-serif" class="">           

Live Ins: %X3 %X4</font><br class=""><font size="2" face="sans-serif" class="">16B          

  %vreg1&lt;def&gt; = COPY %X4; G8RC_and_G8RC_NOX0:%vreg1</font><br class=""><font size="2" face="sans-serif" class="">32B          

  %vreg0&lt;def&gt; = COPY %X3; G8RC_and_G8RC_NOX0:%vreg0</font><br class=""><font size="2" face="sans-serif" class="">48B          

  %vreg2&lt;def&gt; = LD 0, %vreg0; mem:LD8[%num](tbaa=!4) G8RC:%vreg2

G8RC_and_G8RC_NOX0:%vreg0</font><br class=""><font size="2" face="sans-serif" class="">64B          

  %vreg3&lt;def&gt; = LD 0, %vreg1; mem:LD8[%den](tbaa=!4) G8RC:%vreg3

G8RC_and_G8RC_NOX0:%vreg1</font><br class=""><font size="2" face="sans-serif" class="">144B          

 %vreg7&lt;def&gt; = LD 8, %vreg0; mem:LD8[%arrayidx.1](tbaa=!4) G8RC:%vreg7

G8RC_and_G8RC_NOX0:%vreg0</font><br class=""><font size="2" face="sans-serif" class="">160B          

 %vreg8&lt;def&gt; = LD 8, %vreg1; mem:LD8[%arrayidx2.1](tbaa=!4)

G8RC:%vreg8 G8RC_and_G8RC_NOX0:%vreg1</font><br class=""><font size="2" face="sans-serif" class="">208B          

 %vreg10&lt;def&gt; = LD 16, %vreg0; mem:LD8[%arrayidx.2](tbaa=!4)

G8RC:%vreg10 G8RC_and_G8RC_NOX0:%vreg0</font><br class=""><font size="2" face="sans-serif" class="">224B          

 %vreg11&lt;def&gt; = LD 16, %vreg1; mem:LD8[%arrayidx2.2](tbaa=!4)

G8RC:%vreg11 G8RC_and_G8RC_NOX0:%vreg1</font><br class=""><font size="2" face="sans-serif" class="">272B          

 %vreg13&lt;def&gt; = LD 24, %vreg0; mem:LD8[%arrayidx.3](tbaa=!4)

G8RC:%vreg13 G8RC_and_G8RC_NOX0:%vreg0</font><br class=""><font size="2" face="sans-serif" class="">288B          

 %vreg14&lt;def&gt; = LD 24, %vreg1; mem:LD8[%arrayidx2.3](tbaa=!4)

G8RC:%vreg14 G8RC_and_G8RC_NOX0:%vreg1</font><br class=""><font size="2" face="sans-serif" class="">336B          

 %vreg16&lt;def&gt; = LD 32, %vreg0; mem:LD8[%arrayidx.4](tbaa=!4)

G8RC:%vreg16 G8RC_and_G8RC_NOX0:%vreg0</font><br class=""><font size="2" face="sans-serif" class="">352B          

 %vreg17&lt;def&gt; = LD 32, %vreg1; mem:LD8[%arrayidx2.4](tbaa=!4)

G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg1</font><br class=""><font size="2" face="sans-serif" class="">400B          

 %vreg19&lt;def&gt; = LD 40, %vreg0; mem:LD8[%arrayidx.5](tbaa=!4)

G8RC:%vreg19 G8RC_and_G8RC_NOX0:%vreg0</font><br class=""><font size="2" face="sans-serif" class="">416B          

 %vreg20&lt;def&gt; = LD 40, %vreg1; mem:LD8[%arrayidx2.5](tbaa=!4)

G8RC:%vreg20 G8RC_and_G8RC_NOX0:%vreg1</font><br class=""><font size="2" face="sans-serif" class="">424B          

 %vreg4&lt;def&gt; = DIVD %vreg2, %vreg3; G8RC:%vreg4,%vreg2,%vreg3</font><br class=""><font size="2" face="sans-serif" class="">432B          

 %vreg9&lt;def&gt; = DIVD %vreg7, %vreg8; G8RC:%vreg9,%vreg7,%vreg8</font><br class=""><font size="2" face="sans-serif" class="">440B          

 %vreg12&lt;def&gt; = DIVD %vreg10, %vreg11; G8RC:%vreg12,%vreg10,%vreg11</font><br class=""><font size="2" face="sans-serif" class="">448B          

 %vreg15&lt;def&gt; = DIVD %vreg13, %vreg14; G8RC:%vreg15,%vreg13,%vreg14</font><br class=""><font size="2" face="sans-serif" class="">456B          

 %vreg18&lt;def&gt; = DIVD %vreg16, %vreg17; G8RC:%vreg18,%vreg16,%vreg17</font><br class=""><font size="2" face="sans-serif" class="">464B          

 %vreg21&lt;def&gt; = DIVD %vreg19, %vreg20; G8RC:%vreg21,%vreg19,%vreg20</font><br class=""><font size="2" face="sans-serif" class="">472B          

 %vreg5&lt;def&gt; = ADDIStocHA %X2, &lt;ga:@A&gt;; G8RC_and_G8RC_NOX0:%vreg5</font><br class=""><font size="2" face="sans-serif" class="">480B          

 %vreg6&lt;def&gt; = LDtocL &lt;ga:@A&gt;, %vreg5, %X2&lt;imp-use&gt;;

mem:LD8[GOT] G8RC_and_G8RC_NOX0:%vreg6,%vreg5</font><br class=""><font size="2" face="sans-serif" class="">504B          

 %X3&lt;def&gt; = LI8 0</font><br class=""><font size="2" face="sans-serif" class="">512B          

 STD %vreg4, 0, %vreg6; mem:ST8[getelementptr inbounds ([100 x i64],

[100 x i64]* @A, i64 0, i64 0)](tbaa=!4) G8RC:%vreg4 G8RC_and_G8RC_NOX0:%vreg6</font><br class=""><font size="2" face="sans-serif" class="">520B          

 STD %vreg9, 8, %vreg6; mem:ST8[getelementptr inbounds ([100 x i64],

[100 x i64]* @A, i64 0, i64 1)](tbaa=!4) G8RC:%vreg9 G8RC_and_G8RC_NOX0:%vreg6</font><br class=""><font size="2" face="sans-serif" class="">528B          

 STD %vreg12, 16, %vreg6; mem:ST8[getelementptr inbounds ([100 x i64],

[100 x i64]* @A, i64 0, i64 2)](tbaa=!4) G8RC:%vreg12 G8RC_and_G8RC_NOX0:%vreg6</font><br class=""><font size="2" face="sans-serif" class="">536B          

 STD %vreg15, 24, %vreg6; mem:ST8[getelementptr inbounds ([100 x i64],

[100 x i64]* @A, i64 0, i64 3)](tbaa=!4) G8RC:%vreg15 G8RC_and_G8RC_NOX0:%vreg6</font><br class=""><font size="2" face="sans-serif" class="">544B          

 STD %vreg18, 32, %vreg6; mem:ST8[getelementptr inbounds ([100 x i64],

[100 x i64]* @A, i64 0, i64 4)](tbaa=!4) G8RC:%vreg18 G8RC_and_G8RC_NOX0:%vreg6</font><br class=""><font size="2" face="sans-serif" class="">552B          

 STD %vreg21, 40, %vreg6; mem:ST8[getelementptr inbounds ([100 x i64],

[100 x i64]* @A, i64 0, i64 5)](tbaa=!4) G8RC:%vreg21 G8RC_and_G8RC_NOX0:%vreg6</font><br class=""><font size="2" face="sans-serif" class="">560B          

 BLR8 %LR8&lt;imp-use&gt;, %RM&lt;imp-use&gt;, %X3&lt;imp-use&gt;</font><br class=""><font size="2" face="sans-serif" class="">--</font><br class=""><br class=""><font size="2" face="sans-serif" class="">Due to all of the register pressure

built up by those loads, we are forced to spill. Here is the final assembly.</font><br class=""><font size="2" face="sans-serif" class="">--</font><br class=""><font size="2" face="sans-serif" class=""># BB#0:        

  # %entry</font><br class=""><font size="2" face="sans-serif" class="">        std r30,

-16(r1)                # 8-byte

Folded Spill</font><br class=""><font size="2" face="sans-serif" class="">        ld r5, 0(r3)</font><br class=""><font size="2" face="sans-serif" class="">        ld r6, 0(r4)</font><br class=""><font size="2" face="sans-serif" class="">        ld r7, 8(r3)</font><br class=""><font size="2" face="sans-serif" class="">        ld r8, 8(r4)</font><br class=""><font size="2" face="sans-serif" class="">        ld r9, 16(r3)</font><br class=""><font size="2" face="sans-serif" class="">        ld r10,

16(r4)</font><br class=""><font size="2" face="sans-serif" class="">        ld r11,

24(r3)</font><br class=""><font size="2" face="sans-serif" class="">        ld r0, 32(r3)</font><br class=""><font size="2" face="sans-serif" class="">        ld r12,

24(r4)</font><br class=""><font size="2" face="sans-serif" class="">        ld r30,

32(r4)</font><br class=""><font size="2" face="sans-serif" class="">        ld r3, 40(r3)</font><br class=""><font size="2" face="sans-serif" class="">        ld r4, 40(r4)</font><br class=""><font size="2" face="sans-serif" class="">        divd r5,

r5, r6</font><br class=""><font size="2" face="sans-serif" class="">        divd r6,

r7, r8</font><br class=""><font size="2" face="sans-serif" class="">        divd r7,

r9, r10</font><br class=""><font size="2" face="sans-serif" class="">        divd r9,

r0, r30</font><br class=""><font size="2" face="sans-serif" class="">        divd r4,

r3, r4</font><br class=""><font size="2" face="sans-serif" class="">        divd r8,

r11, r12</font><br class=""><font size="2" face="sans-serif" class="">        addis r3,

r2, .LC0@toc@ha</font><br class=""><font size="2" face="sans-serif" class="">        ld r30,

-16(r1)                 # 8-byte

Folded Reload</font><br class=""><font size="2" face="sans-serif" class="">        ld r10,

.LC0@toc@l(r3)</font><br class=""><font size="2" face="sans-serif" class="">        li r3, 0</font><br class=""><font size="2" face="sans-serif" class="">        std r5,

0(r10)</font><br class=""><font size="2" face="sans-serif" class="">        std r6,

8(r10)</font><br class=""><font size="2" face="sans-serif" class="">        std r7,

16(r10)</font><br class=""><font size="2" face="sans-serif" class="">        std r9,

32(r10)</font><br class=""><font size="2" face="sans-serif" class="">        std r8,

24(r10)</font><br class=""><font size="2" face="sans-serif" class="">        std r4,

40(r10)</font><br class=""><font size="2" face="sans-serif" class="">        blr</font><br class=""><font size="2" face="sans-serif" class="">--</font><br class=""><br class=""><font size="2" face="sans-serif" class="">Thank you, </font><br class=""><font size="2" face="sans-serif" class="">Stefan Pintilie</font><br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<br class=""></div></blockquote></div><br class=""></body></html>