[PATCH] D13417: [MachineCombiner] make slack optional in critical path cost calculation (PR25016)

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 22 09:01:22 PDT 2015


spatel added a comment.

In http://reviews.llvm.org/D13417#272787, @Gerolf wrote:

> That is still the same test case and the outcome should be independent of compiler revisions.


OK - at least we are certain that we are analyzing the same example now. :)

> W/o reassoc the fill after the last call is not removed, but you get more ILP out of the 3 vadds. W reassoc there is no fill, but less ILP. You can’t get both higher ILP *and* a correct program, since in your proposed faster sequence xmm1 would not be initialized:

> 

>   vaddsd  16(%rsp), %xmm1, %xmm1     <--- c + d

>   vaddsd  (%rsp), %xmm0, %xmm0       <--- a + b              [independent]

>   vaddsd  %xmm0, %xmm1, %xmm0        <--- (a + b) + (c + d)

>    

> 

> What value do you expect in xmm1 in the first vaddsd?


As we can see in the test's CHECK line, xmm1 is loaded just ahead of the 'add' in the case where the reassociation is firing:
; AVX-NEXT: vmovsd 8(%rsp), %xmm1

I don't see how register initialization or load folding is relevant in this discussion. There are 3 loads and 3 adds regardless of reassociation, and one operand is already in a register (xmm0) as the result of the final 'call' instruction. But as you noted earlier, the machine combiner doesn't know about loads/spills, so all operands are in virtual registers as far as it knows.

Perhaps it makes more sense to look at the debug output for the machine combiner pass rather than the final asm?

This is the machine trace metric depth info before reassociation occurs:

  Depths for BB#0:
      0 Instructions
  0       ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  0       CALL64pcrel32 <ga:@bar>, <regmask ..., %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  1       ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  1       %vreg0<def> = COPY %XMM0; FR64:%vreg0
  0       ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  0       CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  1       ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  1       %vreg1<def> = COPY %XMM0; FR64:%vreg1
  0       ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  0       CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  1       ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  1       %vreg2<def> = COPY %XMM0; FR64:%vreg2
  0       ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  0       CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  1       ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  1       %vreg3<def> = COPY %XMM0; FR64:%vreg3
  1       %vreg4<def> = VADDSDrr %vreg0, %vreg1; FR64:%vreg4,%vreg0,%vreg1
  1       %vreg5<def> = VADDSDrr %vreg2, %vreg3; FR64:%vreg5,%vreg2,%vreg3
  4       %vreg6<def> = VADDSDrr %vreg4<kill>, %vreg5<kill>; FR64:%vreg6,%vreg4,%vreg5
  7       %XMM0<def> = COPY %vreg6; FR64:%vreg6
  7       RETQ %XMM0

And this is after:

  Depths for BB#0:
      0 Instructions
  0	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  0	CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  1	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  1	%vreg0<def> = COPY %XMM0; FR64:%vreg0
  0	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  0	CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  1	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  1	%vreg1<def> = COPY %XMM0; FR64:%vreg1
  0	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  0	CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  1	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  1	%vreg2<def> = COPY %XMM0; FR64:%vreg2
  0	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  0	CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  1	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  1	%vreg3<def> = COPY %XMM0; FR64:%vreg3
  1	%vreg5<def> = VADDSDrr %vreg2, %vreg3; FR64:%vreg5,%vreg2,%vreg3
  4	%vreg7<def> = VADDSDrr %vreg1, %vreg5<kill>; FR64:%vreg7,%vreg1,%vreg5
  7	%vreg6<def> = VADDSDrr %vreg0, %vreg7<kill>; FR64:%vreg6,%vreg0,%vreg7
  10	%XMM0<def> = COPY %vreg6; FR64:%vreg6
  10	RETQ %XMM0

Do you agree that this 2nd sequence has been de-optimized?

As noted earlier, the reason this happens is because the MTM *Height* information for the original sequence looks like this:

  Heights for BB#0:
     16 Instructions
     3c @ SBPort1 (3 ops x12)
     5c @ SBPort5 (5 ops x12)
     3c @ SBPort05 (5 ops x6)
     4c @ SBPort15 (8 ops x6)
     1c @ SBPort23 (1 ops x6)
     3c @ SBPort015 (8 ops x4)
     2c @ SBPortAny (9 ops x2)
  7       0       RETQ %XMM0
  7       0       %XMM0<def> = COPY %vreg6; FR64:%vreg6
  7       3       %vreg6<def> = VADDSDrr %vreg4<kill>, %vreg5<kill>; FR64:%vreg6,%vreg4,%vreg5
  7       6       %vreg5<def> = VADDSDrr %vreg2, %vreg3; FR64:%vreg5,%vreg2,%vreg3
  7       6       %vreg4<def> = VADDSDrr %vreg0, %vreg1; FR64:%vreg4,%vreg0,%vreg1
  7       6       %vreg3<def> = COPY %XMM0; FR64:%vreg3
  7       0       ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  7       7       CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  8       8       ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  8       6       %vreg2<def> = COPY %XMM0; FR64:%vreg2
  10      9       ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  10      10      CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  11      11      ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  11      6       %vreg1<def> = COPY %XMM0; FR64:%vreg1
  13      12      ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  13      13      CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  14      14      ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  14      6       %vreg0<def> = COPY %XMM0; FR64:%vreg0
  16      15      ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  16      16      CALL64pcrel32 <ga:@bar>, <regmask ...>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  17      17      ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  BB#0 Live-ins: SPL at 17
  Critical path: 17


http://reviews.llvm.org/D13417





More information about the llvm-commits mailing list