Fwd: [PATCH] CSE removes COPY.

Thu May 29 00:03:52 PDT 2014

Hi all!
Sorry, this is my first code review and I could not find this in the manuals:
1) Does Phabricator have a tag like <noformat>, or like {code} in JIRA, that turns off formatting?
2) How can I get arcanist (arc diff) to produce a patch with full context? I thought it would happen automatically... (((

Hi atrick,

1) The reason CSE ignored COPY is PerformTrivialCoalescing.
If a COPY is inserted into the hash table and PerformTrivialCoalescing then erases it (MI->eraseFromParent()),
the hash table is left broken. The main idea is to run PerformTrivialCoalescing before the insertion.
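
A toy illustration of the ordering issue in point 1. This is not the MachineCSE code: ToyInstr, coalesceAway and staleEntries are hypothetical names, and the rule "a same-class COPY is erased" is only an assumed stand-in for what PerformTrivialCoalescing does. It counts table entries that refer to erased instructions under the two possible orders.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for a machine instruction; NOT LLVM's MachineInstr.
struct ToyInstr {
  std::string opcode; // e.g. "COPY"
  int def;            // virtual register written
  int src;            // virtual register read
  bool sameClass;     // true if def and src share a register class
  bool erased;        // set once the toy coalescer removes the instruction
};

// Expression key used by the toy value table.
std::string keyOf(const ToyInstr &I) {
  return I.opcode + " v" + std::to_string(I.src);
}

// Assumed simplification: a COPY within one register class is erased.
bool coalesceAway(ToyInstr &I) {
  if (I.opcode == "COPY" && I.sameClass) {
    I.erased = true;
    return true;
  }
  return false;
}

enum class Order { InsertThenCoalesce, CoalesceThenInsert };

// Returns how many table entries end up referring to erased instructions.
int staleEntries(std::vector<ToyInstr> block, Order order) {
  std::unordered_map<std::string, std::size_t> table; // key -> index in block
  for (std::size_t i = 0; i < block.size(); ++i) {
    ToyInstr &I = block[i];
    if (order == Order::CoalesceThenInsert) {
      if (coalesceAway(I))
        continue;                 // erased instructions never reach the table
      table.emplace(keyOf(I), i);
    } else {
      table.emplace(keyOf(I), i); // the table now refers to I ...
      coalesceAway(I);            // ... which may be erased right afterwards
    }
  }
  int stale = 0;
  for (const auto &entry : table)
    if (block[entry.second].erased)
      ++stale;
  return stale;
}

// One same-class COPY (erased by the toy coalescer) and one cross-class COPY.
std::vector<ToyInstr> exampleBlock() {
  return {{"COPY", 1, 0, true, false}, {"COPY", 3, 2, false, false}};
}
```

Calling staleEntries(exampleBlock(), Order::InsertThenCoalesce) reports one stale entry, while Order::CoalesceThenInsert reports none. In the real pass a stale entry would be a dangling MachineInstr pointer rather than a flag.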
**************************************************************************
2) Cross-regclass copy:
We are developing a backend for a new architecture with independent GPR and ADDRRegs register classes.
In this case we get redundant MOVs (between the disjoint reg classes):
move r0.l, a0.l <<<<<<<<<<<<
store r2.d, (a0.l)
move r0.l, a0.l <<<<<<<<<<<<
store r4.d, (a0.l+8)
With the patch, CSE removes MOVs of this kind.
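
The effect on the sequence above can be sketched with a toy value-numbering pass over SSA-like instructions. This is only a sketch under assumptions: ToyInstr, cseCopies and exampleStores are hypothetical names, registers are plain integers, only COPY is considered for elimination, and the store offsets are left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a machine instruction; NOT LLVM's MachineInstr.
struct ToyInstr {
  std::string opcode;    // "COPY", "STORE", ...
  int def;               // virtual register defined, -1 if none
  std::vector<int> uses; // virtual registers read
};

// Removes a COPY when an earlier COPY of the same source is available and
// rewrites later uses of the removed result to the earlier result.
std::vector<ToyInstr> cseCopies(const std::vector<ToyInstr> &block) {
  std::map<std::pair<std::string, std::vector<int>>, int> available; // expr -> def
  std::map<int, int> replacedBy; // removed def -> surviving def
  std::vector<ToyInstr> out;
  for (ToyInstr I : block) {
    // First rewrite operands that referred to an already removed COPY.
    for (int &u : I.uses) {
      auto it = replacedBy.find(u);
      if (it != replacedBy.end())
        u = it->second;
    }
    if (I.opcode == "COPY") {
      auto key = std::make_pair(I.opcode, I.uses);
      auto hit = available.find(key);
      if (hit != available.end()) {
        replacedBy[I.def] = hit->second; // redundant: reuse the earlier value
        continue;
      }
      available.emplace(key, I.def);
    }
    out.push_back(I);
  }
  return out;
}

// Models the move/store/move/store sequence: v0 plays r0.l, v10 and v11 play
// the two copies into a0.l, v2 and v4 are the stored values.
std::vector<ToyInstr> exampleStores() {
  return {{"COPY", 10, {0}},
          {"STORE", -1, {2, 10}},
          {"COPY", 11, {0}},
          {"STORE", -1, {4, 11}}};
}
```

Running cseCopies(exampleStores()) keeps the first COPY and both stores, drops the second COPY, and makes the second store address through v10.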
**************************************************************************
3) Subreg copy: CodeGen/X86/cse-add-with-overflow.ll passes
because CSE eliminates the second "add" and its corresponding COPYs,
and "Simple Register Coalescing" coalesces the
first "add" with its corresponding COPYs. (This scheme seems natural,
so I removed the FIXME in PerformTrivialCoalescing.)
//--------   dump using patch: -print-after-all:
# *** IR Dump After Machine Loop Invariant Code Motion ***:
# Machine code for function redundantadd: SSA
Function Live Ins: %RDI in %vreg2, %RSI in %vreg3

BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %RSI
%vreg3<def> = COPY %RSI; GR64:%vreg3
%vreg2<def> = COPY %RDI; GR64:%vreg2
%vreg0<def> = MOV64rm %vreg2, 1, %noreg, 0, %noreg; mem:LD8[%a0] GR64:%vreg0,%vreg2
%vreg1<def> = MOV64rm %vreg3, 1, %noreg, 0, %noreg; mem:LD8[%a1] GR64:%vreg1,%vreg3
%vreg4<def> = COPY %vreg0:sub_32bit; GR32:%vreg4 GR64:%vreg0
%vreg5<def> = COPY %vreg1:sub_32bit; GR32:%vreg5 GR64:%vreg1
%vreg6<def,tied1> = ADD32rr %vreg4<tied0>, %vreg5<kill>, %EFLAGS<imp-def>; GR32:%vreg6,%vreg4,%vreg5
JNO_4 <BB#2>, %EFLAGS<imp-use>
JMP_4 <BB#1>
    Successors according to CFG: BB#1(1) BB#2(1048575)

BB#1: derived from LLVM BB %exit2
    Predecessors according to CFG: BB#0

BB#2: derived from LLVM BB %return
    Predecessors according to CFG: BB#0
>>>>>>>>>	%vreg7<def> = COPY %vreg0:sub_32bit; GR32:%vreg7 GR64:%vreg0
>>>>>>>>>	%vreg8<def> = COPY %vreg1:sub_32bit; GR32:%vreg8 GR64:%vreg1
>>>>>>>>>	%vreg9<def,tied1> = ADD32rr %vreg8<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8,%vreg7
%vreg10<def> = SUBREG_TO_REG 0, %vreg9<kill>, 4; GR64:%vreg10 GR32:%vreg9
%RAX<def> = COPY %vreg10; GR64:%vreg10
RETQ %RAX

# End machine code for function redundantadd.

# *** IR Dump After Machine Common Subexpression Elimination ***:
# Machine code for function redundantadd: SSA
Function Live Ins: %RDI in %vreg2, %RSI in %vreg3

BB#0: derived from LLVM BB %entry
    Live Ins: %RDI %RSI
%vreg3<def> = COPY %RSI; GR64:%vreg3
%vreg2<def> = COPY %RDI; GR64:%vreg2
%vreg0<def> = MOV64rm %vreg2, 1, %noreg, 0, %noreg; mem:LD8[%a0] GR64:%vreg0,%vreg2
%vreg1<def> = MOV64rm %vreg3, 1, %noreg, 0, %noreg; mem:LD8[%a1] GR64:%vreg1,%vreg3
%vreg4<def> = COPY %vreg0:sub_32bit; GR32:%vreg4 GR64:%vreg0
%vreg5<def> = COPY %vreg1:sub_32bit; GR32:%vreg5 GR64:%vreg1
%vreg6<def,tied1> = ADD32rr %vreg4<tied0>, %vreg5, %EFLAGS<imp-def>; GR32:%vreg6,%vreg4,%vreg5
JNO_4 <BB#2>, %EFLAGS<imp-use>
JMP_4 <BB#1>
    Successors according to CFG: BB#1(1) BB#2(1048575)

BB#1: derived from LLVM BB %exit2
    Predecessors according to CFG: BB#0

BB#2: derived from LLVM BB %return
    Predecessors according to CFG: BB#0
%vreg10<def> = SUBREG_TO_REG 0, %vreg6, 4; GR64:%vreg10 GR32:%vreg6
%RAX<def> = COPY %vreg10; GR64:%vreg10
RETQ %RAX

# End machine code for function redundantadd.
.........................................
.........................................
# *** IR Dump After Live Interval Analysis ***:
# Machine code for function redundantadd: Post SSA
Function Live Ins: %RDI in %vreg2, %RSI in %vreg3

0B	BB#0: derived from LLVM BB %entry
Live Ins: %RDI %RSI
16B		%vreg3<def> = COPY %RSI; GR64:%vreg3
32B		%vreg2<def> = COPY %RDI; GR64:%vreg2
48B		%vreg0<def> = MOV64rm %vreg2, 1, %noreg, 0, %noreg; mem:LD8[%a0] GR64:%vreg0,%vreg2
64B		%vreg1<def> = MOV64rm %vreg3, 1, %noreg, 0, %noreg; mem:LD8[%a1] GR64:%vreg1,%vreg3
>>>>>>>>>>80B		%vreg4<def> = COPY %vreg0:sub_32bit; GR32:%vreg4 GR64:%vreg0
>>>>>>>>>>96B		%vreg5<def> = COPY %vreg1:sub_32bit; GR32:%vreg5 GR64:%vreg1
>>>>>>>>>>112B		%vreg6<def> = COPY %vreg5; GR32:%vreg6,%vreg5
128B		%vreg6<def,tied1> = ADD32rr %vreg6<tied0>, %vreg4, %EFLAGS<imp-def>; GR32:%vreg6,%vreg4
144B		JNO_4 <BB#2>, %EFLAGS<imp-use,kill>
160B		JMP_4 <BB#1>
Successors according to CFG: BB#1(1) BB#2(1048575)

176B	BB#1: derived from LLVM BB %exit2
Predecessors according to CFG: BB#0

192B	BB#2: derived from LLVM BB %return
Predecessors according to CFG: BB#0
208B		%vreg10<def> = SUBREG_TO_REG 0, %vreg6, 4; GR64:%vreg10 GR32:%vreg6
224B		%RAX<def> = COPY %vreg10; GR64:%vreg10
240B		RETQ %RAX<kill>

# End machine code for function redundantadd.

# *** IR Dump After Simple Register Coalescing ***:
# Machine code for function redundantadd: Post SSA
Function Live Ins: %RDI in %vreg2, %RSI in %vreg3

0B	BB#0: derived from LLVM BB %entry
Live Ins: %RDI %RSI
16B		%vreg3<def> = COPY %RSI; GR64:%vreg3
32B		%vreg2<def> = COPY %RDI; GR64:%vreg2
48B		%vreg0<def> = MOV64rm %vreg2, 1, %noreg, 0, %noreg; mem:LD8[%a0] GR64_with_sub_8bit:%vreg0 GR64:%vreg2
64B		%vreg10<def> = MOV64rm %vreg3, 1, %noreg, 0, %noreg; mem:LD8[%a1] GR64_with_sub_8bit:%vreg10 GR64:%vreg3
128B		%vreg10:sub_32bit<def,tied1> = ADD32rr %vreg10:sub_32bit<tied0>, %vreg0:sub_32bit, %EFLAGS<imp-def>; GR64_with_sub_8bit:%vreg10,%vreg0
144B		JNO_4 <BB#2>, %EFLAGS<imp-use,kill>
160B		JMP_4 <BB#1>
Successors according to CFG: BB#1(1) BB#2(1048575)

176B	BB#1: derived from LLVM BB %exit2
Predecessors according to CFG: BB#0

192B	BB#2: derived from LLVM BB %return
Predecessors according to CFG: BB#0
224B		%RAX<def> = COPY %vreg10; GR64_with_sub_8bit:%vreg10
240B		RETQ %RAX<kill>

# End machine code for function redundantadd.

//--------
I also modified cse-add-with-overflow.ll to make the test minimal.
**************************************************************************
4) problem with CodeGen/X86/inline-asm-fpstack.ll:
//--------   dump: -print-after-all:
# *** IR Dump After Machine Loop Invariant Code Motion ***:
# Machine code for function testPR4185b: Post SSA
Constant Pool:
  cp#0: 1.000000e+06, align=4

BB#0: derived from LLVM BB %return
%FP0<def> = LD_Fp32m80 %noreg, 1, %noreg, <cp#0>, %noreg, %FPSW<imp-def,dead>; mem:LD4[ConstantPool]
>>>>>>>	%ST0<def> = COPY %FP0
INLINEASM <es:fistl $0> [sideeffect] [attdialect], $0:[reguse], %ST0
>>>>>>>	%ST0<def> = COPY %FP0<kill>
INLINEASM <es:fistpl $0> [sideeffect] [attdialect], $0:[reguse], %ST0, $1:[clobber], %ST0<earlyclobber,imp-def>
RETL

# End machine code for function testPR4185b.

# *** IR Dump After Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function testPR4185b: Post SSA
Constant Pool:
  cp#0: 1.000000e+06, align=4

BB#0: derived from LLVM BB %return
LD_F32m %noreg, 1, %noreg, <cp#0>, %noreg, %FPSW<imp-def,dead>; mem:LD4[ConstantPool]
INLINEASM <es:fistl $0> [sideeffect] [attdialect], $0:[reguse], %ST0
INLINEASM <es:fistpl $0> [sideeffect] [attdialect], $0:[reguse], %ST0, $1:[clobber], %ST0<earlyclobber,imp-def>
RETL

# End machine code for function testPR4185b.

//--------   dump using patch:
# *** IR Dump After Machine Loop Invariant Code Motion ***:
# Machine code for function testPR4185b: Post SSA
Constant Pool:
  cp#0: 1.000000e+06, align=4

BB#0: derived from LLVM BB %return
%FP0<def> = LD_Fp32m80 %noreg, 1, %noreg, <cp#0>, %noreg, %FPSW<imp-def,dead>; mem:LD4[ConstantPool]
>>>>>>>>>	%ST0<def> = COPY %FP0<kill>
INLINEASM <es:fistl $0> [sideeffect] [attdialect], $0:[reguse], %ST0
INLINEASM <es:fistpl $0> [sideeffect] [attdialect], $0:[reguse], %ST0, $1:[clobber], %ST0<earlyclobber,imp-def>
RETL

# End machine code for function testPR4185b.

# *** IR Dump After Prologue/Epilogue Insertion & Frame Finalization ***:
# Machine code for function testPR4185b: Post SSA
Constant Pool:
  cp#0: 1.000000e+06, align=4

BB#0: derived from LLVM BB %return
LD_F32m %noreg, 1, %noreg, <cp#0>, %noreg, %FPSW<imp-def,dead>; mem:LD4[ConstantPool]
INLINEASM <es:fistl $0> [sideeffect] [attdialect], $0:[reguse], %ST0
>>>>>>>>???	ST_FPrr %ST0, %FPSW<imp-def>
>>>>>>>>???	LD_F0 %FPSW<imp-def>
INLINEASM <es:fistpl $0> [sideeffect] [attdialect], $0:[reguse], %ST0, $1:[clobber], %ST0<earlyclobber,imp-def>
RETL

# End machine code for function testPR4185b.

//--------
I'm investigating, as soon as I can, why the dump after "Prologue/Epilogue Insertion" contains
ST_FPrr %ST0, %FPSW<imp-def>
LD_F0 %FPSW<imp-def>
For now there is an option, -cse-ignore-copy, with a corresponding FIXME.

http://reviews.llvm.org/D3948

Files:
  lib/CodeGen/MachineCSE.cpp
  test/CodeGen/ARM/atomic-64bit.ll
  test/CodeGen/ARM/debug-info-branch-folding.ll
  test/CodeGen/X86/cse-add-with-overflow.ll
  test/CodeGen/X86/inline-asm-fpstack.ll

----------------------------------------------------------------------

-- 
Данил Трошков
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D3948.9907.patch
Type: text/x-patch
Size: 8370 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140529/8dd0e638/attachment.bin>