[PATCH] D15157: CodeGen peephole: fold redundant phys reg copies

Wed Dec 2 09:27:20 PST 2015

jfb added a comment.

There's a further complication around calling conventions: it seems that the x86 backend doesn't fully model `xmm` registers as being caller-saved. Consider this example:

  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"

  @I32 = external global i32

  declare i32* @foo(i32*, i64)

  declare i32* @bar(i32, i32)

  declare double @D()

  declare i32* @P32()

  declare i32* @baz(double)

  define hidden i32* @ExpressionFromLiteral(i32 %token) {
  entry:
    switch i32 %token, label %return [
      i32 83, label %bb0
      i32 82, label %bb1
    ]

  bb0:
    %I32.loaded = load i32, i32* @I32
    %call.foo = tail call i32* @foo(i32* nonnull @I32, i64 48)
    %call.bar = tail call i32* @bar(i32 %I32.loaded, i32 %I32.loaded)
    %call.foo.gep.40 = getelementptr inbounds i32, i32* %call.foo, i64 40
    %ptr.foo = bitcast i32* %call.foo.gep.40 to i32**
    store i32* %call.bar, i32** %ptr.foo
    br label %return

  bb1:
    %call.D = tail call double @D()
    %call.P32 = tail call i32* @P32()
    %call.baz = tail call i32* @baz(double %call.D)
    %call.P32.gep.40 = getelementptr inbounds i32, i32* %call.P32, i64 40
    %ptr.P32 = bitcast i32* %call.P32.gep.40 to i32**
    store i32* %call.baz, i32** %ptr.P32
    br label %return

  return:
    %retval.0 = phi i32* [ %call.P32, %bb1 ], [ %call.foo, %bb0 ], [ null, %entry ]
    ret i32* %retval.0
  }

The example above confuses LLVM around the `@D` call: the `double` is returned in `xmm0`, is live across the call to `@P32`, and is then passed to `@baz`. The machine code for `%bb1` starts off as:

  BB#2: derived from LLVM BB %bb1
      Predecessors according to CFG: BB#0
  	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  	CALL64pcrel32 <ga:@D>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %RSP<imp-def>, %XMM0<imp-def>
  	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  	%vreg8<def> = COPY %XMM0; FR64:%vreg8
  	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  	CALL64pcrel32 <ga:@P32>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %RSP<imp-def>, %RAX<imp-def>
  	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  	%vreg9<def> = COPY %RAX; GR64:%vreg9
  	%vreg1<def> = COPY %vreg9; GR64:%vreg1,%vreg9
  	ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  	%XMM0<def> = COPY %vreg8; FR64:%vreg8
  	CALL64pcrel32 <ga:@baz>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %XMM0<imp-use>, %RSP<imp-def>, %RAX<imp-def>
  	ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
  	%vreg10<def> = COPY %RAX; GR64:%vreg10
  	MOV64mr %vreg9, 1, %noreg, 160, %noreg, %vreg10; mem:ST8[%ptr.P32] GR64:%vreg9,%vreg10
      Successors according to CFG: BB#3(?%)

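Note how `xmm0` is implicitly defined by the `@D` call, copied to `vreg8` and back (needlessly, as far as the available information shows), then implicitly used by the `@baz` call. My patch deletes `%XMM0<def> = COPY %vreg8`, which then allows `%vreg8<def> = COPY %XMM0` to be deleted. Both copies are in fact needed: `xmm0` is caller-saved, so the intervening `@P32` call may clobber it.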
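To make the failure concrete, here is a sketch of the check that deleting `%XMM0<def> = COPY %vreg8` would need. It is a hypothetical toy model in C++ (`Instr`, `mayClobber`, `canFoldCopyBack` and `exampleBB2` are illustrative names, not LLVM's `MachineInstr` API), assuming LLVM's regmask convention that registers listed in a call's regmask are preserved and everything else is clobbered. A check that only looked at the call's implicit defs would see `%RAX<imp-def>` on the `@P32` call, find no write to `XMM0`, and fold; consulting the regmask is what catches the clobber.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a MachineInstr (not LLVM's API).
struct Instr {
  enum Kind { Copy, Call } kind;
  std::string dst, src;               // Copy only: dst = COPY src
  std::set<std::string> preserved;    // Call only: registers listed in the regmask
  std::set<std::string> implicitDefs; // Call only: e.g. XMM0 or RAX for the return value
};

// Returns true if `reg` may be overwritten by any instruction in [from, to).
bool mayClobber(const std::vector<Instr>& bb, std::size_t from, std::size_t to,
                const std::string& reg) {
  for (std::size_t i = from; i < to; ++i) {
    const Instr& mi = bb[i];
    if (mi.kind == Instr::Call) {
      // Regmask semantics: a register NOT listed as preserved is clobbered.
      if (mi.implicitDefs.count(reg) || !mi.preserved.count(reg)) return true;
    } else if (mi.dst == reg) {
      return true;
    }
  }
  return false;
}

// Can `PHYS = COPY vreg` at index `use` be deleted, given `vreg = COPY PHYS` at `def`?
bool canFoldCopyBack(const std::vector<Instr>& bb, std::size_t def, std::size_t use) {
  assert(def < use && use < bb.size());
  const Instr& d = bb[def];
  const Instr& u = bb[use];
  if (d.kind != Instr::Copy || u.kind != Instr::Copy) return false;
  if (u.src != d.dst || u.dst != d.src) return false;  // must be the reverse copy
  return !mayClobber(bb, def + 1, use, d.src);
}

// BB#2 from the dump above, minus the ADJCALLSTACK pseudo-instructions.
std::vector<Instr> exampleBB2() {
  const std::set<std::string> calleeSaved = {"RBX", "RBP", "R12", "R13", "R14", "R15"};
  return {
      {Instr::Call, "", "", calleeSaved, {"XMM0"}},  // 0: CALL @D
      {Instr::Copy, "vreg8", "XMM0", {}, {}},        // 1: %vreg8 = COPY %XMM0
      {Instr::Call, "", "", calleeSaved, {"RAX"}},   // 2: CALL @P32
      {Instr::Copy, "vreg9", "RAX", {}, {}},         // 3: %vreg9 = COPY %RAX
      {Instr::Copy, "vreg1", "vreg9", {}, {}},       // 4: %vreg1 = COPY %vreg9
      {Instr::Copy, "XMM0", "vreg8", {}, {}},        // 5: %XMM0 = COPY %vreg8
      {Instr::Call, "", "", calleeSaved, {"RAX"}},   // 6: CALL @baz
  };
}
```

With `bb = exampleBB2()`, `canFoldCopyBack(bb, 1, 5)` returns false because instruction 2 (the `@P32` call) does not preserve `XMM0`. The sketch only covers the first deletion; whether `%vreg8<def> = COPY %XMM0` then becomes dead is a separate use check it leaves out.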

This could be fixed by either:

- Being smarter about calling conventions in my code.
- Marking all calls as having modeled side-effects (which would lose some potential optimizations).
- Marking `xmm` registers as clobbered in calls such as `@P32` above.
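The trade-off between these options can be sketched with an even smaller hypothetical model, where each call between the two copies is reduced to the set of registers its regmask preserves (`RegMask`, `canFoldConservative` and `canFoldRegMaskAware` are illustrative names, not LLVM APIs). Treating every call as a barrier, which is one possible reading of the second option, is always safe but also refuses to fold copies of callee-saved registers such as `RBX`; that is the lost optimization. Respecting what each call clobbers, which is roughly what the first and third options amount to, still folds those but rejects `XMM0`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: a call is represented only by its regmask, i.e. the set
// of registers it preserves. Anything not in the set is clobbered by the call.
using RegMask = std::set<std::string>;

// Treat every call as a barrier: never fold a physical register copy across a
// call, whatever the callee preserves.
bool canFoldConservative(const std::vector<RegMask>& callsBetween,
                         const std::string& /*reg*/) {
  return callsBetween.empty();
}

// Fold only if every intervening call preserves `reg`.
bool canFoldRegMaskAware(const std::vector<RegMask>& callsBetween,
                         const std::string& reg) {
  for (const RegMask& preserved : callsBetween)
    if (preserved.count(reg) == 0) return false;
  return true;
}
```

With a single intervening call whose regmask preserves only `RBX`, `RBP` and `R12`-`R15` (as in the dump), the regmask-aware check accepts `RBX` and rejects `XMM0`, while the conservative check rejects both.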

That problem doesn't really need to be tackled until I figure out how to fix the other one, though. It's just extra :-)

http://reviews.llvm.org/D15157