[PATCH] x86 NaCl: Emit LAHF/SAHF instead of PUSHF/POPF

Mon Dec 15 15:41:38 PST 2014

In http://reviews.llvm.org/D6629#101238, @t.p.northover wrote:

> OK, I'm afraid I have a few more questions:
>
> - What happens for "EAX = COPY EFLAGS"? (It's not good).

You mean if the user actually wants to copy `EFLAGS` for system code? Indeed that would be broken. Currently the test's code for `test_intervening_call` sees the following after ISel lowering:

  *** MachineFunction at end of ISel ***
  # Machine code for function test_intervening_call: SSA
  Function Live Ins: %EDI in %vreg0, %RSI in %vreg1, %RDX in %vreg2

  BB#0: derived from LLVM BB %0
      Live Ins: %EDI %RSI %RDX
  	%vreg2<def> = COPY %RDX; GR64:%vreg2
  	%vreg1<def> = COPY %RSI; GR64:%vreg1
  	%vreg0<def> = COPY %EDI; GR32:%vreg0
  	%RAX<def> = COPY %vreg1; GR64:%vreg1
  	LCMPXCHG64 %vreg0, 1, %noreg, 0, %noreg, %vreg2, %RAX<imp-def>, %EFLAGS<imp-def>, %RAX<imp-use>; mem:Volatile LDST8[%foo] GR32:%vreg0 GR64:%vreg2
  	%vreg3<def> = COPY %RAX; GR64:%vreg3
  	%vreg4<def> = COPY %EFLAGS; GR64:%vreg4
  	ADJCALLSTACKDOWN32 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
  	CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>, %EAX<imp-def>
  	ADJCALLSTACKUP32 0, 0, %ESP<imp-def,dead>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
  	%vreg5<def> = COPY %EAX; GR32:%vreg5
  	%EFLAGS<def> = COPY %vreg4; GR64:%vreg4
  	JNE_4 <BB#2>, %EFLAGS<imp-use>
  	JMP_4 <BB#1>
      Successors according to CFG: BB#1(16) BB#2(16)

AFAICT the code I wrote doesn't have a way to distinguish between the user asking for a true copy of `EFLAGS`, and ISel deciding that it needed to save/restore flags and deciding that `EFLAGS` was the way to go. This isn't an issue for NaCl because there isn't such a thing as copying `EFLAGS`. For system-mode LLVM trying to compile e.g. Linux that could be an issue, though I assume that this is done with inline assembly and not IR. Maybe ISel lowering should be changed? What do you think?

> - I'm not sure what's available here, but there are ways to check liveness too, which would allow the push/pop to be skipped entirely if AX is dead.

I'm not very familiar with this code, guidance appreciated. That looks like something PeepholeOptimizer.cpp should do (like it does with `X86InstrInfo::optimizeCompareInstr` and similar), but again ISel doesn't give it quite the information it needs to figure it out: we'd need to call out individual flags that are live instead, and for `LAHF`/`SAHF` + `SETO` for the general case, and depending on the subset of flags live otherwise we could do a simple `SET<cc>`. GCC manages to do this, and the repro that I posted on the original PR is much cleaner in GCC's case because is realizes that the flag it care about has already been materialized.

Do we care to do this, though? This code shouldn't get emitted often, so is it worth optimizing/testing/maintaining?

> - We probably want to be more careful with the flags on the instructions (at the least LAHF & SAHF should be marked with EFLAGS I think, probably we should get the kill states right too).

I think that's already the case, unless I misunderstand what you want. See  `test_intervening_call`  after post-RA expansion (when `LAHF`/`SAHF` get added):

  # *** IR Dump After Post-RA pseudo instruction expansion pass ***:
  # Machine code for function test_intervening_call: Post SSA
  Frame Objects:
    fi#-1: size=8, align=16, fixed, at location [SP-8]
  Function Live Ins: %EDI in %vreg0, %RSI in %vreg1, %RDX in %vreg2

  BB#0: derived from LLVM BB %0
      Live Ins: %EDI %RSI %RDX %RBX
  	PUSH64r %RBX<kill>, %RSP<imp-def>, %RSP<imp-use>; flags: FrameSetup
  	CFI_INSTRUCTION <call frame instruction>
  	CFI_INSTRUCTION <call frame instruction>
  	%RAX<def> = MOV64rr %RSI<kill>
  	LCMPXCHG64 %EDI<kill>, 1, %noreg, 0, %noreg, %RDX<kill>, %RAX<imp-def,dead>, %EFLAGS<imp-def>, %RAX<imp-use,kill>; mem:Volatile LDST8[%foo]
  	PUSH64r %RAX, %RSP<imp-def>, %RSP<imp-use>
  	%RAX<def> = SETOr %EFLAGS<imp-use>
  	LAHF %AH<imp-def>, %EFLAGS<imp-use>
  	%RBX<def> = MOV64rr %RAX
  	%RAX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
  	CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>, %EAX<imp-def,dead>
  	PUSH64r %RAX, %RSP<imp-def>, %RSP<imp-use>
  	%RAX<def> = MOV64rr %RBX<kill>
  	%AL<def,tied1> = ADD8ri %AL<tied0>, 127, %EFLAGS<imp-def>
  	SAHF %EFLAGS<imp-def>, %AH<imp-use>
  	%RAX<def> = POP64r %RSP<imp-def>, %RSP<imp-use>
  	JNE_4 <BB#2>, %EFLAGS<imp-use>
      Successors according to CFG: BB#1(16) BB#2(16)

> - With all this complexity, it might be time for a helper function.

You mean: I should pull the `EFLAGS` handling out of `X86InstrInfo::copyPhysReg`? I can do that, I just want to make sure that's what you're suggesting.

> And while we're at it, I think it'd be good to have a single code sequence too. Unless this dance turns out to be slower than pushf/popf (not impossible, even if those are slow).

That'll depend on the answer to your above question: does this break system code that builds with LLVM? I don't think it does (since it probably uses inline assembly). If that's the case then I can craft a benchmark and see what the cost is, which should give us the data needed to decide what to do.

http://reviews.llvm.org/D6629

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/