[PATCH] D45146: [x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues.

Mon Apr 2 20:41:09 PDT 2018

chandlerc marked 12 inline comments as done.
chandlerc added inline comments.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:11-14
+/// Lowers COPY nodes of EFLAGS to directly extract and preserve individual
+/// flag bits so that we don't need to use pushf/popf which have serious
+/// problems due to IF and other flags that should *not* be preserved across
+/// arbitrary instructions in all cases. In addition to these correctness
----------------
rnk wrote:
> I feel like the first one sentence summary of why we need this pass shouldn't include "so we don't need pushf/popf". That presumes the reader already thinks that "popf" is a solution the problem, when it was never actually a viable solution to begin with for all the reasons you outline. The first sentence should state directly that there is no correct or efficient way to copy comparison flags on x86. Then you can go on and explain that the reader might consider popf, lahf, etc, but these are innefficient and incorrect.
Yeah, totally rewrote this. Lemme know what you think.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:101
+  /// our predicate state through the exiting edges.
+  struct BlockCondInfo {
+    MachineBasicBlock *MBB;
----------------
rnk wrote:
> This looks dead, I can't find it in the rest of the pass.
Wow, sorry. Copy-pasted from my other pass.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:403
+      //
+      // FIXME: handle conditional tail calls
+      if (X86::getCondFromBranchOpc(MI.getOpcode()) != X86::COND_INVALID) {
----------------
rnk wrote:
> This might need to be addressed before committing.
Amazingly, I have found no test case that hits this. But yes, we need to handle this.

I'm going to work on crafting a MIR test case to hit this and then will fix it.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:424
+        // We assume that instructions that use flags also def them.
+        assert(MI.findRegisterDefOperand(X86::EFLAGS) &&
+               "Expected a def of EFLAGS for this instruction!");
----------------
rnk wrote:
> Would it be better to assert this and set IsKilled before the rewrite? This is similar to how you set the IsKilled bool above before rewrites.
Yes, thanks.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:462-464
+      MachineBasicBlock &JmpMBB =
+          JmpI == JmpIs.front() ? MBB
+                                : splitBlock(*JmpI->getParent(), *JmpI, *TII);
----------------
rnk wrote:
> `splitBlock` has lots of side effects. This would be clearer as:
>   MachineBasicBlock *JmpMBB = &MBB;
>   if (JmpI != JmpIs.front())
>     JmpMBB = &splitBlock(...);
> 
> It's used as a pointer in the assert anyway.
Ended up rewriting this anyways, and yeah, much clearer.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:470-471
+
+    // FIXME: Mark the last use of EFLAGS before the copy's def as a kill if
+    // the copy's def operand is itself a kill.
+
----------------
rnk wrote:
> I see. Yeah, that is annoying to implement. Don't later passes try to infer this logic? Maybe it's OK to rely on that?
It's OK, just suboptimal. IT will make some things less efficient.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:479-480
+
+  DEBUG(dbgs() << "\nFunction after rewriting copies:\n"; MF.dump();
+        dbgs() << "\n"; MF.verify(this));
+  return true;
----------------
rnk wrote:
> This seems redundant with -print-after=
I find it crazy useful. But sure.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:561
+
+  switch (getMnemonicFromOpcode(MI.getOpcode())) {
+  case FlagArithMnemonic::ADC:
----------------
rnk wrote:
> This seems like the only use of getMnemonicFromOpcode. It seems like you could simplify all that stuff down to just three states:
> 1. non-flags-using arithmetic
> 2. arithmetic using CF
> 3. arithmetic using OF (ADDOX)
> 
> I'm guessing this more detailed classification of opcodes is useful if we want to do sophisticated ADDOX optimizations for multi-precision arithmetic. Do you think it's worth keeping around?
I can simplify it if you really want.

Honestly, having done the simplification, it's a mess to read. The issue is that the macros to do all the silly matching of the crazy number of instruction op codes ends up making it really hard to read the business logic here. So I think just for readability, I'd rather keep the enum and static mapping function.

The use of a switch here is just to force us to update it when/if we add a new mnemonic.

================
Comment at: llvm/lib/Target/X86/X86FlagsCopyLowering.cpp:584-585
+
+  // Insert an instruction that will set the flag back to the desired value.
+  unsigned TmpReg = MRI->createVirtualRegister(PromoteRC);
+  auto AddI =
----------------
rnk wrote:
> Hah, I was scratching my head at this, but I guess at this point during codegen we haven't done two-operand conversions yet. :)
Yeah, hilariously.

Repository:
  rL LLVM

https://reviews.llvm.org/D45146