[PATCH] D71841: [FPEnv] Default NoFPExcept SDNodeFlag to false

Tue Dec 24 05:19:11 PST 2019

uweigand added a comment.

I've had a closer look at the vec-strict-fptoint-256.ll changes I mention here:

  In one case this led to codegen changes (different register allocation) since some copy propagation is no longer run on instructions marked as fpexcept.

It turns out there's actually another problem.  It is true that the first codegen change happens in MachineCSE, where code before this patch would coalease a few copies via MachineCSE::PerformTrivialCopyPropagation, which code after this patch no longer does.  (This is because MachineCSE only propagates copies **into** instructions that are considered CSE candidates, and MI instructions that may raise FP exceptions no longer are considered such candidates.)

However, that change in itself would actually be harmless.  Even with those copies still there, the register allocator would still coalesce them itself, resulting in the same generated machine code.  However, there is another problem in between, in a X86 target-specific pass: X86FixupSetCCPass.  Before my patch, this pass would eliminate a zero-extend after a SETCC, after my patch, it no longer does -- and **this** is what causes the visible machine code differences.

The reason for this difference is here:

  // We expect the instruction *immediately* before the setcc to imp-def
  // EFLAGS (because of scheduling glue). To make this less brittle w.r.t
  // scheduling, look backwards until we hit the beginning of the
  // basic-block, or a small bound (to avoid quadratic behavior).
  MachineInstr *
  X86FixupSetCCPass::findFlagsImpDef(MachineBasicBlock *MBB,
                                     MachineBasicBlock::reverse_iterator MI) {
    // FIXME: Should this be instr_rend(), and MI be reverse_instr_iterator?
    auto MBBStart = MBB->rend();
    for (int i = 0; (i < SearchBound) && (MI != MBBStart); ++i, ++MI)
      for (auto &Op : MI->implicit_operands())
        if (Op.isReg() && (Op.getReg() == X86::EFLAGS) && Op.isDef())
          return &*MI;

    return nullptr;
  }

where SearchBound is a constant (16).   The problem is now that simply due to the presence of an extra COPY, the distance between the eflags defining instruction and the SETCC now exceeds this arbitrary limit.

Increasing the SearchBound setting gets me back the original machine code for this test case even with my patch.

I'm wondering why this nested loop is even necessary: the main loop of the pass is a forward scan of all MIs, so if you'd simply scan instructions for whether they define eflags as you iterate over them, you'd get the immediately preceding eflags-def for free ...

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71841/new/

https://reviews.llvm.org/D71841