[PATCH] D41029: [JumpTables][PowerPC] Let targets decide which switch instructions are suitable for jump tables

Fri Dec 8 14:55:32 PST 2017

nemanjai added a comment.

Eli, thanks so much for your comments. Please keep them coming. I'm not trying to be difficult about any of this - I just want to make sure the final implementation does the right thing both for compile time and run time.
I probably should have noted in the description that this provides significant run time improvements on some important workloads.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:1151
+          case Intrinsic::ppc_is_decremented_ctr_nonzero:
+          case Intrinsic::ppc_mtctr:
+            return true;
----------------
efriedma wrote:
> These intrinsics come out of PPCCTRLoops, right?  Would it make sense to avoid loop transform rather than disabling the jump table transform?
That is correct, that should be the only way to end up with these intrinsics (other than my artifical test case below). We currently won't emit them if the loop contains a switch that might be lowered to a jump table. However, we don't check for whether the loop is within a switch.

I think we want to favour loops that use the CTR because they are quite efficient. There's a single move to the CTR and the condition for the back edge is usually a [generally] well predicted branch if the CTR is non-zero. Furthermore, we only transform loops that either have an unknown trip count or are known to have a somewhat large trip count. So we should rarely be in a situation where we've thwarted a jump table for a loop that will be less efficient as a CTR loop.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:1204
+    }
+    if (mightUseCTR(It.getCaseSuccessor())) {
+      NumSwitchesNotSuitableForJT++;
----------------
efriedma wrote:
> nemanjai wrote:
> > efriedma wrote:
> > > This is worst-case quadratic (N switches times M instructions in the successor); you might need to cache the mightUseCTR computation.
> > Well, this is linear in the sum of instructions in successors of all the cases in the switch. This is since no block is visited twice. If a CTR use is found in a block, we return so definitely won't re-vist it. If no CTR use is found in a block, it is not visited again because we skip it in the `if` block above.
> > 
> > I don't think I can make it any faster than that since I have to check each instruction for using the CTR.
> > 
> > Or did you have something else in mind? Perhaps you mean across multiple invocations of `isSuitableForJumpTable()` on the same `SwitchInst`?
> I mean across multiple invocations of isSuitableForJumpTable for different switches (assuming we don't break the critical edge, which I don't think we do unconditionally).
Ah, now I see what you mean. Sure, computing this once and caching it the way I plan to cache `KnownNoJTSwitches` would certainly be an improvement there. I'll do that if there are no objections.

Repository:
  rL LLVM

https://reviews.llvm.org/D41029