[PATCH] D41029: [JumpTables][PowerPC] Let targets decide which switch instructions are suitable for jump tables

Wed Dec 13 08:02:49 PST 2017

hfinkel added inline comments.

================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:1162
+          case Intrinsic::ppc_mtctr:
+            BlocksKnownToUseCTR.insert(BB);
+            return true;
----------------
nemanjai wrote:
> hfinkel wrote:
> > I feel like any time this would be true in a block terminated by a switch is a bug, and not using a jump table at all just because some case happens to jump to a loop pre-header seems very severe. Maybe we just want to raise the number of entries in the table in that case?
> I certainly agree with your first comment. We should never be in a situation where these intrinsics are used in a loop terminated by a switch (as I've mentioned in my response to Eli above). However, I didn't find it pertinent to add a special case here to only check this if we're looking at a case successor. How about I just add an assert to that end and remove the test case that artificially adds this?
> 
> As far as the second comment, I think that the sequence of a case that jumps to a loop pre-header is certainly something we want to avoid if the loop trip count is too small to hide the latency of `mtctr -> bctr -> mtctr`. However, I think that requires more analysis than we want to do here. Perhaps the right approach would be to not emit a jump table if for example more than N% of the cases jump to a loop pre-header.
> However, if it's OK with you, I'd prefer to keep the logic simple on this initial patch. So I'd like to opt for either the current behaviour or just removing the intrinsic checks altogether (perhaps with a FIXME in a comment). What do you think?
> I certainly agree with your first comment. We should never be in a situation where these intrinsics are used in a loop terminated by a switch (as I've mentioned in my response to Eli above). However, I didn't find it pertinent to add a special case here to only check this if we're looking at a case successor. How about I just add an assert to that end and remove the test case that artificially adds this?

That's fine.

> ... a FIXME in a comment). What do you think?

Hrmm. The issue is that I think we always want to do this (i.e., for all checks here). Moreover, I think that not doing this could cause a significant regression for someone. Is doing this just a matter of also comparing against NumCases in isSuitableForJumpTable against some threshold?

I'm assuming that, based on your logic above, and if I assume that the `ctctr -> bctr -> mtctr` has an apx. 5 cycle penalty, then during those 5 cycles I could have done around 5 branches. If I assume, not a comparison to a linear sequence of checks, but to a binary search tree (which I believe we generate), then the break-even point would be around 2^5 = 32 cases.  Thus, a threshold of ~64 cases would make sense. Does this reasoning make sense to you?

================
Comment at: lib/Target/PowerPC/PPCISelLowering.h:539
+    mutable std::set<const BasicBlock*> BlocksKnownToUseCTR;
+    mutable std::set<const BasicBlock*> BlocksKnownNotToUseCTR;

----------------
nemanjai wrote:
> hfinkel wrote:
> > efriedma wrote:
> > > We generally prefer SmallPtrSet for sets of pointers.  And for long-lived pointers to LLVM values, it's generally a good idea to use AssertingVH (so something like `SmallPtrSet<AssertingVH<SwitchInst>>`).
> > > 
> > > We share the same PPCTargetLowering for multiple functions, so you'll build up a set for the entire module in most cases.  But I guess that isn't really a problem.
> > Please use a SmallPtrSet here. In LLVM, we only use std::set if we need some property it offers (e.g, ordered iteration or stable addressing). Otherwise, we don't. That actually makes the semantics clearer, but only if we're consistent.
> Yes. I'll certainly change it to `SmallPtrSet`. Also, thanks for pointing out `AssertingVH` Eli. I was not aware of its existence.
I'm not actually sure that an AssertingVH will work here if it ends up used via TTI (where the IR is not invariant). You may need to use a value handle with a callback that removes itself from the sets upon delete (e.g., the way that the AssumptionCache / ValueMap works).

================
Comment at: lib/Target/PowerPC/PPCISelLowering.h:570-572
+    unsigned getMinimumJumpTableEntries() const override {
+      return 7;
+    }
----------------
nemanjai wrote:
> echristo wrote:
> > This seems randomly chosen. How'd we pick 7? Can we describe it in a comment?
> How about if I were to add something like:
> ```
> // The lwax -> mtctr sequence takes 11 cycles whereas each
> // cmpli takes 2 cycles and can dispatch 4-wide. Assuming the
> // worst case - last branch is taken, we can (theoretically) complete
> // 6 cmpli's in 3 cycles and then the 6 branches in 8 cycles with a
> // single cycle overlap for a total of 10 cycles.
> ```
> Although I find that a little too detailed.
I don't. This is good. Please also say that this number was estimated for the `P8` (or `P9`, as the case may be).

Repository:
  rL LLVM

https://reviews.llvm.org/D41029