[PATCH] D28147: [LICM] Allow promotion of some stores that are not guaranteed to execute

Thu Dec 29 12:36:35 PST 2016

reames requested changes to this revision.
reames added a comment.
This revision now requires changes to proceed.

Ok, with the inline discussion, it looks like this is a valid and useful extension.  Can you update the description of the patch to reflect the discussion?

Inline code comments below.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:253
   // memory references to scalars that we can.
-  if (!DisablePromotion && (Preheader || L->hasDedicatedExits())) {
+  // Don't sink stores from loops without dedicated block exits. Exits
+  // containing indirect branches are not transformed by loop simplify,
----------------
You've got a scattering of unrelated trivial cleanup changes here (style fixes, clarifying comments on the original code, etc..).  Please separate those and submit them.  No further review needed, but I'd like to get them out of this patch for future rounds.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:258
+  // is available.
+  if (!DisablePromotion && Preheader && L->hasDedicatedExits()) {
+    // Figure out the loop exits and their insertion points
----------------
Changing from the or and exit scheme to the and scheme should be NFC.  Please separate and land without further review.  (In particular, if this turns out *not* to be NFC, I want to be able to bisect to the refactoring.)

================
Comment at: lib/Transforms/Scalar/LICM.cpp:268
+
+    if (!HasCatchSwitch) {
+      SmallVector<Instruction *, 8> InsertPts;
----------------
You switching from a lazy to an eager computation of both exit blocks and insert pts.  In this case, it shouldn't matter since hasDedicatedExits also does the same (relatively expensive) computation of the loop exits.

At some point, we really do need to cache the loop exit walk.... (not for this change)

================
Comment at: lib/Transforms/Scalar/LICM.cpp:288
+  // loop used in the outer loop.
+  // FIXME: This is really heavy handed. It would be a bit better to use an
+  // SSAUpdater strategy during promotion that was LCSSA aware and reformed
----------------
Use of the Changed variable here is overly conservative.  We actually only need to run this if *store promotion* changed the loop.  If you wanted to do that as a a separate change, it would be a useful cleanup and compile time win.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:955
   bool Changed = false;
   for (const auto &ASI : AS) {
     Value *ASIV = ASI.getValue();
----------------
This loop is getting really complicated and hard to follow.  I think we need to find an alternate structure here which makes your changes easier to reason about.  

One idea would be to introduce two variables for the two different things we're trying to establish.  (i.e. LocationIsDereferencable, and CanInsertStoresOnExits.)  The key simplification is that we don't need to track *why* we know we can insert the stores.

I'm also open to other organizations here.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:970

-        // Note that we only check GuaranteedToExecute inside the store case
-        // so that we do not introduce stores where they did not exist before
----------------
You still need a slightly modified version of this comment.

================
Comment at: lib/Transforms/Scalar/LICM.cpp:1034
   // Check legality per comment above. Otherwise, we can't promote.
-  bool PromotionIsLegal = GuaranteedToExecute;
   if (!PromotionIsLegal && CanSpeculateLoad) {
+    PromotionIsLegal = StoreDominatesExits;
----------------
I'm having a hard time following the semantic changes here intermixed with the refactoring.  Please separate that and commit without further review.  I suspect that once that's done, I'll spot a couple more things here.

================
Comment at: test/Transforms/LICM/scalar_promote.ll:208
+
+loop:
+  %j = phi i32 [ 0, %entry ], [ %next, %loop ]
----------------
Looking at this test, it really should be handled by the thread local code we already have.  Have you looked to see why it isn't?

In fact, looking at *all* of your tests, the thread local case should handle it...?

(Note: This is a very important question to that I'll want an answer to before we commit the semantics parts of this code.)

https://reviews.llvm.org/D28147