[llvm-dev] SCCP is not always correct in presence of undef (+ proposed fix)
    Sanjoy Das via llvm-dev 
    llvm-dev at lists.llvm.org
       
    Fri Dec 30 21:03:57 PST 2016
    
    
  
Hi David,
Looking at the original bug, it seems like a straightforward
undef-propagation bug to me -- SCCP was folding "or undef, constant"
to "undef", which is wrong.  Why is changing that not the fix?  That
is, some variant of
diff --git a/lib/Transforms/Scalar/SCCP.cpp b/lib/Transforms/Scalar/SCCP.cpp
index 8a6be97..45f1241 100644
--- a/lib/Transforms/Scalar/SCCP.cpp
+++ b/lib/Transforms/Scalar/SCCP.cpp
@@ -908,8 +908,18 @@ void SCCPSolver::visitBinaryOperator(Instruction &I) {
   }
   // If something is undef, wait for it to resolve.
-  if (!V1State.isOverdefined() && !V2State.isOverdefined())
-    return;
+  if (!V1State.isOverdefined() && !V2State.isOverdefined()) {
+    Constant *L = UndefValue::get(I.getOperand(0)->getType());
+    if (V1State.isConstant())
+      L = V1State.getConstant();
+    Constant *R = UndefValue::get(I.getOperand(0)->getType());
+    if (V2State.isConstant())
+      R = V2State.getConstant();
+    Constant *C = ConstantExpr::get(I.getOpcode(), L, R);
+    if (isa<UndefValue>(C))
+      return;
+    return markConstant(IV, &I, C);
+  }
   // Otherwise, one of our operands is overdefined.  Try to produce something
   // better than overdefined with some tricks.
Also, did you mean to make the lattice as:
digraph G {
  Unknown -> Undef
  Undef -> Constant1
  Undef -> Constant2
  Undef -> Constant3
  Constant1 -> Bottom
  Constant2 -> Bottom
  Constant3-> Bottom
}
?  In the lattice you've drawn, Constant MEET Undef will be Bottom,
when it should ideally be Constant.
Secondly, what's the purpose of splitting Unknown and Undef in the new
scheme?  Is there a case in your algorithm in which treating an
unknown as an undef will be a problem?
Actually given the above lattice (assuming you're okay with that) we know
treating unknown as undef should not make us any less conservative
(i.e. should be safe) since undef is lower than unknown.  IOW why not change
the algorithm to start off each cell as undef and not unknown?
-- Sanjoy
On Fri, Dec 30, 2016 at 9:27 AM, Davide Italiano via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi.
> I'm sending this email to -dev as this may be of interest of
> many/people may have opinions/want to try the change before it goes in
> to report problems.
> I've been recently working on a patch to integrate `undef` in the SCCP
> solver, in the hope of fixing a tail of latent bugs in SCCP which
> remained uncovered for many years. I think this is a decent time to
> propose, so that it can (hopefully) be integrated in the 4.0 branch
> (or backported, FWIW), if it gets reviewed and there are no
> substantial problems.
>
> #### Background:
> The current lattice for SCCP doesn't know about `undef`, that is, it's
> identical to the one described in [1]. There are three lattice states,
> "Unknown"/"Constant"/"Overdefined" (the first one is what the paper
> calls "Undefined", but was renamed as "undefined" in LLVM has a
> different meaning).
>
> SCCP (or its interprocedural counterpart) currently consists of two phases:
> 1) The solver, which implements the algorithm described in the paper.
> At the end of this phase, if there are no `undef` values in the input,
> all the values supposedly have been lowered (in a lattice-theoretical
> sense) to either constant or overdefined. In presence of `undef`,
> there may be values which are still in the 'unknown' state.
> 2) The resolver assumes that everything in the 'unknown' state is
> undefined, and runs a procedure trying to plug the "correct" value,
> and in case something changed, runs the solver again.
>
> #### Problem/Motivation for the change:
> 1) and 2) use different orders of visitation. 1) walks the use-def
> chains, 2) visit the function in basic block order (starting from the
> entry block of the function).
> The fact that 2) walks over the function instead of the SSA graph
> creates some issues where it cannot distinguish between the following
> two cases:
>
> a) It doesn't know how to resolve this value
> b) This value is really undef.
>
> and may plug in wrong/invalid values. For a real example, see the bug
> in [3], for a proof of why it's wrong, see [2] (courtesy of Daniel
> Berlin).
>
> #### Proposed fix:
> Integrate `undef` in the solver. Undef is a lattice value lower than
> `Unknown` but higher than `Overdefined` (at the same lattice level of
> constant). Graphically:
>
>    unknown
>     /     \      \
>    /       \      \
>  c1 ...  ck undefined
>   \         |       /
>    \        |      /
>    overdefined
>
>
> #### Benefits:
> -> The code now is correct (modulo other bugs =)) in presence of undefs
> -> This allows us to keep the same lattice height and therefore all
> the nice termination/convergence properties/bounds (see the paper for
> reference).
> -> This allows to remove a lot of code (pretty much the resolver) and
> makes the algorithm easier to understand
> -> SCCP doesn't need anymore to roll its own tiny constant folder
> where we need to add cases everytime we realize there's something
> missing, instead, it can leverage the full power of ConstantFolding
> when something folds to undef.
> -> Makes SCCP *slightly* faster.
>
> #### (Potential) drawbacks:
> -> This change the behavior of `phi` handling, i.e. we lose some cases
> as we force phi to overdefined if not the values are constant. I
> measured the impact on the llvm-testsuite and on some internal
> benchmarks and I wasn't able to measure any substantial different in
> runtime performances. If you care about this case, please speak up =)
> (or try the patch and report regressions). There are probably ways to
> make this working but we (see [4], comment 7 and 8) think the gain in
> precision is not worth the complexity (and what's here is *much much*
> better than what's in the tree as it tries at least to be correct).
>
> #### Plan for integration
> The patch is available at https://reviews.llvm.org/D28177
> Any comments/feedback/testing is definitely welcome. This is a
> self-contained change.
>
> #### (Possible) future work
> If this goes in I'd like to implement constant propagation for missing
> IR constructs (in particular vectors), and make sure SCCP is on par
> with the other two costant propagation passes we have in llvm (and
> maybe garbage collect them).
> Danny suggested an improvement could be that of propagating facts in
> both directions (GCC does that as a separate pass
> https://gcc.gnu.org/svn/gcc/trunk/gcc/gimple-ssa-backprop.c). I have
> no immediate plans to work on this (I suspect NewGVN and other places
> will keep me busy for a while), but I hope to eventually get there.
>
> #### Thanks
> Eli Friedman and Daniel Berlin provided very insightful
> conversations/suggestions on how to tackle the problems, and provided
> early feedback on the change (other than reviewing large part if not
> all my previous work on SCCP).
>
>
> [1] https://www.cs.utexas.edu/users/lin/cs380c/wegman.pdf
> [2] http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161107/403212.html
> [3] https://llvm.org/bugs/show_bug.cgi?id=30448
> [4] https://llvm.org/bugs/show_bug.cgi?id=30966#c8
>
> --
> Davide
>
> "There are no solved problems; there are only problems that are more
> or less solved" -- Henri Poincare
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Sanjoy Das
http://playingwithpointers.com
    
    
More information about the llvm-dev
mailing list