[llvm-dev] SCCP is not always correct in presence of undef (+ proposed fix)

Mon Jan 2 14:24:39 PST 2017

Hi Davide,

On Sat, Dec 31, 2016 at 4:19 AM, Davide Italiano <davide at freebsd.org> wrote:
> Although originally I wasn't, I'm increasingly convinced this is the
> right path forward (at least in the beginning), i.e. strip undef
> handling entirely. I tried to integrate undef in the solver to see how
> that worked and it seems the proposed lattice has still some issues.

By "integrate undef" do you mean you implemented the lattice you
mentioned in the original post?  If so, can you mention what issues
you ran into?

> We have hard time reasoning about simple things like what's the
> lattice structure and what should be the invariants we
> maintain/enforce at the end of each run (something that's very nice
> about the original algorithm is that you know at the end of the
> algorithm nothing should have value TOP otherwise you forgot to lower
> something/have a bug, but we can't currently assert this because of
> `undef`).

That is a good point.  Can we get the same kind of checking by keeping
track of if we've visited every non-dead instruction in the function
at least once?

> I'm also becoming increasingly convinced that this problem is
> something we (LLVM) created.
> If optimizing `undef` (which is sometimes if not often symptom of
> undefined behaviour/bugs in your program) should come at the expense
> of correctness or complicated logic/potentially unsound algorithms I
> don't think we should go that way.

I agree that undef has soundness issues, but I don't think those are
relevant here.

And with respect to `undef` and its implications on the correctness of
the source program, there are two factors here:

Using `undef` in a side effect (e.g. printf(undef)) is usually a sign
of a buggy source program, especially if the source language is C or
C++.

The *presence* of `undef` is fine, and does not imply that the source
program is incorrect.

> What Danny proposed I think it's very appealing but I'm taking a more
> extreme position here saying that we may consider not doing that in
> the beginning. Today I ran SCCP over a bunch of codebases (completely
> removing the undef optimization) and I found it never actually
> resulting in a runtime performance hit. Of course this is not
> representative at all, but still a datapoint.

What is the contribution of SCCP itself to the (I presume internal)
benchmarks?  If you disable SCCP completely, how much performance do
you lose?

> I think we should take this opportunity as a way to see if we can make
> things simpler/easier to consider correct instead of adding cases for
> problems we discover.

In theory I'm completely fine with not handling undef at all
(i.e. your proposal) as a starting point.  If we see a need for it, we
can always add it back later.

In practice, this is the kind of thing that tends to get added back
poorly (because someone's benchmark will be faster by 1% with a "small
fix" to SCCP).  Given that we're taking a step back and looking at the
bigger picture now anyway, this could be an opportune moment to fix
the underlying issue with regards to undef.

Having said that, since you're doing the work I'm more than happy to
let you make the call on this one.  I have the armchair, but you have
the keyboard. :)

-- Sanjoy

PS:

I still don't have an answer to the very first question I asked:

```
>> Looking at the original bug, it seems like a straightforward
>> undef-propagation bug to me -- SCCP was folding "or undef, constant"
>> to "undef", which is wrong.  Why is changing that not the fix?  That
>> is, some variant of
>
>
> You would still need to fix the iteration order of the resolver, or it will
> make more wrong decisions.
> As Davide discovered, there are bugs open with the same cause.

Davide -- can you point me to those?
```