[llvm-dev] SCCP is not always correct in presence of undef (+ proposed fix)

Fri Dec 30 22:01:37 PST 2016

Hi Daniel,

On Fri, Dec 30, 2016 at 9:47 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>>>   Is there a case in your algorithm in which treating an
>>> unknown as an undef will be a problem?
>>>
> Yes, if you try to optimize undefs in any way, no if you move them to
> overdefined :)
>
> IE given phi [a, undef, undef, undef]
>
> with just as many back edges, you will visit this node 4 times.
>
> If you start out with
>
> phi [a, undef, undef, undef], you may think "I know, i will make  these
> undef's a".
>
> So you move everything to value
>
> phi [a, a, a, a]
>
> But remember, you really haven't visited the other 4 edges yet, so you don't
> know if this is a valid value for undef (because of the rules around
> and/or/etc, see http://llvm.org/docs/LangRef.html#undefined-values).

But that's the same as case as:

  %x = phi [a, <unknown>]

Unless I know for sure that <unknown> is a, the *final result* can't
report the phi as a.

However, the I thought the algorithm we're talking about here is
optimistic, and we will initially fold %x to `a`, and then "try to
see" if other edges are viable or not.  You're allowed to incorrectly
optimistic results in the midst of the algorithm run (by design).

Otherwise we won't be able to get cases like:

loop:
  %x = phi [ 0 , entry ], [ 1, loop ]
  if (x != 0) br loop else br leave_loop
leave_loop;

  =>

loop:
  br leave_loop
leave_loop;

As far as I can tell, the <unknown> has the semantic "given the viable
set of edges so far, there is no value assigned to this slot by the
program".  This has the same rewrite rules as undef as far as I can
tell.

However, if you have a case like

loop:
  %x = phi [ 0 , entry ], [ t, loop ]
  t = undef | 1
  if (condition) br loop else br leave_loop
leave_loop;

and you end up with x as phi(0, undef) *after* the algorithm has
terminated, then that is a bug in the undef propagation, since "undef
| 1" is not undef.

> (a, a, a, a of course, is just an example, you also don't know if it's valid
> to always return undef, or 0, or 1, or anything at all, really).
>
> It's not until you hit this node for the 4th time, that you can really
> safely choose a value.
>
> If you have unknown and undef, it goes from
>
> phi [a, unknown, unknown, unknown] to phi [a, undef, unknown, unknown] to
> ...
>
> you know, once you have no unknown values, that everything you need to know
> to resolve it is known.
>
> (this is also why there is no need for a resolver. There really is no safe
> logic in the resolver that can't be pushed into the solver, though it may
> make sense to structure it as a resolver if you think it's easier)
>
> Similarly for or %x, undef.
> You need to know if it's really undef, not "something i haven't visited
> yet".

-- Sanjoy

On Fri, Dec 30, 2016 at 9:53 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> >
>> > You need to distinguish between not visited and visited but undef.
>>
>> What I'm getting at is, if you're implementing something close to the
>> paper mentioned, I can't think of a case where you'd care to
>> distinguish between "not visited" and "visited but undef".  That is,
>> instead of starting from "map each each instruction as not visited" we
>> should be able to (my unproven conjecture) start from "map each
>> instruction to undef", making the "unknown" element superfluous.
>
>
> See the email i just sent.
>
> Your scheme would work if you didn't want to optimize undef, because you'd
> just make meet (anything, undef) -> overdefined.
> It's the "we want to fill in values for undef" that makes it so you need to
> know when you can actually do that.
> (and as mentioned, you don't need a separate post-resolver to do *what we do
> now*, and so far, it has just been a source of bugs :()
>

-- 
Sanjoy Das
http://playingwithpointers.com