[llvm-dev] RFC: Killing undef and spreading poison

Tue Oct 18 06:29:09 PDT 2016

----- Original Message -----
> From: "Nuno Lopes" <nuno.lopes at ist.utl.pt>
> To: llvm-dev at lists.llvm.org
> Cc: "Sanjoy Das" <sanjoy at playingwithpointers.com>, "David Majnemer" <david.majnemer at gmail.com>, "John Regehr"
> <regehr at cs.utah.edu>, "Chung-Kil Hur" <gil.hur at sf.snu.ac.kr>, "Juneyoung Lee" <juneyoung.lee at sf.snu.ac.kr>,
> "Yoonseung Kim" <yoonseung.kim at sf.snu.ac.kr>, "Youngju Song" <youngju.song at sf.snu.ac.kr>, hfinkel at anl.gov, "Dan
> Gohman" <dan433584 at gmail.com>, nlewycky at google.com, mbraun at apple.com, qcolombet at apple.com, "t p northover"
> <t.p.northover at gmail.com>
> Sent: Tuesday, October 18, 2016 7:06:31 AM
> Subject: RFC: Killing undef and spreading poison
> 
> Hi,
> 
> Over the past few years we've been trying to kill poison somehow.
> There have
> been a few proposals, but they've all failed to pass the bar and/or
> to
> gather significant support (my own proposals included).
> We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and
> myself)
> have a new proposal to kill undef instead and replace it with poison
> + a new
> 'freeze' instruction.  We believe this proposal simplifies things at
> the IR
> level, allows us to fix long-standing bugs in LLVM, and is roughly
> performance-neutral for now (and can enable further optimizations in
> the
> future).
> 
> Sorry for the longish email.  We will give a talk about this proposal
> at the
> LLVM dev meeting in November, so hopefully we'll be able to discuss
> this
> further in person.
> 
> We would appreciate any comments, feedback, suggestions, etc.  If you
> have
> questions let me know as well (not everything below is super
> detailed, so
> please ask where things are not explicit enough).
> (I've CC'ed a few people that have been involved in discussions re
> semantics
> of LLVM IR in the past year or so.  Apologies in advance if I forgot
> someone
> -- which probably I did.  I've also CC'ed some CodeGen folks, from
> which we
> would appreciate input on how this proposal fits within the current
> and the
> future pipeline)
> 

Thanks for working on this! A couple of questions/comments below:

...

> 
> Purpose of Freeze
> =================
> Poison is propagated aggressively throughout. However, there are
> cases where
> this breaks certain optimizations, and therefore freeze is used to
> selectively stop poison from being propagated.
> 
> A use of freeze is to enable speculative execution.  For example,
> loop
> switching does the following transformation:
> while (C) {
>   if (C2) {
>    A
>   } else {
>    B
>   }
> }
> =>
> if (C2) {
>    while (C)
>       A
> } else {
>     while (C)
>        B
> }
> 
> Here we are evaluating C2 before C.  If the original loop never
> executed
> then we had never evaluated C2, while now we do.  So we need to make
> sure
> there's no UB for branching on C2.  Freeze ensures that so we would
> actually
> have 'if (freeze(C2))' instead.
> Note that having branch on poison not trigger UB has its own
> problems.

Can you please elaborate on these problems?

>  We
> believe this is a good tradeoff.
> 
> 

...

> 
> Discussion on select
> ====================
> Select is a tricky instruction to define semantics for.  There are
> multiple
> views of select's purpose that are contradictory:
>  1) Select should be equivalent to arithmetic (e.g., allow  'select
>  %c,
> true, %x' -> 'or %c, %x' and arithmetic -> select)
>  2) br + phi -> select should be allowed (simplifyCFG)
>  3) Select -> br + phi should be allowed
> 
> To have 1), we need to make select poison if either of its arguments
> is
> poison. This disallows 2), since there we need to have select being
> poison
> only if the dynamically-chosen value is poison.
> 2) and 3) are orthogonal and do not conflict, though.
> 
> 3) is hard because it requires select to be stronger than branching
> (UB-wise), meaning that we would need select to be UB if the
> condition was
> poison. This blocks a bunch of instcombine patterns, like:
> Pre: C < 0
> %r = udiv %a, C
>   =>
> %c = icmp ult %a, C
> %r = select %c, 0, 1
> 
> If %a is poison, then we would be introducing UB. Adding freeze(%c)
> would
> fix the problem, though.
> 
> 
> Since we really want to have 2) since that's simplifyCFG, we propose
> to make
> 'select %c, %a, %b' poison if any of the following holds:
>  - %c is poison
>  - %c = true and %a is poison
>  - %c = false and %b is poison
> 
> This disallows 3) and some transformations of 1). Since 3) is only
> performed
> in CodeGenPrepare, we can safely ignore it (no aggressive
> optimizations are
> run afterwards).

This strikes me as a dangerous game to play. CGP's transformations need to be semantically sound (even if they are anti-canonical). In part, this is because our IR-level analyses are used by CGP. Moreover, targets after CGP run IR-level transformations, and having different semantic rules there is going to make code reuse hard in subtle ways. Which brings up another issue: We currently have undef at the MI level; do you propose changing that, and if so, how?

 -Hal

> For 1), we will have to restrict InstCombine
> patterns for
> the cases when we are sure a given variable is non-poisonous (which
> is only
> when it came from a 'freeze' instruction or it's the result of a
> non-poison-producing instruction).
> This semantics also allows arithmetic -> select, but not select ->
> arithmetic (without use of freeze).
> 

...

> 
> 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory