[llvm-dev] RFC: Killing undef and spreading poison

Wed Nov 9 11:09:11 PST 2016

----- Original Message -----
> From: "Nuno Lopes via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Eli' 'Friedman" <efriedma at codeaurora.org>, "Sanjoy Das" <sanjoy at playingwithpointers.com>
> Cc: llvm-dev at lists.llvm.org, "John Regehr" <regehr at cs.utah.edu>
> Sent: Wednesday, November 9, 2016 10:44:19 AM
> Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison
> 
> > On 11/8/2016 3:32 PM, Sanjoy Das wrote:
> >> Hi Nuno, Chandler,
> >>
> >> Nuno Lopes via llvm-dev wrote:
> >> > This program stores 8 bits, and leaves the remaining 24 bits
> >> > uninitialized. It then loads 16 bits, half initialized to %v,
> >> > half
> >> > uninitialized. SROA transforms the above function to:
> >> >
> >> > define i16 @g(i8 %in) {
> >> >    %v = add nsw i8 127, %in
> >> >    %1 = zext i8 %v to i16
> >> >    %2 = shl i16 %1, 8
> >> >    %3 = and i16 undef, 255
> >> >    %4 = or i16 %3, %2
> >> >    ret i16 %4
> >> > }
> >>
> >> This program above returns i16 poison only if "shl i16 poison, 8"
> >> is a
> >> full value poison.  Whether that is the case today or not is
> >> anybody's
> >> guess (as Eli says, we should not rely too much on semantics that
> >> are
> >> known to be broken), so the important question is whether "shl i16
> >> poison, 8" _can_ be defined to be poison only on the higher 8 bits
> >> in
> >> a world with bitwise poison.  If we could make that happen, we'd
> >> also
> >> get some other fringe benefits, like ComputeKnownBits would be
> >> able to
> >> fearlessly return "the low 8 bits of `shl i16 %val, 8` are 0" as
> >> opposed to "the low 8 bits of `shl i16 %val, 8` are 0 or poison".
> >>
> >> One problem with shl always generating non-poison zeroes in the
> >> lower
> >> bits is that it then is not equivalent to multiplication.  This
> >> breaks
> >> optimizations like:
> >>
> >>   %t = ...
> >>   %q = shl i32 %t, 6
> >>
> >> =/=>
> >>
> >>   %t = ...
> >>   %q = mul i32 %t, 64
> >>
> >> and (probably more meaningful)
> >>
> >>   %t = ...
> >>   %q0 = shl i32 %t, 6
> >>   %q1 = mul i32 %q0, 3
> >>
> >> =/=>
> >>
> >>   %t = ...
> >>   ;; %q0 = shl i32 %t, 6
> >>   %q1 = mul i32 %q0, (3 * 64)
> >>
> >>
> >> and (probably even worse) it also breaks reassociation:
> >>
> >>   %t = ...
> >>   %q0 = shl i32 %t, %P
> >>   %q1 = mul i32 %q0, %Q
> >>
> >> (No other uses of %q0, but there are uses of %q1)
> >>
> >> =/=>
> >>
> >>   %t = ...
> >>   %q0 = mul i32 %t, %Q
> >>   %q1 = shl i32 %q1, %P
> >>
> >>
> >> I'm especially worried about this last re-association example
> >> since
> >> SCEV does things like this internally when canonicalizing
> >> expressions,
> >> and I'm not convinced that we can change it to tiptoe around these
> >> problematic cases.
> >
> > Well, we could say non-nsw add and mul are actually "bitwise"
> > operations, so "mul i32 poison, 2" is guaranteed to have its
> > bottom bit zero (but "mul nsw i32 poison, 2" is poison).  Of
> > course, there's a limit to how far we can go in that direction, or
> > we just end up with the old notion of undef.  Off the top of my
> > head, I'm not exactly sure where that line is.
> 
> Right, we could try to make it work.  I don't have a good model for
> bit-wise poison yet, but it might be possible.  One of the things it
> will break is rewrites into comparisons, since at least comparisons
> will need to return poison if any bit of the inputs is poison. So
> the direction of "arithmetic -> comparison" gets broken (InstCombine
> has a few of these).
> 
> But before we try to make bit-wise poison work, I would like to
> understand what are the deficiencies of value-wise poison. Is there
> any major issue?  I'm asking because value-wise poison seems to be a
> much simpler concept, so if it works ok it seems to be preferable.
> 
> During the meeting last week, Richard Smith proposed that instead we
> add a "load not_poison %ptr" (like load atomic) instruction that
> would be equivalent to load+freeze+bitcast, because he is concerned
> that some C++ stuff needs to be lowered to such a load. This load
> then becomes tricky to move around (e.g., cannot be sank to inside
> loops), but there are options to attenuate that problem if
> necessary.

Wouldn't we need to do this for all loads? Does this solve the load/store <-> memcpy equivalence problem?

Thanks again,
Hal

> Perhaps this construct would make John McCall happy as well (re. his
> question during the talk last week)?
> 
> Thanks,
> Nuno
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory