[llvm-dev] RFC: Killing undef and spreading poison

Tue Nov 8 10:47:32 PST 2016

Hi Chandler,

Thanks for your feedback!

I’m still trying to get to the bottom of your concerns and how to best address them.

As far as I understand, your main concern is that the following should be equivalent to a no-op (thanks Sanjoy for the example!):

i32* ptr = ...

%tmp = load i32, i32* %ptr

store i32 %tmp, i32* %ptr

So introducing a store of the loaded value should be ok, but under our proposal it’s not.  Just to clarify: is this actually your main concern?

I believe many parts of LLVM already interpret poison as a per-value attribute, including when it comes to loads.  Take the following function as an example:

define i16 @g(i8 %in) {

  %a = alloca i32

  %v = add nsw i8 127, %in

  %ptr = bitcast i32* %a to i8*

  %ptr1 = getelementptr i8, i8* %ptr, i32 1

  store i8 %v, i8* %ptr1

  %ptr16 = bitcast i32* %a to i16*

  %load = load i16, i16* %ptr16

  ret i16 %load

}

This program stores 8 bits, and leaves the remaining 24 bits uninitialized. It then loads 16 bits, half initialized to %v, half uninitialized. SROA transforms the above function to:

define i16 @g(i8 %in) {

  %v = add nsw i8 127, %in

  %1 = zext i8 %v to i16

  %2 = shl i16 %1, 8

  %3 = and i16 undef, 255

  %4 = or i16 %3, %2

  ret i16 %4

}

SROA gets rid of the alloca/store/load instructions and replaces those with an OR of undef and shifted %v.

If I call g() with %in=1, then the addition overflows and %v is poison.  Under LLVM semantics, all instructions return poison if given poison as input. Therefore the whole function can be folded to “ret i16 poison”  (if we had a poison value in IR, or otherwise fold to “add nsw i16 INT_MAX, 1”).

So, to justify the correctness of SROA, the load in the original program had to be widening the poison from the stored i8 value and return a i16 poison (i.e., %load had to be poison if %v was poison).  Therefore, we can conclude that in current semantics, loads already widen poison from bit-representation into value-wise representation, which makes the insertion of a load-store round-trip illegal in general.

Please let me know what you think. I may be missing some important detail.

Thanks,

Nuno

From: Chandler Carruth [mailto:chandlerc at google.com] 
Sent: 4 de novembro de 2016 17:35
Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison

On Wed, Oct 19, 2016 at 4:52 AM Nuno Lopes <nuno.lopes at ist.utl.pt <mailto:nuno.lopes at ist.utl.pt> > wrote:

> I'm still digesting the proposal in its entirety, but one point I want to call out here:
>
>> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote:
>> Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src;
>> store i32 dst".  I guess that's illegal under this model?  How about the
>> related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4
>> x i8> dst"?  (SROA also performs similar transforms.)
>
> I think it is really important that memcpy and load/store pair be equivalent. If this is in fact true about the proposal, I would want to really dig into why and whether anything could be done to make them equivalent.

Memcpy does a byte-by-byte copy. So if one of the bits is poison then only the byte containing that bit becomes poison.
Therefore, memcpy(x, y, 1) is equivalent to load i8.  But memcpy(x,y,4) is not equivalent to "load i32" since load makes the whole value poison if any of the bits is poison.

That is not the semantic model that LLVM uses today. Your proposal is changing this, and it is an *incredibly* fundamental part of the optimizer.

LLVM, very fundamentally, models an integer, a vector of bytes, and a vector of bits as absolutely identical to the underlying storage in memory. Changing this would completely break SROA, various parts of mem2reg and reg2mem, and all load widening.

I understand you have reasons here, but this is a much bigger change than the undef vs. poison part. ;] I'm currently deeply concerned by this change and would strongly prefer looking at alternatives.

The alternative as given by Eli is to use "load <4 x i8>".  Since now we are loading 4 separate values, poison does not extend past the byte boundary.  When load is lowered, you should get exactly the same code as with "load i32", though.
So the hope is that there's no diff at assembly level.

Note that vectors in LLVM are both quite limited and will have likely very different representations in the assembly.

You're essentially conferring a *type* onto *memory*. That is not currently LLVM's model. Instead, the types in LLVM are tied to *operations*.

As a consequence, when LLVM sees a vector type, it canonicalizes, optimizes, and code generates to facilitate vector operations on values of vector type. So for example, a vector of 32 i1s is likely to be put into a vector mask register.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161108/ffb4a8a1/attachment.html>