<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Wed, Oct 19, 2016 at 4:52 AM Nuno Lopes <<a href="mailto:nuno.lopes@ist.utl.pt">nuno.lopes@ist.utl.pt</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">> I'm still digesting the proposal in its entirety, but one point I want to call out here:<br class="gmail_msg">

><br class="gmail_msg">

>> On Tue, Oct 18, 2016 at 1:39 PM Friedman, Eli via llvm-dev wrote:<br class="gmail_msg">

>> Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src;<br class="gmail_msg">

>> store i32 dst".  I guess that's illegal under this model?  How about the<br class="gmail_msg">

>> related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4<br class="gmail_msg">

>> x i8> dst"?  (SROA also performs similar transforms.)<br class="gmail_msg">

><br class="gmail_msg">

> I think it is really important that memcpy and load/store pair be equivalent. If this is in fact true about the proposal, I would want to really dig into why and whether anything could be done to make them equivalent.<br class="gmail_msg">

<br class="gmail_msg">

Memcpy does a byte-by-byte copy. So if one of the bits is poison then only the byte containing that bit becomes poison.<br class="gmail_msg">

Therefore, memcpy(x, y, 1) is equivalent to load i8.  But memcpy(x,y,4) is not equivalent to "load i32" since load makes the whole value poison if any of the bits is poison.<br class="gmail_msg"></blockquote><div><br></div><div>That is not the semantic model that LLVM uses today. Your proposal is changing this, and it is an *incredibly* fundamental part of the optimizer.</div><div><br></div><div>LLVM, very fundamentally, models an integer, a vector of bytes, and a vector of bits as absolutely identical to the underlying storage in memory. Changing this would completely break SROA, various parts of mem2reg and reg2mem, and all load widening.</div><div><br></div><div>I understand you have reasons here, but this is a much bigger change than the undef vs. poison part. ;] I'm currently deeply concerned by this change and would strongly prefer looking at alternatives.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

The alternative as given by Eli is to use "load <4 x i8>".  Since now we are loading 4 separate values, poison does not extend past the byte boundary.  When load is lowered, you should get exactly the same code as with "load i32", though.<br class="gmail_msg">

So the hope is that there's no diff at assembly level.<br class="gmail_msg"></blockquote><div><br></div><div>Note that vectors in LLVM are both quite limited and will have likely very different representations in the assembly.</div><div><br></div><div>You're essentially conferring a *type* onto *memory*. That is not currently LLVM's model. Instead, the types in LLVM are tied to *operations*.</div><div><br></div><div>As a consequence, when LLVM sees a vector type, it canonicalizes, optimizes, and code generates to facilitate vector operations on values of vector type. So for example, a vector of 32 i1s is likely to be put into a vector mask register.</div></div></div>