[llvm-dev] LangRef semantics for shufflevector with undef mask is incorrect

Simon Moll via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 29 04:46:51 PST 2019

On 11/27/19 1:31 PM, Nuno Lopes wrote:
> Quoting Simon Moll via llvm-dev <llvm-dev at lists.llvm.org>:
>> On 11/27/19 2:10 AM, Eli Friedman via llvm-dev wrote:
>> The shuffle mask of a shufflevector is special: it's required to be  
>> a constant in a specific form.  From LangRef: "The shuffle mask  
>> operand is required to be a constant vector with either constant  
>> integer or undef values."  So really, we can resolve this any way we  
>> want; "undef" in this context doesn't have to mean the same thing as  
>> "undef" in other contexts.  Formally, at the LangRef level, we can  
>> state that the shuffle mask is not an operand of a shufflevector;  
>> instead, it's not a value at all.  It's just a description of the  
>> shuffle, defined with a grammar similar to a vector constant.  Then  
>> we can talk about shuffle masks where an element is the string  
>> "undef", unrelated to the general notion of an undef value.
>> That is something that has been on my mind for a while now. You can  
>> ask the same why we use 'undef' for phi nodes. Eg it is legal to  
>> turn this:
>>    %x = phi i32 [ 0, A ], [ undef, B ]
>> into
>>    %x = phi i32 [ 0, A ], [ 1, B ]
>> which arguing by the intended semantics of phi nodes should be an  
>> illegal transformation but isn't in LLVM.
>> I think that we abuse the 'undef' (symbol) to mute instruction  
>> parameters whenever that parameter doesn't matter but we are shy of  
>> 'some' value handle to feed the operand slot.
>> IMHO for those cases, we need a proper '\bot' constant that denotes  
>> the absence of a concrete value as opposed to 'undef' (conceptually  
>> '\top'), which could be any value you'd like it to be.
>  From a correctness perspective, it's fine to use undef in phi nodes.  
> But I agree that for e.g. static analysis is not ideal, since as undef  
> can take any value, any static analysis you do must return \top for a  
> phi node with at least one undef input.
> Switching to a poison value doesn't fix the issue either, since poison  
> can still be refined by any value.
> This discussion is a bit orthogonal to shufflevector, but consider  
> this example:
> %x = phi [\bottom, A], [%v, B]
> If we know that %v \in [0, 3], I assume you want to conclude that %x  
> \in [2, 3].
> Now assume that %v is only computed in basic block B and not in A.  
> When you generate assembly, which value do you use when jumping from  
> A? It has to be a value that respects any static analysis you've done  
> for %v, so it has to be 2 or 3, and you can't use %v.

The difference is very subtle:

- '\bottom' you may select any value. It is guaranteed that the behavior
of the program is invariant in the value you chose (that's a contract
you sign into). If the program behavior changes due to different
realizations of '\bottom' the program is broken. There is not 'bottom'
value at runtime. Operations on bottom can be removed (replaced by
'bottom') because they do not happen/cannot have any semantic effect.

- 'undef' you may select any value. The choice of value may change
program behavior (whatever that means, btw..) but *all* resulting
behaviors are ok. 'undef' is also a runtime value. Operations on 'undef'
can yield any result you like because you can pick a value of your
choosing for 'undef' operands and "pretend-execute" the instruction.


> It's non-trivial. I'm not seeing an easy solution to this precision  
> problem. It's an interesting problem nevertheless.
> Nuno
>  Click https://www.mailcontrol.com/sr/a2WKaaeZ5RTGX2PQPOmvUiOFufhVdryLUBxLNKNDa_QA7paip68ipHph_rvyKIurfSeJemAnwsUYNBM7q0TQEQ==  to report this email as spam.

More information about the llvm-dev mailing list