# [llvm-dev] LangRef semantics for shufflevector with undef mask is incorrect

Simon Moll via llvm-dev llvm-dev at lists.llvm.org
Fri Nov 29 04:46:51 PST 2019

On 11/27/19 1:31 PM, Nuno Lopes wrote:
> Quoting Simon Moll via llvm-dev <llvm-dev at lists.llvm.org>:
>
>> On 11/27/19 2:10 AM, Eli Friedman via llvm-dev wrote:
>>
>> The shuffle mask of a shufflevector is special: it's required to be
>> a constant in a specific form.  From LangRef: "The shuffle mask
>> operand is required to be a constant vector with either constant
>> integer or undef values."  So really, we can resolve this any way we
>> want; "undef" in this context doesn't have to mean the same thing as
>> "undef" in other contexts.  Formally, at the LangRef level, we can
>> state that the shuffle mask is not an operand of a shufflevector;
>> instead, it's not a value at all.  It's just a description of the
>> shuffle, defined with a grammar similar to a vector constant.  Then
>> we can talk about shuffle masks where an element is the string
>> "undef", unrelated to the general notion of an undef value.
>>
>> That is something that has been on my mind for a while now. You can
>> ask the same why we use 'undef' for phi nodes. Eg it is legal to
>> turn this:
>>
>>    %x = phi i32 [ 0, A ], [ undef, B ]
>>
>> into
>>
>>    %x = phi i32 [ 0, A ], [ 1, B ]
>>
>> which arguing by the intended semantics of phi nodes should be an
>> illegal transformation but isn't in LLVM.
>>
>> I think that we abuse the 'undef' (symbol) to mute instruction
>> parameters whenever that parameter doesn't matter but we are shy of
>> 'some' value handle to feed the operand slot.
>>
>> IMHO for those cases, we need a proper '\bot' constant that denotes
>> the absence of a concrete value as opposed to 'undef' (conceptually
>> '\top'), which could be any value you'd like it to be.
>  From a correctness perspective, it's fine to use undef in phi nodes.
> But I agree that for e.g. static analysis is not ideal, since as undef
> can take any value, any static analysis you do must return \top for a
> phi node with at least one undef input.
> Switching to a poison value doesn't fix the issue either, since poison
> can still be refined by any value.
>
> This discussion is a bit orthogonal to shufflevector, but consider
> this example:
> %x = phi [\bottom, A], [%v, B]
>
> If we know that %v \in [0, 3], I assume you want to conclude that %x
> \in [2, 3].
> Now assume that %v is only computed in basic block B and not in A.
> When you generate assembly, which value do you use when jumping from
> A? It has to be a value that respects any static analysis you've done
> for %v, so it has to be 2 or 3, and you can't use %v.

The difference is very subtle:

- '\bottom' you may select any value. It is guaranteed that the behavior
of the program is invariant in the value you chose (that's a contract
realizations of '\bottom' the program is broken. There is not 'bottom'
value at runtime. Operations on bottom can be removed (replaced by
'bottom') because they do not happen/cannot have any semantic effect.

- 'undef' you may select any value. The choice of value may change
program behavior (whatever that means, btw..) but *all* resulting
behaviors are ok. 'undef' is also a runtime value. Operations on 'undef'
can yield any result you like because you can pick a value of your
choosing for 'undef' operands and "pretend-execute" the instruction.

Simon

> It's non-trivial. I'm not seeing an easy solution to this precision
> problem. It's an interesting problem nevertheless.
>
> Nuno
>
>
>
>  Click https://www.mailcontrol.com/sr/a2WKaaeZ5RTGX2PQPOmvUiOFufhVdryLUBxLNKNDa_QA7paip68ipHph_rvyKIurfSeJemAnwsUYNBM7q0TQEQ==  to report this email as spam.
>