[llvm-dev] LangRef semantics for shufflevector with undef mask is incorrect

Wed Nov 27 04:31:47 PST 2019

Quoting Simon Moll via llvm-dev <llvm-dev at lists.llvm.org>:

> On 11/27/19 2:10 AM, Eli Friedman via llvm-dev wrote:
>
> The shuffle mask of a shufflevector is special: it's required to be  
> a constant in a specific form.  From LangRef: "The shuffle mask  
> operand is required to be a constant vector with either constant  
> integer or undef values."  So really, we can resolve this any way we  
> want; "undef" in this context doesn't have to mean the same thing as  
> "undef" in other contexts.  Formally, at the LangRef level, we can  
> state that the shuffle mask is not an operand of a shufflevector;  
> instead, it's not a value at all.  It's just a description of the  
> shuffle, defined with a grammar similar to a vector constant.  Then  
> we can talk about shuffle masks where an element is the string  
> "undef", unrelated to the general notion of an undef value.
>
> That is something that has been on my mind for a while now. You can  
> ask the same why we use 'undef' for phi nodes. Eg it is legal to  
> turn this:
>
>    %x = phi i32 [ 0, A ], [ undef, B ]
>
> into
>
>    %x = phi i32 [ 0, A ], [ 1, B ]
>
> which arguing by the intended semantics of phi nodes should be an  
> illegal transformation but isn't in LLVM.
>
> I think that we abuse the 'undef' (symbol) to mute instruction  
> parameters whenever that parameter doesn't matter but we are shy of  
> 'some' value handle to feed the operand slot.
>
> IMHO for those cases, we need a proper '\bot' constant that denotes  
> the absence of a concrete value as opposed to 'undef' (conceptually  
> '\top'), which could be any value you'd like it to be.

 From a correctness perspective, it's fine to use undef in phi nodes.  
But I agree that for e.g. static analysis is not ideal, since as undef  
can take any value, any static analysis you do must return \top for a  
phi node with at least one undef input.
Switching to a poison value doesn't fix the issue either, since poison  
can still be refined by any value.

This discussion is a bit orthogonal to shufflevector, but consider  
this example:
%x = phi [\bottom, A], [%v, B]

If we know that %v \in [0, 3], I assume you want to conclude that %x  
\in [2, 3].
Now assume that %v is only computed in basic block B and not in A.  
When you generate assembly, which value do you use when jumping from  
A? It has to be a value that respects any static analysis you've done  
for %v, so it has to be 2 or 3, and you can't use %v.
It's non-trivial. I'm not seeing an easy solution to this precision  
problem. It's an interesting problem nevertheless.

Nuno