[llvm-dev] Is this undefined behavior optimization legal?

Friedman, Eli via llvm-dev llvm-dev at lists.llvm.org
Mon Oct 3 16:10:29 PDT 2016


On 10/3/2016 1:51 PM, Tom Stellard via llvm-dev wrote:
> Hi,
>
> I've found a test case where SelectionDAG is doing an undefined behavior
> optimization, and I need help determining whether or not this is legal.
>
> Here is the example IR:
>
> define void @test(<4 x i8> addrspace(1)* %out, float %a) {
>    %uint8 = fptoui float %a to i8
>    %vec = insertelement <4 x i8> <i8 0, i8 0, i8 0, i8 0>, i8 %uint8, i32 0
>    store <4 x i8> %vec, <4 x i8> addrspace(1)* %out
>    ret void
> }
>
> Since %vec is a 32-bit vector, a common way to implement this function on a target
> with 32-bit registers would be to zero initialize a 32-bit register to hold
> the initial vector and then 'mask' and 'or' the inserted value with the
> initial vector.  In AMDGPU assembly it would look something like:
>
> v_mov_b32 v0, 0
> v_cvt_u32_f32_e32 v1, s0
> v_and_b32 v1, v1, 0x000000ff
> v_or_b32 v0, v0, v1
>
> The optimization the SelectionDAG does for us in this function, though, ends
> up removing the mask operation.  Which gives us:
>
> v_mov_b32 v0, 0
> v_cvt_u32_f32_e32 v1, s0
> v_or_b32 v0, v0, v1
>
> The reason the SelectionDAG is doing this is because it knows that the result
> of %uint8 = fptoui float %a to i8 is undefined when the result uses more than
> 8-bits.  So, it assumes that the result will only set the low 8-bits, because
> anything else would be undefined behavior and the program would be broken.
> This assumption is what causes it to remove the 'and' operation.
>
> So effectively, what has happened here, is that by inserting the result of
> an operation with undefined behavior into one lane of a vector, we have
> overwritten all the other lanes of the vector.
>
> Is this optimization legal?  To me it seems wrong that undefined behavior
> in one lane of a vector could affect another lane.  However, given that LLVM IR
> is SSA and we are technically creating a new vector and not modifying the old
> one, then maybe it's OK.  I'm just not sure.
>
> Appreciate any insight people may have.

The way insertelement is defined, inserting an element never affects the
other elements of the vector ("Its element values are those of val...").
So the question is whether you're triggering undefined behavior in some
other way. Looking at LangRef for fptoui, it says "If the value cannot
fit in ty2, the results are undefined", i.e. the value is equivalent to
the constant "undef".  Therefore, you should end up storing
"<4 x i8> <undef, 0, 0, 0>", not "<4 x i8> undef".
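
To make the lane argument concrete, here is a small Python model of the
two instruction sequences from the original message. It is only a sketch:
the helper names are made up for illustration, and cvt_u32 is an assumed
stand-in for a 32-bit float-to-unsigned convert, not the exact hardware
semantics of v_cvt_u32_f32.

```python
MASK32 = 0xFFFFFFFF

def cvt_u32(a: float) -> int:
    """Assumed model of a 32-bit float-to-unsigned convert (finite inputs only)."""
    return max(0, min(int(a), MASK32))

def insert_lane0_masked(vec: int, a: float) -> int:
    """Clear lane 0, then OR in only the low 8 bits of the converted value."""
    return ((vec & ~0xFF) | (cvt_u32(a) & 0xFF)) & MASK32

def insert_lane0_unmasked(vec: int, a: float) -> int:
    """Same sequence with the 'and' removed, as in the optimized output."""
    return (vec | cvt_u32(a)) & MASK32

def lanes(word: int) -> list:
    """Split a packed 32-bit word into four 8-bit lanes, lane 0 first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

# In range: both sequences agree.
print(lanes(insert_lane0_masked(0, 200.0)))    # [200, 0, 0, 0]
print(lanes(insert_lane0_unmasked(0, 200.0)))  # [200, 0, 0, 0]

# Out of range for i8: only lane 0 may be arbitrary, lanes 1..3 must stay 0.
print(lanes(insert_lane0_masked(0, 300.0)))    # [44, 0, 0, 0]
print(lanes(insert_lane0_unmasked(0, 300.0)))  # [44, 1, 0, 0]
```

In this model the masked version matches "<undef, 0, 0, 0>" (lane 0 holds
some arbitrary value, the rest are zero), while the unmasked version lets
the out-of-range bits leak into lane 1.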

Note that there's a tradeoff here: saying that fptoui for out-of-range 
values doesn't have undefined behavior allows us to simplify control 
flow and hoist operations more aggressively.
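
A rough way to picture the hoisting side of that tradeoff, again as a
Python sketch with hypothetical names. Undefined behavior is modeled
here as raising an exception, which is only an analogy: real UB has no
guaranteed effect. The "arbitrary" variant models an undef result by
picking one possible value.

```python
def convert_or_trap(a: float) -> int:
    """Model where an out-of-range conversion is itself an error (UB analogy)."""
    if not (0 <= a < 256):
        raise ArithmeticError("out-of-range fptoui")
    return int(a)

def convert_or_arbitrary(a: float) -> int:
    """Model where an out-of-range conversion just yields some 8-bit value."""
    return int(a) & 0xFF  # any 8-bit value would be acceptable here

def original(cond: bool, a: float, convert) -> int:
    """Conversion only runs on the path that uses it."""
    if cond:
        return convert(a)
    return 0

def hoisted(cond: bool, a: float, convert) -> int:
    """Conversion speculated above the branch, result selected afterwards."""
    tmp = convert(a)
    return tmp if cond else 0
```

With convert_or_arbitrary, original and hoisted agree whenever cond is
false, so the hoist is safe. With convert_or_trap, hoisted(False, 300.0,
convert_or_trap) fails where original returned 0, which is the kind of
transformation that UB semantics would forbid.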

-Eli

-- 

Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
