[PATCH] D74185: Revert the revert of vectorization commits

Mon Feb 10 07:39:11 PST 2020

ABataev added a comment.

In D74185#1865072 <https://reviews.llvm.org/D74185#1865072>, @george.karpenkov wrote:

> I did manage to reduce the test case further.
>
> For input IR: https://gist.github.com/cheshire/17067d5ba4781817861c8b21d15c928d
>
> Bad optimized version: https://gist.github.com/cheshire/bf1047b4385bcf82c22a70f5cf1fb5df
>
> Good optimized version: https://gist.github.com/cheshire/8bea1f36ab849f8945bc190b519272a6
>
> The compilation comes from XLA <https://www.tensorflow.org/xla/operation_semantics> test case which looks like this:
>
>   HloModule EntryModule
>  
>   ENTRY EntryModule {
>     %input0 = f64[] parameter(0)
>     %sign_227 = f64[] sign(f64[] %input0)
>     %multiply_235 = f64[] multiply(f64[] %sign_227, f64[] %sign_227)
>  
>     %p4 = f64[2,2] broadcast(%input0), dimensions={}
>     %dot_81 = f64[2,2] dot(f64[2,2] %p4, f64[2,2] %p4), lhs_contracting_dims={1}, rhs_contracting_dims={1}
>     %br2 = f64[2,2] broadcast(f64[] %multiply_235), dimensions={}
>     %reshape_294 = f64[2,2] multiply(f64[2,2] %dot_81, f64[2,2] br2)
>  
>     %broadcast_298 = f64[2,3,2,3] broadcast(f64[2,2] %reshape_294), dimensions={0,2}
>  
>     %arg7_8 = f64[3,3] parameter(1)
>     %broadcast_300 = f64[2,3,2,3] broadcast(f64[3,3] %arg7_8), dimensions={1,3}
>     %multiply_301 = f64[2,3,2,3] multiply(f64[2,3,2,3] %broadcast_298, f64[2,3,2,3] %broadcast_300)
>  
>     %reshape_302 = f64[6,6] reshape(f64[2,3,2,3] %multiply_301)
>     %zero = f64[] constant(0)
>     %zeros = f64[6,6] broadcast(f64[] %zero), dimensions={}
>  
>     %diag = pred[6,6] constant({{1,0,0,0,0,0}, {0,1,0,0,0,0}, {0,0,1,0,0,0}, {0,0,0,1,0,0}, {0,0,0,0,1,0}, {0,0,0,0,0,1}})
>     ROOT %select_316 = f64[6,6] select(pred[6,6] %diag, f64[6,6] %reshape_302, f64[6,6] %zeros)
>   }
>
>
> Essentially, it performs some element-wise multiplications on random input floats, and then replaces all non-diagonal entries with zeros.
>
> Difference in output:
>
>   Expected literal:
>   f64[6,6] {
>     { 0.000275566374291495, 0, 0, 0, 0, 0 },
>     { 0, 0.00040918878254846619, 0, 0, 0, 0 },
>     { 0, 0, -0.00058600272184509581, 0, 0, 0 },
>     { 0, 0, 0, 0.000275566374291495, 0, 0 },
>     { 0, 0, 0, 0, 0.00040918878254846619, 0 },
>     { 0, 0, 0, 0, 0, -0.00058600272184509581 }
>   }
>  
>   Actual literal:
>   f64[6,6] {
>     { 0.000275566374291495, 0, 0, 0, 0, 0 },
>     { 0, 1, 0, 0, 0, 0 },
>     { 0, 0, -0.00058600272184509581, 0, 0, 0 },
>     { 0, 0, 0, 0.000275566374291495, 0, 0 },
>     { 0, 0, 0, 0, 0.00040918878254846619, 0 },
>     { 0, 0, 0, 0, 0, -0.00058600272184509581 }
>   }
>   1/1 runs miscompared.
>  
>
>
> This is not something which can be caused by fast math: a number created by elementwise multiplication of random input floats is exactly "1" in the bad version.

Hmm, tried this reproducer with trunk and it generates the same LLVM IR just like your correct variant. Could you try to check it using the trunk, maybe the bug was fixed already? The convert of `fcmp` to double also looks suspicious to me, but it looks like the instruction combiner bug, not vectorizer. Vectorizer does not do this kind of transformation with IR. Meanwhile, I'll try to investigate you iriginal reproducer, maybe it is also fixed already.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74185/new/

https://reviews.llvm.org/D74185