[PATCH] D73978: [WIP][FPEnv] Don't transform FSUB(-0.0,X)->FNEG(X) when flushing denormals

Wed Feb 26 08:42:02 PST 2020

spatel added a comment.

In D73978#1890311 <https://reviews.llvm.org/D73978#1890311>, @cameron.mcinally wrote:

> Thanks for that patch, Sanjay.
>
> I have another issue which I hope you can help me sort out. There's a transform in `narrowExtractedVectorBinOp(...)` in DAGCombiner.cpp:
>
> `// extract (binop B0, B1), N --> binop (extract B0, N), (extract B1, N)`
>
> This transform only happens for binops, so we don't see it when SelectionDAGBuilder converts the FSUB->FNEG.
>
> The IR is...
>
>   %rhs_neg = fsub <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>, %rhs
>   %splat = shufflevector <4 x float> %rhs_neg, <4 x float> undef, <2 x i32> <i32 3, i32 3>
>   
>
> and after DAGCombine we end up with DAGs like this...
>
>    FNEG:
>    <               t9: v4f32 = bitcast t8
>    <             t24: v4f32 = fneg t9
>    <           t15: v2f32 = extract_subvector t24, Constant:i64<2>
>    <         t17: v2f32 = vector_shuffle<1,1> t15, undef:v2f32
>   
>    FSUB:
>   >               t29: v1i64 = extract_subvector t8, Constant:i64<1>
>   >             t30: v2f32 = bitcast t29
>   >           t32: v2f32 = fneg t30
>   >         t17: v2f32 = vector_shuffle<1,1> t32, undef:v2f32
>   
>
> Moving the extract to the operands (FSUB) is a problem on AArch64 since the extract **could** be rolled into the shuffle (FNEG). E.g.:
>
>    FNEG:
>    <             t9: v4f32 = bitcast t8
>    <           t24: v4f32 = fneg t9
>    <         t26: v2f32 = AArch64ISD::DUPLANE32 t24, Constant:i64<3>
>   
>    FSUB:
>   >                 t29: v1i64 = extract_subvector t8, Constant:i64<1>
>   >               t30: v2f32 = bitcast t29
>   >             t32: v2f32 = fneg t30
>   >           t36: v4f32 = insert_subvector undef:v4f32, t32, Constant:i32<0>
>   >         t37: v2f32 = AArch64ISD::DUPLANE32 t36, Constant:i64<1>
>   
>
> Any insight on the best way to correct this difference? I suppose I could fix up the extract+insert at the MachineInstruction level, but that doesn't seem like the correct fix since other targets could have the same problem.
>
> I'm also a little skeptical about moving the extracts to the operands, and if it's a win in the general case. Seems like it would be stronger after any extract+insert peeps have occurred, but I suppose that's why it's done in DAGCombine. :/

The motivation for narrowExtractedVectorBinOp() was to shrink unnecessarily wide vector ops on x86 (256/512-bit vector code can run much slower than 128-bit vector code).
But we want to avoid moving fneg around too much because it can be folded into some other op for free in many cases. We can show there's an inconsistency in the handling in an independent example, so:
rGb3d0c798367d <https://reviews.llvm.org/rGb3d0c798367d1dbf07d0357260b432571bff5c7d>

Let me know if that works to remove the problem here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73978/new/

https://reviews.llvm.org/D73978