[PATCH] D106058: [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z))

Thu Jul 15 16:16:57 PDT 2021

lebedev.ri added a comment.

In D106058#2881782 <https://reviews.llvm.org/D106058#2881782>, @tra wrote:

> Is it intentional that two `fdiv arcp` get folded into `fdiv` w/o `arcp`?

Hm, I guess we need to intersect the fast-math flags of both original instructions and set the result on the new instruction.
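
As a rough illustration of what "intersect" would mean here, below is a standalone toy model, not the actual DAGCombiner code. `ToyFlags`, `ToyFDiv`, `intersect` and `foldSelectOfFDivs` are hypothetical names made up for this sketch, and the integer ids only mirror the `t9`/`t12`/`t15` numbering from the dump. The idea is that the new `fdiv` keeps a flag only if both original `fdiv`s carried it, so two `fdiv arcp` would fold into an `fdiv arcp`.

```cpp
#include <cassert>

// Toy stand-in for per-node fast-math flags (hypothetical, not LLVM's type).
struct ToyFlags {
    bool arcp;  // allow reciprocal
    bool nnan;  // no NaNs
    bool ninf;  // no infinities
};

// A flag survives only if it is set on both inputs.
ToyFlags intersect(ToyFlags a, ToyFlags b) {
    return ToyFlags{a.arcp && b.arcp, a.nnan && b.nnan, a.ninf && b.ninf};
}

// Toy fdiv node: operands are referenced by hypothetical integer ids.
struct ToyFDiv {
    int numerator;
    int divisor;
    ToyFlags flags;
};

// Models select(c, fdiv(a, d), fdiv(b, d)) -> fdiv(select(c, a, b), d).
// `selectedNumerator` is the id of the newly created select(c, a, b).
ToyFDiv foldSelectOfFDivs(int selectedNumerator, const ToyFDiv &onTrue,
                          const ToyFDiv &onFalse) {
    assert(onTrue.divisor == onFalse.divisor && "fold needs a shared divisor");
    // Take the intersection instead of dropping the flags entirely.
    return ToyFDiv{selectedNumerator, onTrue.divisor,
                   intersect(onTrue.flags, onFalse.flags)};
}
```

With both inputs marked `arcp`, the folded node stays `arcp`; if only one of them has it, the flag is dropped, which is the conservative choice.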

> This is part of what made the difference for NVPTX tests.
>
>   SelectionDAG has 21 nodes:
>     t0: ch = EntryToken
>       t14: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_3', undef:i32
>     t15: f32 = extract_vector_elt t14, Constant:i32<0>
>               t4: v1i8,ch = load<(dereferenceable invariant load (s8) from `i1 addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_0', undef:i32
>             t5: i8 = extract_vector_elt t4, Constant:i32<0>
>           t6: i1 = truncate t5
>               t8: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_1', undef:i32
>             t9: f32 = extract_vector_elt t8, Constant:i32<0>
>           t16: f32 = fdiv arcp t9, t15
>               t11: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_2', undef:i32
>             t12: f32 = extract_vector_elt t11, Constant:i32<0>
>           t17: f32 = fdiv arcp t12, t15
>         t18: f32 = select t6, t16, t17
>       t19: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t18
>     t20: ch = NVPTXISD::RET_FLAG t19
>   
>   Combining: t20: ch = NVPTXISD::RET_FLAG t19
>   
>   Combining: t19: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t18
>   
>   Combining: t18: f32 = select t6, t16, t17
>   Creating new node: t21: f32 = select t6, t9, t12
>   Creating new node: t22: f32 = fdiv t21, t15
>    ... into: t22: f32 = fdiv t21, t15
>
> Without `arcp` we have no choice now but to lower into a regular div instruction.
>
> That said, even if we were to preserve `arcp`, we'd run into the second issue.
> NVPTX itself does not know how to lower `FDIV32rr_prec arcp` correctly and lowers it as a regular `div`.
>
> Previously two divs+select were combined into two multiplications by reciprocal and that was what made it look like we can lower div to multiplication by reciprocal.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D106058