[PATCH] D106058: [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z))

Thu Jul 15 16:14:18 PDT 2021

tra added a comment.

Is it intentional that the two `fdiv arcp` nodes get folded into a single `fdiv` without `arcp`?

This is part of what changed the output of the NVPTX tests.

  SelectionDAG has 21 nodes:
    t0: ch = EntryToken
      t14: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_3', undef:i32
    t15: f32 = extract_vector_elt t14, Constant:i32<0>
              t4: v1i8,ch = load<(dereferenceable invariant load (s8) from `i1 addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_0', undef:i32
            t5: i8 = extract_vector_elt t4, Constant:i32<0>
          t6: i1 = truncate t5
              t8: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_1', undef:i32
            t9: f32 = extract_vector_elt t8, Constant:i32<0>
          t16: f32 = fdiv arcp t9, t15
              t11: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_2', undef:i32
            t12: f32 = extract_vector_elt t11, Constant:i32<0>
          t17: f32 = fdiv arcp t12, t15
        t18: f32 = select t6, t16, t17
      t19: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t18
    t20: ch = NVPTXISD::RET_FLAG t19

  Combining: t20: ch = NVPTXISD::RET_FLAG t19

  Combining: t19: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t18

  Combining: t18: f32 = select t6, t16, t17
  Creating new node: t21: f32 = select t6, t9, t12
  Creating new node: t22: f32 = fdiv t21, t15
   ... into: t22: f32 = fdiv t21, t15

Without `arcp`, we now have no choice but to lower this into a regular `div` instruction.
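To make the flag question concrete, here is a minimal toy model of the fold, not the patch's code and not LLVM's API. The `Node` class, `leaf` helper, and `fold_select_of_binops` function are hypothetical names invented for illustration. It assumes one conservative policy: the new binop keeps only the flags present on both original binops, so two `fdiv arcp` inputs would yield an `fdiv arcp` result.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a DAG node: opcode, operands, fast-math flags."""
    opcode: str
    operands: Tuple["Node", ...] = ()
    flags: FrozenSet[str] = frozenset()
    name: str = ""


def leaf(name: str) -> Node:
    """Create an operand-less value such as a loaded parameter."""
    return Node("leaf", name=name)


def fold_select_of_binops(sel: Node) -> Optional[Node]:
    """select(c, op(a, x), op(b, x)) -> op(select(c, a, b), x).

    Assumption: the result keeps the intersection of the two binops' flags.
    Returns None when the pattern does not match.
    """
    if sel.opcode != "select" or len(sel.operands) != 3:
        return None
    cond, lhs, rhs = sel.operands
    if lhs.opcode != rhs.opcode:
        return None
    if len(lhs.operands) != 2 or len(rhs.operands) != 2:
        return None
    if lhs.operands[1] != rhs.operands[1]:  # the shared operand (the divisor here)
        return None
    new_select = Node("select", (cond, lhs.operands[0], rhs.operands[0]))
    common_flags = lhs.flags & rhs.flags
    return Node(lhs.opcode, (new_select, lhs.operands[1]), common_flags)


# Mirror the dump above: t18 = select t6, (fdiv arcp t9, t15), (fdiv arcp t12, t15)
t6, t9, t12, t15 = leaf("t6"), leaf("t9"), leaf("t12"), leaf("t15")
t16 = Node("fdiv", (t9, t15), frozenset({"arcp"}))
t17 = Node("fdiv", (t12, t15), frozenset({"arcp"}))
t18 = Node("select", (t6, t16, t17))
t22 = fold_select_of_binops(t18)
```

Under this assumed policy, `t22` is an `fdiv` that still carries `arcp`. If only one of the two divisions had the flag, the intersection would drop it.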

That said, even if we were to preserve `arcp`, we'd run into a second issue:
NVPTX itself does not know how to lower `FDIV32rr_prec arcp` correctly and lowers it as a regular `div`.

Previously, the two divs and the select were combined into two multiplications by the reciprocal, and that is what made it look as if we could lower a div to a multiplication by the reciprocal.
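For reference, the arithmetic behind the earlier output can be sketched as follows. This is only an illustration of the repeated-divisor rewrite that `arcp` permits, written with Python doubles rather than `f32`. The function names are hypothetical and do not correspond to anything in LLVM.

```python
import math


def divide_exact(x: float, y: float, d: float) -> tuple:
    """Two real divisions by the same divisor, as in the original DAG."""
    return (x / d, y / d)


def divide_via_reciprocal(x: float, y: float, d: float) -> tuple:
    """One reciprocal plus two multiplications, the form `arcp` allows."""
    r = 1.0 / d
    return (x * r, y * r)


# With a power-of-two divisor the reciprocal is exact, so both forms agree.
exact_pow2 = divide_exact(3.0, 5.0, 4.0)
recip_pow2 = divide_via_reciprocal(3.0, 5.0, 4.0)

# With a divisor such as 3.0 the reciprocal is rounded, so the two forms are
# only guaranteed to be close, which is why the rewrite needs `arcp`.
exact_third = divide_exact(3.0, 5.0, 3.0)
recip_third = divide_via_reciprocal(3.0, 5.0, 3.0)
close = all(
    math.isclose(a, b, rel_tol=1e-12) for a, b in zip(exact_third, recip_third)
)
```

Once the select is folded first, only one division by `t15` remains, so there is no longer a repeated divisor for this rewrite to share.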

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106058/new/

https://reviews.llvm.org/D106058