[PATCH] D106058: [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z))
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 15 16:16:57 PDT 2021
lebedev.ri added a comment.
In D106058#2881782 <https://reviews.llvm.org/D106058#2881782>, @tra wrote:
> Is it intentional that two `fdiv arcp` get folded into `fdiv` w/o `arcp`?
Hm, I guess we need to intersect the fast-math flags of both original instructions and set the result on the new instruction.
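To illustrate what intersecting the flags would mean, here is a minimal self-contained sketch. It is a toy model, not the real LLVM `SDNodeFlags` class: `ToyFlag`, `ToyFlags` and `intersectFlags` are hypothetical names invented for this example. The idea is that the folded node may keep a flag only if both original nodes carried it.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for per-node fast-math flags (hypothetical, not LLVM's API).
enum ToyFlag : std::uint8_t {
  ToyNone = 0,
  ToyArcp = 1u << 0, // allow-reciprocal
  ToyNnan = 1u << 1, // no-NaNs
  ToyNinf = 1u << 2, // no-infs
};

struct ToyFlags {
  std::uint8_t Bits;
  explicit ToyFlags(unsigned B = ToyNone)
      : Bits(static_cast<std::uint8_t>(B)) {}
  bool has(ToyFlag F) const { return (Bits & F) != 0; }
};

// Keep only the flags that are present on BOTH inputs. A flag found on just
// one of the original nodes is dropped, because the merged node must be
// valid wherever either original node was.
inline ToyFlags intersectFlags(ToyFlags A, ToyFlags B) {
  return ToyFlags(static_cast<unsigned>(A.Bits & B.Bits));
}

// Models the case from the dump: both fdivs carry arcp, so the merged
// fdiv would keep arcp under this rule.
inline void demoBothArcp() {
  ToyFlags LHS(ToyArcp), RHS(ToyArcp);
  assert(intersectFlags(LHS, RHS).has(ToyArcp));
}
```

With this rule, two `fdiv arcp` inputs would produce an `fdiv arcp`, while an `arcp` on only one side would be dropped.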
> This is part of what made the difference for NVPTX tests.
>
> SelectionDAG has 21 nodes:
> t0: ch = EntryToken
> t14: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_3', undef:i32
> t15: f32 = extract_vector_elt t14, Constant:i32<0>
> t4: v1i8,ch = load<(dereferenceable invariant load (s8) from `i1 addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_0', undef:i32
> t5: i8 = extract_vector_elt t4, Constant:i32<0>
> t6: i1 = truncate t5
> t8: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_1', undef:i32
> t9: f32 = extract_vector_elt t8, Constant:i32<0>
> t16: f32 = fdiv arcp t9, t15
> t11: v1f32,ch = load<(dereferenceable invariant load (s32) from `float addrspace(101)* null`, addrspace 101)> t0, TargetExternalSymbol:i32'repeated_div_recip_a_param_2', undef:i32
> t12: f32 = extract_vector_elt t11, Constant:i32<0>
> t17: f32 = fdiv arcp t12, t15
> t18: f32 = select t6, t16, t17
> t19: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t18
> t20: ch = NVPTXISD::RET_FLAG t19
>
> Combining: t20: ch = NVPTXISD::RET_FLAG t19
>
> Combining: t19: ch = NVPTXISD::StoreRetval<(store (s32), align 1)> t0, Constant:i32<0>, t18
>
> Combining: t18: f32 = select t6, t16, t17
> Creating new node: t21: f32 = select t6, t9, t12
> Creating new node: t22: f32 = fdiv t21, t15
> ... into: t22: f32 = fdiv t21, t15
>
> Without `arcp` we have no choice now but to lower into a regular div instruction.
>
> That said, even if we were to preserve `arcp`, we'd run into the second issue.
> NVPTX itself does not know how to lower `FDIV32rr_prec arcp` correctly and lowers it as a regular `div`.
>
> Previously two divs+select were combined into two multiplications by reciprocal and that was what made it look like we can lower div to multiplication by reciprocal.
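For reference, the three shapes discussed above can be written out as plain scalar code. This is only a sketch with hypothetical function names, assuming ordinary `float` arithmetic; it does not model the DAG combiner or NVPTX lowering, only the arithmetic each form performs.

```cpp
// Original shape: two divisions by the same divisor, then a select.
inline float originalForm(bool Cond, float X, float Y, float Z) {
  float A = X / Z;
  float B = Y / Z;
  return Cond ? A : B;
}

// Shape after the fold in this patch: select the dividend first, then do a
// single division.
inline float foldedForm(bool Cond, float X, float Y, float Z) {
  float Dividend = Cond ? X : Y;
  return Dividend / Z;
}

// Shape the NVPTX test previously ended up with: one reciprocal shared by
// two multiplications. This rewrite is only permitted under arcp, since
// X * (1 / Z) is not always bit-identical to X / Z.
inline float reciprocalForm(bool Cond, float X, float Y, float Z) {
  float Recip = 1.0f / Z;
  float A = X * Recip;
  float B = Y * Recip;
  return Cond ? A : B;
}
```

`originalForm` and `foldedForm` always agree, because the same division is performed on the selected operand. `reciprocalForm` agrees only when the reciprocal is exact (for example `Z = 4.0f`), which is why it needs `arcp`.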
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D106058/new/
https://reviews.llvm.org/D106058