[PATCH] D105730: [SLP] match logical and/or as reduction candidates

Sat Jul 10 06:00:22 PDT 2021

spatel added inline comments.

================
Comment at: llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll:13
+; CHECK-NEXT:    [[TMP2:%.*]] = bitcast <4 x i1> [[TMP1]] to i4
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i4 [[TMP2]], -1
+; CHECK-NEXT:    br i1 [[TMP3]], label [[COMMON_RET:%.*]], label [[LOR_LHS_FALSE:%.*]]
----------------
RKSimon wrote:
> It doesn't have to be part of this patch, but should we be trying to fold these patterns to a reduction intrinsic?
> 
> ```
> ; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt <4 x float> [[T:%.*]], zeroinitializer
> ; CHECK-NEXT:    [[TMP1:%.*]] = freeze <4 x i1> [[TMP0]]
> ; CHECK-NEXT:    [[TMP3:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP1]])
> ```
SLP does form a reduction intrinsic, as the SLP-only tests show.
This test runs at -O2, so a subsequent InstCombine run turns the intrinsic into bitcast+cmp via:
https://github.com/llvm/llvm-project/blob/d919bca87556548555af0a7aa1239ea64ba4f3e8/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1966
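
To illustrate why that fold is sound, here is a minimal standalone sketch. It is not taken from the patch or from InstCombine, and the helper names (`reduceAnd`, `bitcastAllOnes`, `formsAgree`) are hypothetical. It models an 'and' reduction over `<4 x i1>` and the bitcast-to-i4 plus `icmp eq -1` form, then compares them over all 16 lane combinations:

```cpp
#include <array>
#include <cassert>

// Hypothetical model of llvm.vector.reduce.and on <4 x i1>:
// the result is true only if every lane is true.
bool reduceAnd(const std::array<bool, 4> &Lanes) {
  bool Result = true;
  for (bool Lane : Lanes)
    Result = Result && Lane;
  return Result;
}

// Hypothetical model of 'bitcast <4 x i1> to i4' followed by
// 'icmp eq i4 %x, -1'. Lane I is mapped to bit I here (an assumption);
// the lane order is irrelevant for an all-ones compare.
bool bitcastAllOnes(const std::array<bool, 4> &Lanes) {
  unsigned Mask = 0;
  for (unsigned I = 0; I < 4; ++I)
    Mask |= static_cast<unsigned>(Lanes[I]) << I;
  return Mask == 0xFu; // i4 -1 is 0b1111
}

// Exhaustively compare the two forms over all 16 lane combinations.
bool formsAgree() {
  for (unsigned Bits = 0; Bits < 16; ++Bits) {
    std::array<bool, 4> Lanes{};
    for (unsigned I = 0; I < 4; ++I)
      Lanes[I] = ((Bits >> I) & 1u) != 0;
    if (reduceAnd(Lanes) != bitcastAllOnes(Lanes))
      return false;
  }
  return true;
}
```

Calling `assert(formsAgree())` from a small driver checks the equivalence for this width only; it says nothing about which form produces better codegen.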

I still need to check what difference, if any, that makes for codegen.

================
Comment at: llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll:448
+; CHECK-NEXT:    [[CMP20:%.*]] = icmp sgt i32 [[TMP0]], 255
+; CHECK-NEXT:    [[OR_COND6:%.*]] = select i1 [[TMP10]], i1 true, i1 [[CMP20]]
+; CHECK-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP3]], [[TMP2]]
----------------
RKSimon wrote:
> any idea why we only match one of the reduction chains?
I haven't stepped through it yet. We did make some adjustments for sorting the reduction ops in previous patches, but I doubt those extended to creating multiple reductions and/or re-running the analysis after forming a reduction.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105730/new/

https://reviews.llvm.org/D105730