[llvm-dev] llvm 10: Why is float experimental_vector_reduce_fmin not tried?
Mark Schimmel via llvm-dev
llvm-dev at lists.llvm.org
Tue Nov 24 13:16:20 PST 2020
LLVM vectorizes this same function for floating-point addition just fine (it uses experimental_vector_reduce_v2_fadd), but refuses to do the same for minf(). Does anyone have any insight into why that would be? I'm using -ffast-math, but that doesn't seem to help.
From grep'ing the sources, the best I can figure is that some logic exists for Instruction::FCmp but perhaps not for Intrinsic::minnum. Is that the case?
; Function Attrs: norecurse nounwind readonly
define float @f(float addrspace(4)* noalias nocapture readonly %a, float addrspace(4)* noalias nocapture readonly %b, float %m) local_unnamed_addr #0 {
entry:
  br label %for.body

for.cond.cleanup: ; preds = %for.body
  ret float %3

for.body: ; preds = %entry, %for.body
  %m.addr.024 = phi float [ %m, %entry ], [ %3, %for.body ] ; [#uses=1 type=float]
  %i.023 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; [#uses=3 type=i32]
  %arrayidx = getelementptr inbounds float, float addrspace(4)* %a, i32 %i.023 ; [#uses=1 type=float addrspace(4)*]
  %0 = load float, float addrspace(4)* %arrayidx, align 4, !tbaa !3 ; [#uses=1 type=float]
  %arrayidx1 = getelementptr inbounds float, float addrspace(4)* %b, i32 %i.023 ; [#uses=1 type=float addrspace(4)*]
  %1 = load float, float addrspace(4)* %arrayidx1, align 4, !tbaa !3 ; [#uses=1 type=float]
  %2 = tail call fast float @llvm.minnum.f32(float %0, float %1) ; [#uses=1 type=float]
  %3 = tail call fast float @llvm.minnum.f32(float %m.addr.024, float %2) ; [#uses=2 type=float]
  %inc = add nuw nsw i32 %i.023, 1 ; [#uses=2 type=i32]
  %cmp = icmp ult i32 %inc, 8192 ; [#uses=1 type=i1]
  br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !7
}
LV: Checking a loop in "f" from /path/to/x.c
LV: Loop hints: force=enabled width=0 unroll=0 optspace=0
LV: Found a loop: for.body
LV: Not vectorizing: Found an unidentified PHI %m.addr.024 = phi float [ %m, %entry ], [ %3, %for.body ] ; [#uses=1 type=float]
LV: Interleaving disabled by the pass manager
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing: Cannot prove legality.