[llvm-dev] Unsafe floating point operation (FDiv & FRem) in LoopVectorizer

Tue Sep 25 00:23:30 PDT 2018

Hi,

Consider the following test case:

void foo(float *A, float *B, float *C, int len, int VSMALL) {
  for (int i = 0; i < len; i++)
    if (C[i] > VSMALL)
      A[i] = B[i] / C[i];
}

In this test the division is conditional, but LLVM generates an unconditional fdiv in the vectorized loop:

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = getelementptr inbounds float, float* %C, i64 %index
  %1 = bitcast float* %0 to <8 x float>*
  %wide.load = load <8 x float>, <8 x float>* %1, align 4, !tbaa !2, !alias.scope !6
  %2 = fcmp ogt <8 x float> %wide.load, %broadcast.splat30
  %3 = getelementptr inbounds float, float* %B, i64 %index
  %4 = bitcast float* %3 to <8 x float>*
  %wide.masked.load = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* %4, i32 4, <8 x i1> %2, <8 x float> undef), !tbaa !2, !alias.scope !9
  %5 = fdiv <8 x float> %wide.masked.load, %wide.load
  %6 = getelementptr inbounds float, float* %A, i64 %index
  %7 = bitcast float* %6 to <8 x float>*
  call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> %5, <8 x float>* %7, i32 4, <8 x i1> %2), !tbaa !2, !alias.scope !11, !noalias !13
  %index.next = add i64 %index, 8
  %8 = icmp eq i64 %index.next, %n.vec
  br i1 %8, label %middle.block, label %vector.body, !llvm.loop !14

The generated IR seems unsafe because fdiv is not respecting the compare mask.

Since division is an unsafe operation, LLVM should generate predicated divisions here.

If I change the element type of A, B and C to an integer type, LLVM generates the right code: the division is predicated on the mask, and a scalar div is generated for each lane.
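
To make the difference concrete, here is a minimal scalar simulation of one 8-lane vector iteration. It is only a sketch: the names kVF, Lanes, Result, unconditionalDiv and predicatedDiv are made up for illustration and are not part of LLVM, the threshold is passed as a float rather than an int, and it assumes IEEE 754 behaviour where dividing by zero yields inf or NaN instead of trapping. It counts how many divisions run on lanes whose mask bit is off.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kVF = 8; // assumed width, matching <8 x float> in the IR

using Lanes = std::array<float, kVF>;

struct Result {
  Lanes A;           // contents of A after one vector iteration
  int maskedOffDivs; // divisions executed on lanes whose mask bit is false
};

// Models the IR above: divide all lanes, then store only the masked-on ones.
Result unconditionalDiv(const Lanes &B, const Lanes &C, float vsmall, Lanes A) {
  int maskedOff = 0;
  Lanes quot{};
  for (std::size_t l = 0; l < kVF; ++l) {
    bool mask = C[l] > vsmall;
    float b = mask ? B[l] : 0.0f; // masked load; 0.0f stands in for undef
    quot[l] = b / C[l];           // runs regardless of the mask
    if (!mask)
      ++maskedOff;
  }
  for (std::size_t l = 0; l < kVF; ++l)
    if (C[l] > vsmall)
      A[l] = quot[l];             // masked store
  return {A, maskedOff};
}

// Models the predicated form: a scalar division per lane, guarded by the mask.
Result predicatedDiv(const Lanes &B, const Lanes &C, float vsmall, Lanes A) {
  for (std::size_t l = 0; l < kVF; ++l)
    if (C[l] > vsmall)
      A[l] = B[l] / C[l];
  return {A, 0}; // no division ever runs on a masked-off lane
}
```

Both functions leave the same values in A. They differ only in whether a division is executed on the masked-off lanes, which is where any side effect of dividing by a small or zero C[i] would come from.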

This looks like a problem in the part of LV that detects instructions needing predication: currently it considers only UDiv, SDiv, URem and SRem.

bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I, unsigned VF) {
  if (!Legal->blockNeedsPredication(I->getParent()))
    return false;
  switch(I->getOpcode()) {
  default:
    break;
  case Instruction::UDiv:  // Floating-point operations (FDiv, FRem) are not considered here
  case Instruction::SDiv:
  case Instruction::SRem:
  case Instruction::URem:
    return mayDivideByZero(*I);
  }
  return false;
}

I don't have any background on this function, but I feel it should consider the FDiv and FRem instructions as well.
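
Roughly, the change I have in mind could look like the following stand-alone sketch. It does not use the LLVM API: the Opcode enum, the function name isScalarWithPredicationSketch and the two bool parameters are hypothetical stand-ins for Instruction::getOpcode(), Legal->blockNeedsPredication() and mayDivideByZero(). Whether mayDivideByZero() can be reused as-is for floating-point divisors is an open assumption.

```cpp
// Stand-in for the LLVM opcodes relevant to this check (not the real enum).
enum class Opcode { UDiv, SDiv, URem, SRem, FDiv, FRem, FAdd };

// Simplified model of the check quoted above, with FDiv and FRem added.
// blockNeedsPredication and mayDivideByZero are passed in as plain bools
// instead of being queried from LLVM.
bool isScalarWithPredicationSketch(Opcode Op, bool blockNeedsPredication,
                                   bool mayDivideByZero) {
  if (!blockNeedsPredication)
    return false;
  switch (Op) {
  default:
    break;
  case Opcode::UDiv:
  case Opcode::SDiv:
  case Opcode::SRem:
  case Opcode::URem:
  case Opcode::FDiv: // proposed addition
  case Opcode::FRem: // proposed addition
    return mayDivideByZero;
  }
  return false;
}
```

With this, a conditional fdiv whose divisor may be zero would be treated like the integer divisions, while other floating-point operations such as fadd stay unpredicated.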

If there are no objections, I will prepare a patch.

Thanks,
Ashutosh