[PATCH] D111846: [LV] Drop integer poison-generating flags from instructions that need predication

Sun Nov 7 02:30:04 PST 2021

dcaballe added a comment.

Thanks for the feedback, Florian! I added some comments. Working on some of the changes while waiting for some clarifications. Thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:888
+  /// after vectorization, even when their operands are not poison.
+  SmallPtrSet<VPReplicateRecipe *, 16> MayGeneratePoisonRecipes;
 };
----------------
fhahn wrote:
> This is only used during codegen, right? In that case, it would probably be better to just add it to VPTransformState instead. Otherwise it just adds a little bit of additional coupling between ILV and codegen, making it harder to separate in the future.
Sure, will do!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1199
+    SmallVector<VPDef *, 16> Worklist;
+    SmallPtrSet<VPDef *, 16> Visited;
+    Worklist.push_back(Root);
----------------
nlopes wrote:
> this visited set can be hoisted outside the lambda. You should only traverse each instruction at most once, as you need to drop poison flags if the instructions contributes to any address (doesn't matter which)
It makes sense. Will do.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1223
+            State.VF.isVector() &&
+            Legal->blockNeedsPredication(Instr->getParent()) &&
+            cast<Operator>(Instr)->hasPoisonGeneratingFlags())
----------------
fhahn wrote:
> Is there a way to avoid querying information from legal here? 
> 
> As we are interested in any operation contributing to an address computation of a masked memory operation, could we only seed the worklist with addresses of masked operations instead (both VPWidenMemoryInstructionRecipe & VPInterleaveRecipe have `hasMask`/`getMask` helpers)? Then we would not need to check legal here, right? 
> 
> Checking `Instrs` location here might not be sufficient in any case I think. There could be cases where `Instr` is in a block that does not need predication, but feeds a memory instruction in a block that needs predication. In those case, I think we should also drop the flags from `Instr`. In any case, it might be good to add such a test case.
> As we are interested in any operation contributing to an address computation of a masked memory operation, could we only seed the worklist with addresses of masked operations instead (both VPWidenMemoryInstructionRecipe & VPInterleaveRecipe have hasMask/getMask helpers)? Then we would not need to check legal here, right?

I had tried using `getMask` already. Unfortunately, there are cases where the loads are predicated but the mask has been dropped in favor of executing them speculatively unmasked. See https://reviews.llvm.org/D66688 and test `load-deref-pred.ll`. We should drop the flags from these cases as well and the only way I found is using `Legal->blockNeedsPredication`. I know it's not ideal but `Legal` is still widely used during codegen.

> Checking Instrs location here might not be sufficient in any case I think. There could be cases where Instr is in a block that does not need predication, but feeds a memory instruction in a block that needs predication. In those case, I think we should also drop the flags from Instr. In any case, it might be good to add such a test case.

I think that makes sense. It would a case similar to the one that exposed the bug but moving the `subi` instruction outside the condition. I'll give it a try!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1229
+      // Add new definitions to the worklist.
+      for (VPValue *operand : cast<VPRecipeBase>(CurDef)->operands())
+        if (VPDef *OpDef = operand->getDef())
----------------
fhahn wrote:
> Do we in some cases also need to drop flags from recipes other then VPReplicateRecipe?
> 
> If we need to drop flags from recipes in unpredicted blocks (see comment above). then we could have cases where a vector operation (e.g. VPWidenRecipe) feeds the address computation (the first lane will get extracted). An example could be
> 
> ```
> define void @drop_vector(float* noalias nocapture readonly %input,
>                                  float* %output, i64* noalias %ints) local_unnamed_addr #0 {
> entry:
>   br label %loop.header
> 
> loop.header:
>   %iv = phi i64 [ 0, %entry ], [ %iv.inc, %if.end ]
>   %i23 = icmp eq i64 %iv, 0
>   %gep = getelementptr inbounds i64, i64* %ints, i64 %iv
>   %lv = load i64, i64* %gep
>   %i27 = sub nuw nsw i64 %iv, 1
>   %r = add i64 %lv, %i27
>   store i64 %r, i64* %gep
>   br i1 %i23, label %if.end, label %if.then
> 
> if.then:
>   %i29 = getelementptr inbounds float, float* %input, i64 %i27
>   %i30 = load float, float* %i29, align 4, !invariant.load !0
>   br label %if.end
> 
> if.end:
>   %i34 = phi float [ 0.000000e+00, %loop.header ], [ %i30, %if.then ]
>   %i35 = getelementptr inbounds float, float* %output, i64 %iv
>   store float %i34, float* %i35, align 4
>   %iv.inc = add nuw nsw i64 %iv, 1
>   %exitcond = icmp eq i64 %iv.inc, 4
>   br i1 %exitcond, label %loop.exit, label %loop.header
> 
> loop.exit:
>   ret void
> }
> attributes #0 = { noinline nounwind uwtable "target-features"="+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512vl" }
> 
> !0 = !{}
> 
> ```
I assumed that address computation would be uniform for those cases and left scalar, wouldn't it?. I'll give it a try but there has been some back and forth wrt dropping the flags from vector instructions already. Based on previous feedback, I'm not sure if we should drop the flags from them when there are lanes that won't generate poison. I would need some clarification before proceeding with those changes (@nlopes). 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111846/new/

https://reviews.llvm.org/D111846