[PATCH] D111846: [LV] Drop NUW/NSW flags from scalarized instructions that need predication

Tue Oct 19 11:00:24 PDT 2021

dcaballe added a comment.

In D111846#3072117 <https://reviews.llvm.org/D111846#3072117>, @lebedev.ri wrote:

> Can you explain why this isn't applicable to FP fast-math flags also?

The use cases that I've seen naturally happening are related to memory address computation. For those cases, I think FP instructions wouldn't make sense, unless we create some artificial IR with address computation in FP that is then converted to integers. For more generic cases, of course, we could create artificial/non-optimized IR that would expose the problem on FP but I don't see how that IR would make their way to LV with the current pipelines. If you can provide a test case, I'll be happy to address it in a separate patch. I don't think we should block this patch until we cover all the potential cases because the problem could be more complex that we even know right now.  For example, linearization of the control flow might also impact `__builtin_assume` (not sure if we are preserving those after vectorization) and other metadata guarded by a predicate. These generic cases can be addressed incrementally by adding the corresponding logic to `clearUnsafeFlagsAfterLinearizingCF` as we see fit.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3070
+  // longer hold.
+  auto *Replicate = dyn_cast<VPReplicateRecipe>(Def);
+  if (Replicate && !Replicate->isPredicated() && !State.Instance &&
----------------
fhahn wrote:
> `Def` here should always by a `VPReplicateRecipe` I think, so you should be able to use `cast<>` instead. Or maybe even better update the function signature to pass a single `VPReplicateRecipe` reference instead both `VPValue *Def` and `VPUser &User`.
Yeah, it makes sense. Thanks!

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:8
+; CHECK: vector.body:
+; CHECK:   %[[lane0Idx:.*]] = add i64 %index, 0
+; We shouldn't have NUW/NSW flags in the following add instruction.
----------------
fhahn wrote:
> It might be worth to match a bit more context here, e.g. a full triangle inside the vector body
I'm matching more instructions but note that there is no triangle (if I understand you correctly) since the control flow is linearized.

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:13
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-linux-gnu"
+
----------------
fhahn wrote:
> This should not be required. If it is, the test would need to be moved to the `X86` test directory.
It is since there's no flag (?) to force masked vectorization. I need to target AVX512. Otherwise, the loop is vectorized with predicates, which doesn't expose the problem.

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:53
+
+attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="skx" "target-features"="+adx,+aes,+avx,+avx2,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512vl,+bmi,+bmi2,+clflushopt,+clwb,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+pku,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
----------------
dcaballe wrote:
> fhahn wrote:
> > Is this needed?
> The problem is that this loop was only vectorized when targeting AVX512. The loop might be vectorized differently if I remove some of these flags. Let me try to simplify all of this and see what happens. 
I only left the avx512 metadata

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:57
+!1 = !{!2}
+!2 = !{!"buffer: {index:0, offset:0, size:38720}", !3}
+!3 = !{!"Global AA domain"}
----------------
fhahn wrote:
> is all that metadata needed? Might be better to use the `-force-vector-width=X` option instead of metadata, as then the vectorization factor is a bit more obvious from the run line directly.
Removed all the metadata

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111846/new/

https://reviews.llvm.org/D111846