[PATCH] D111846: [LV] Drop NUW/NSW flags from scalarized instructions that need predication

Mon Oct 18 09:25:14 PDT 2021

fhahn added a comment.

Thanks for the patch! Some comments inline.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3070
+  // longer hold.
+  auto *Replicate = dyn_cast<VPReplicateRecipe>(Def);
+  if (Replicate && !Replicate->isPredicated() && !State.Instance &&
----------------
`Def` here should always be a `VPReplicateRecipe` I think, so you should be able to use `cast<>` instead. Or maybe even better, update the function signature to pass a single `VPReplicateRecipe` reference instead of both `VPValue *Def` and `VPUser &User`.
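
To illustrate the suggestion, here is a minimal self-contained sketch. It does not use the real LLVM headers: `VPValueLike`, `ReplicateRecipeLike`, `dynCastLike`, `castLike` and both helper functions are hypothetical stand-ins that only model the `classof`-based pattern behind `dyn_cast<>`/`cast<>`, and only the first part of the condition from the patch is modelled.

```cpp
#include <cassert>

// Hypothetical stand-in for a VPlan value; NOT LLVM's real class.
struct VPValueLike {
  enum Kind { ReplicateKind, WidenKind };
  explicit VPValueLike(Kind K) : TheKind(K) {}
  virtual ~VPValueLike() = default;
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

// Hypothetical stand-in for a replicate recipe.
struct ReplicateRecipeLike : VPValueLike {
  explicit ReplicateRecipeLike(bool IsPredicated)
      : VPValueLike(ReplicateKind), Predicated(IsPredicated) {}
  bool isPredicated() const { return Predicated; }
  // Kind check in the style used by LLVM-style RTTI.
  static bool classof(const VPValueLike *V) {
    return V->getKind() == ReplicateKind;
  }

private:
  bool Predicated;
};

// Simplified model of dyn_cast<>: yields nullptr on a kind mismatch.
template <typename To> To *dynCastLike(VPValueLike *V) {
  return To::classof(V) ? static_cast<To *>(V) : nullptr;
}

// Simplified model of cast<>: asserts that the kind matches.
template <typename To> To *castLike(VPValueLike *V) {
  assert(To::classof(V) && "castLike<> argument of incompatible type!");
  return static_cast<To *>(V);
}

// Shape of the patch as posted: the caller passes a generic value and the
// callee has to null-check the result of the dynamic cast.
bool isUnpredicatedReplicateBefore(VPValueLike *Def) {
  auto *Replicate = dynCastLike<ReplicateRecipeLike>(Def);
  return Replicate && !Replicate->isPredicated();
}

// Shape of the suggestion: the signature states the invariant, so neither a
// cast nor a null check is needed inside the function.
bool isUnpredicatedReplicateAfter(const ReplicateRecipeLike &Replicate) {
  return !Replicate.isPredicated();
}
```

If the signature cannot change, a caller holding a `VPValueLike *` would go through `castLike<ReplicateRecipeLike>(Def)` in this sketch, which turns a violated assumption into an assertion failure instead of a silently skipped branch.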

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:3
+
+; Test case for PR52111. Make sure that NUW/NSW flags are dropped from
+; instructions in blocks that need predication and are linearized and masked
----------------
Can you pre-commit the test?

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:8
+; CHECK: vector.body:
+; CHECK:   %[[lane0Idx:.*]] = add i64 %index, 0
+; We shouldn't have NUW/NSW flags in the following add instruction.
----------------
It might be worth matching a bit more context here, e.g. a full triangle inside the vector body.

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:13
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-linux-gnu"
+
----------------
This should not be required. If it is, the test would need to be moved to the `X86` test directory.

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:16
+; Function Attrs: noinline nounwind uwtable
+define void @pr52111([1 x [33 x float]]* noalias nocapture readonly %input,
+                     [2420 x [4 x float]]* %output) local_unnamed_addr #0 {
----------------
Could we just use `float *` instead of the more deeply nested types to make the test a bit simpler?

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:21
+
+loop1.header:
+  %iv1 = phi i64 [ 0, %entry ], [ %iv1.inc, %loop2.exit ]
----------------
Would a single loop nest suffice, or is a nested loop needed?

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:53
+
+attributes #0 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="skx" "target-features"="+adx,+aes,+avx,+avx2,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512vl,+bmi,+bmi2,+clflushopt,+clwb,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+pku,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
----------------
Is this needed?

================
Comment at: llvm/test/Transforms/LoopVectorize/pr52111.ll:57
+!1 = !{!2}
+!2 = !{!"buffer: {index:0, offset:0, size:38720}", !3}
+!3 = !{!"Global AA domain"}
----------------
Is all that metadata needed? It might be better to use the `-force-vector-width=X` option instead of metadata, as the vectorization factor is then more obvious directly from the RUN line.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111846/new/

https://reviews.llvm.org/D111846