[PATCH] D52417: [TargetTransformInfo] Pass a new argument 'Scalarized' to getMemoryOpCost.

Thu Sep 27 03:28:35 PDT 2018

jonpa updated this revision to Diff 167263.
jonpa added a comment.

It seems like a simpler solution to simply not pass the instruction pointer when making this query for a scalarized memory instruction, if VF > 1. This is equivalent, and it doesn't appear to break any tests. Theoretically, there may be use for the instruction pointer, but it seems like there isn't one at the moment.

I can imagine in the SystemZ case that if the user is also scalarized, then the scalarized load is actually folded also in the vectorized loop. I suspect this may happen for i64 multiply. This is however NFC on SPEC compared to using the 'Scalarized' parameter.

The test has an IV increment of 2, which makes it scalarized. Is this clear enough?

Do you agree this is simpler and better?


https://reviews.llvm.org/D52417

Files:
  lib/Transforms/Vectorize/LoopVectorize.cpp
  test/Transforms/LoopVectorize/SystemZ/load-scalarization-cost-1.ll


Index: test/Transforms/LoopVectorize/SystemZ/load-scalarization-cost-1.ll
===================================================================

--- /dev/null
+++ test/Transforms/LoopVectorize/SystemZ/load-scalarization-cost-1.ll
@@ -0,0 +1,26 @@
+; RUN: opt -mtriple=s390x-unknown-linux -mcpu=z13 -loop-vectorize \
+; RUN:   -force-vector-width=4 -debug-only=loop-vectorize \
+; RUN:   -enable-interleaved-mem-accesses=false -disable-output < %s 2>&1 \
+; RUN:   | FileCheck %s
+; REQUIRES: asserts
+;
+
+define i32 @fun(i64* %data, i64 %n, i64 %s, i32* %Src) {
+entry:
+  br label %for.body
+
+for.body:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
+  %acc = phi i32 [ 0, %entry ], [ %acc_next, %for.body ]
+  %gep = getelementptr inbounds i32, i32* %Src, i64 %iv
+  %ld = load i32, i32* %gep
+  %acc_next = add i32 %acc, %ld
+  %iv.next = add nuw nsw i64 %iv, 2
+  %cmp110.us = icmp slt i64 %iv.next, %n
+  br i1 %cmp110.us, label %for.body, label %for.end
+
+for.end:
+  ret i32 %acc_next
+
+; CHECK: Found an estimated cost of 4 for VF 4 For instruction:   %ld = load i32, i32* %gep
+}
Index: lib/Transforms/Vectorize/LoopVectorize.cpp
===================================================================
--- lib/Transforms/Vectorize/LoopVectorize.cpp
+++ lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5241,6 +5241,7 @@
 
 unsigned LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
                                                                  unsigned VF) {
+  assert(VF > 1 && "Scalarization cost of instruction implies vectorization.");
   Type *ValTy = getMemInstValueType(I);
   auto SE = PSE.getSE();
 
@@ -5258,7 +5259,7 @@
 
   Cost += VF *
           TTI.getMemoryOpCost(I->getOpcode(), ValTy->getScalarType(), Alignment,
-                              AS, I);
+                              AS);
 
   // Get the overhead of the extractelement and insertelement instructions
   // we might create due to scalarization.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D52417.167263.patch
Type: text/x-patch
Size: 1967 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180927/84b9e641/attachment.bin>