[llvm-bugs] [Bug 38546] New: [LV] Vectorize loops with store to uniform addresses as profitability rather than legality check

Mon Aug 13 11:08:00 PDT 2018

https://bugs.llvm.org/show_bug.cgi?id=38546

            Bug ID: 38546
           Summary: [LV] Vectorize loops with store to uniform addresses
                    as profitability rather than legality check
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: anna at azul.com
                CC: llvm-bugs at lists.llvm.org

Currently, the loop vectorizer refuses to vectorize a loop that stores to a
uniform (loop-invariant) address.

It could be that store promotion in LICM didn't kick in for some reason, but
even then, whether to vectorize stores to uniform addresses should be a cost
model decision, right?

I think the vectorization of uniform stores should be a profitability check and
not a legality check. Is there something I'm missing here?

Today we do not vectorize a loop that contains either of the following:
1. Loop varying value stored into a loop invariant address (uniform stores)
2. Loop invariant value stored into a loop invariant address
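
For illustration, the two cases above can be written as source-level loops.
This is only a sketch: the function names and the reduction over b are
hypothetical, chosen so that the loop has some other work worth vectorizing.

```cpp
#include <cstdint>

// Case 1: a loop-varying value (the induction variable) is stored to the
// loop-invariant address `a` on every iteration.
uint32_t sum_and_store_iv(int32_t* a, const uint32_t* b, int64_t n) {
  uint32_t sum = 0;
  for (int64_t i = 0; i < n; ++i) {
    sum += b[i];                    // reduction that could be vectorized
    *a = static_cast<int32_t>(i);   // varying value, invariant address
  }
  return sum;
}

// Case 2: a loop-invariant value `v` is stored to the loop-invariant
// address `a` on every iteration.
uint32_t sum_and_store_const(int32_t* a, const uint32_t* b, int64_t n,
                             int32_t v) {
  uint32_t sum = 0;
  for (int64_t i = 0; i < n; ++i) {
    sum += b[i];                    // reduction that could be vectorized
    *a = v;                         // invariant value, invariant address
  }
  return sum;
}
```

In both loops only the final store to a is observable once the loop exits,
provided a does not alias b.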

Consider this example, where we store the IV into a loop invariant address.
With diff [A] applied and vectorization forced, the loop is vectorized and the
store becomes an "extractelement + store" pattern, i.e. a scalarized store. Note
that with avx512 support, this just becomes a masked store, rather than an
extractelement + store.

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @consecutive_ptr_forward(i32* %a, i64 %n, i32* %b) {
entry:
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
  %tmp0 = phi i32 [ %tmp3, %for.body ], [ 0, %entry ]
  %tmp1 = getelementptr inbounds i32, i32* %b, i64 %i
  %tmp2 = load i32, i32* %tmp1, align 8
  %tmp3 = add i32 %tmp0, %tmp2
  %i.trunc = trunc i64 %i to i32
  store i32 %i.trunc, i32* %a
  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp slt i64 %i.next, %n
  br i1 %cond, label %for.body, label %for.end

for.end:                                          ; preds = %for.body
  %tmp4 = phi i32 [ %tmp3, %for.body ]
  ret i32 %tmp4
}
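
As a rough source-level picture of the IR above, here is a sketch of the scalar
loop next to a hand-written stand-in for a vectorized version. The names
scalar_loop and widened_loop, the vector factor of 4, and the choice to store
only the last lane are assumptions made for illustration; the code the
vectorizer actually emits may differ (for example, one scalarized store per
lane). The sketch also assumes that a does not alias b.

```cpp
#include <cstdint>

// Scalar form mirroring the IR: the body runs at least once (do-while) and
// the i32 add wraps, so the sum is kept as an unsigned value.
uint32_t scalar_loop(int32_t* a, int64_t n, const uint32_t* b) {
  uint32_t sum = 0;
  int64_t i = 0;
  do {
    sum += b[i];
    *a = static_cast<int32_t>(i);  // IV stored to a loop-invariant address
    ++i;
  } while (i < n);
  return sum;
}

// Hand-written stand-in for a VF = 4 vectorized loop (assumed shape).
uint32_t widened_loop(int32_t* a, int64_t n, const uint32_t* b) {
  constexpr int64_t VF = 4;
  const int64_t trip = n > 1 ? n : 1;  // the do-while runs at least once
  uint32_t lanes[VF] = {0, 0, 0, 0};   // per-lane partial sums
  int64_t i = 0;
  for (; i + VF <= trip; i += VF) {
    int32_t iv[VF];                    // "vector" of IV values for this block
    for (int64_t l = 0; l < VF; ++l) {
      lanes[l] += b[i + l];
      iv[l] = static_cast<int32_t>(i + l);
    }
    // "extractelement + store": take the last lane and store it to the
    // uniform address; earlier lanes would be overwritten anyway.
    *a = iv[VF - 1];
  }
  uint32_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  for (; i < trip; ++i) {              // scalar remainder iterations
    sum += b[i];
    *a = static_cast<int32_t>(i);
  }
  return sum;
}
```

Both functions return the same sum and leave the same final value in *a, which
is the property the vectorized loop has to preserve.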


[A] Diff allowing vectorization of loops with uniform stores

--- a/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -754,7 +754,7 @@ bool LoopVectorizationLegality::canVectorizeMemory() {
   if (!LAI->canVectorizeMemory())
     return false;

-  if (LAI->hasStoreToLoopInvariantAddress()) {
+  if (false && LAI->hasStoreToLoopInvariantAddress()) {
     ORE->emit(createMissedAnalysis("CantVectorizeStoreToLoopInvariantAddress")
               << "write to a loop invariant address could not be vectorized");
     LLVM_DEBUG(dbgs() << "LV: We don't allow storing to uniform addresses\n");
