[PATCH] D37702: [LV] Clamp the VF to the trip count

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 11 14:05:10 PDT 2017


Ayal added inline comments.


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6166
   if (!OptForSize) // Remaining checks deal with scalar loop when OptForSize.
-    return computeFeasibleMaxVF(OptForSize);
+    return computeFeasibleMaxVF(OptForSize, TC);
 
----------------
Better remove the default setting of TC = 0 in the declaration of computeFeasibleMaxVF(), which becomes dead.


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6179
   // If we optimize the program for size, avoid creating the tail loop.
-  unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
   DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');
 
----------------
Probably good to hoist this DEBUG line along with computing TC earlier.


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6258
     SmallVector<unsigned, 8> VFs;
     unsigned NewMaxVectorSize = WidestRegister / SmallestType;
     for (unsigned VS = MaxVectorSize; VS <= NewMaxVectorSize; VS *= 2)
----------------
While we're at it, in this or a follow-up patch:
  - Suggest to also clamp NewMaxVectorSize above to TC.
  - VS below should start from MaxVectorSize*2 instead of MaxVectorSize; it's pointless and wasteful to calculateRegisterUsage for MaxVectorSize, as the latter will be returned by default anyway.
  - Optimize calculateRegisterUsage() for the case where VFs is empty.

Doing this will also take care of your early exit above.


================
Comment at: test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll:3
 ; RUN: opt -loop-vectorize -vectorizer-maximize-bandwidth -mcpu=core-avx2 -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s --check-prefix=CHECK-AVX2
+; RUN: opt -loop-vectorize -vectorizer-maximize-bandwidth -mcpu=core-avx2 -S < %s | FileCheck %s --check-prefix=CHECK-CLAMP
 ; REQUIRES: asserts
----------------
You could reuse CHECK-AVX2 instead of adding CHECK-CLAMP, right?


https://reviews.llvm.org/D37702





More information about the llvm-commits mailing list