[PATCH] D22867: [LV] Mark scalarized GEPs uniform

Wed Jul 27 12:23:04 PDT 2016

mssimpso added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1518-1519
@@ -1506,4 +1517,4 @@
 
   /// Collect the variables that need to stay uniform after vectorization.
   void collectLoopUniforms();
 
----------------
anemet wrote:
> I was looking at this code recently and was surprised to see that we never actually describe what uniformity is (especially because we use it for other things than just loop-invariant addresses).  As a first step, would you mind filling this gap?
> 
> My take is that we currently call something uniform if we don't need to generate values for each horizontal value in a vector loop iteration, more precisely we only need to generate the first one.  This is true for a few things: the induction variable, loop-invariant addresses, pointers for consecutive accesses (because the vector access instruction implicitly generates the horizontal addresses).
I completely agree and don't mind doing this at all. Thanks for putting that into words!
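
To make that definition concrete, here is a rough sketch in plain C++. It is not LoopVectorize code: VF, loadConsecutive, and loadStrided are names made up for illustration, and the memcpy only models a wide vector load. The consecutive case needs just the first lane's address, while the strided case needs one address per lane.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

constexpr std::size_t VF = 4; // hypothetical vectorization factor

// Consecutive access a[i]: only the lane-0 address is computed. The wide
// load covers the remaining lanes implicitly, so the pointer is "uniform".
std::array<int, VF> loadConsecutive(const int *a, std::size_t i) {
  std::array<int, VF> v;
  const int *p = a + i;                       // one address per vector iteration
  std::memcpy(v.data(), p, sizeof(int) * VF); // models a wide vector load
  return v;
}

// Strided access a[4 * i]: every lane needs its own address, so neither the
// pointer nor the index feeding it is uniform.
std::array<int, VF> loadStrided(const int *a, std::size_t i) {
  std::array<int, VF> v;
  for (std::size_t lane = 0; lane < VF; ++lane)
    v[lane] = a[(i + lane) << 2];             // VF index computations, VF addresses
  return v;
}
```

The induction variable and loop-invariant addresses fit the same pattern as loadConsecutive: one scalar value is enough for the whole vector iteration.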

================
Comment at: test/Transforms/LoopVectorize/induction.ll:295
@@ -251,1 +294,3 @@
+}
+
 ; Make sure that the loop exit count computation does not overflow for i8 and
----------------
wmi wrote:
> Hi Matthew,
> 
> A problem I see with marking a getelementptr as uniform when it is non-consecutive is:
> 
> For the testcase here, if we don't enable interleaved memory accesses, we will generate a vectorized version of "%0 = shl nsw i64 %i, 2". However, with your patch, "%0 = shl nsw i64 %i, 2" will also be marked as uniform because "%1 = getelementptr inbounds i32, i32* %a, i64 %0" is marked as uniform. These two results contradict each other.
> 
> Even if we generate a scalarized version of "%0 = shl nsw i64 %i, 2", the instruction cost for it should be VF. Marking it as uniform will lower its cost estimate to only 1.
> 
> Thanks,
> Wei.
> 
> 
You're right - the GEP will only be uniform here if the loads/stores belong to an interleaved group. This makes some sense to me because, when interleaving, we treat the pointer as if it were consecutive. Thanks! I will update the patch.
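
A toy model of the cost concern, for illustration only. This is not the actual LoopVectorize cost model: AccessKind, isPointerUniform, and indexCost are hypothetical names, and the unit cost of 1 is an assumption. It only captures the point that an index feeding a uniform pointer is costed once, while one feeding a non-uniform pointer is costed once per lane.

```cpp
// Hypothetical classification of the memory access that uses the pointer.
enum class AccessKind { Consecutive, InterleavedGroup, Other };

// The pointer is treated as uniform only when the access is consecutive or
// belongs to an interleaved group (which is handled like a consecutive one).
bool isPointerUniform(AccessKind Kind) {
  return Kind == AccessKind::Consecutive ||
         Kind == AccessKind::InterleavedGroup;
}

// Cost of the index computation (e.g. the shl) that feeds the pointer:
// one copy if the pointer is uniform, otherwise one copy per vector lane.
unsigned indexCost(AccessKind Kind, unsigned VF, unsigned UnitCost = 1) {
  return isPointerUniform(Kind) ? UnitCost : VF * UnitCost;
}
```

With VF = 4, the non-interleaved strided case in the test above would be costed at 4 rather than 1 under this model.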


https://reviews.llvm.org/D22867