[PATCH] D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.
    Simon Pilgrim via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Thu Dec  1 01:42:54 PST 2016
    
    
  
RKSimon added inline comments.
================
Comment at: test/Transforms/SLPVectorizer/X86/reduction_loads.ll:20
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    [[TMP3:%.*]] = mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, [[TMP2]]
----------------
mkuper wrote:
> RKSimon wrote:
> > What can be done to avoid this regression?
> Ohh, right, wanted to ask about this as well.
> My guess is that this wasn't actually a regression, but we moved the shuffle from store side to the load side.  Is that right?
If the update_test_checks script has done its job and generated checks for all the IR then this is an additional shuffle, I can't see an equivalent shuffle or set of extracts in the codegen on the left.
https://reviews.llvm.org/D26905
    
    
More information about the llvm-commits
mailing list