[PATCH] D98240: [VectorCombine] Simplify to scalar store if only one element updated

Fri Apr 23 06:59:46 PDT 2021

fhahn added a subscriber: bjope.
fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VectorCombine.cpp:784
+  StoreInst *SI = dyn_cast<StoreInst>(&I);
+  if (!SI || !SI->isSimple() || !SI->getValueOperand()->getType()->isVectorTy())
+    return false;
----------------
qiucf wrote:
> fhahn wrote:
> > I'm not sure if the vector GEP will properly handle types that are non-power-of-2 properly. Perhaps it might be good to limit this to types for which the following holds? `SI->typeSizeEqualsStoreSize(LI->getType())`?
> Hmm.. Not sure what you mean, maybe something like
> 
> ```
> %0 = getelementptr inbounds <16 x i24>, <16 x i24>* %q, i32 0, i32 3
> store i24 %s, i24* %0, align 4
> ```
> 
> this should be expected result. Or do we need to ensure scalar type size of load equals to store's value type size?
Yes this was what I meant. The created pointer looks OK, but I think this is an area where it's worth to be extra careful. I am not too familiar with the potential problems, but I think @bjope has experience with such targets. @bjope WDYT, do you think this would work as expected?

(my preference would be to start with dis-allowing the transform if the sizes don't match initially and enable it once we got confirmation that it  is safe)

================
Comment at: llvm/lib/Transforms/Vectorize/VectorCombine.cpp:809
+    StoreInst *NSI = Builder.CreateStore(NewElement, GEP);
+    NSI->copyMetadata(*SI);
+    replaceValue(I, *NSI);
----------------
fhahn wrote:
> I think we need to set the alignment here. For example, the original store could have a alignment less than the default for the type.
Perhaps I missed it, but could you add a test that specifies a smaller alignment for a store (`!align 1`) for a vector of `i16` or larger?

================
Comment at: llvm/test/Transforms/InstCombine/load-insert-store.ll:1
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -S -instcombine < %s | FileCheck %s
----------------
qiucf wrote:
> fhahn wrote:
> > I'm not sure why this has been removed?
> It's moved from `InstCombine` to `VectorCombine`, not deleted.
Oh right, so when it was originally added the plan was to optimize the pattern in instcombine? Might be worth moving the file in a separate commit and then just have the diff here show the changes by this patch,.

================
Comment at: llvm/test/Transforms/VectorCombine/X86/load-insert-store.ll:15
+  %vecins = insertelement <16 x i8> %0, i8 %s, i32 3
+  store <16 x i8> %vecins, <16 x i8>* %q
+  ret void
----------------
qiucf wrote:
> fhahn wrote:
> > can you also add a test with element types > i8?
> Do you mean `insertelement <16 x i8> %0, i32 %s, i32 2`? If not, there's a test for `<8 x i16>` below.
yep I missed that one, sorry.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98240/new/

https://reviews.llvm.org/D98240