[PATCH] D98240: [VectorCombine] Simplify to scalar store if only one element updated

Mon Apr 26 00:43:28 PDT 2021

bjope added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VectorCombine.cpp:784
+  StoreInst *SI = dyn_cast<StoreInst>(&I);
+  if (!SI || !SI->isSimple() || !SI->getValueOperand()->getType()->isVectorTy())
+    return false;
----------------
fhahn wrote:
> qiucf wrote:
> > fhahn wrote:
> > > I'm not sure if the vector GEP will properly handle types that are non-power-of-2 properly. Perhaps it might be good to limit this to types for which the following holds? `SI->typeSizeEqualsStoreSize(LI->getType())`?
> > Hmm.. Not sure what you mean, maybe something like
> > 
> > ```
> > %0 = getelementptr inbounds <16 x i24>, <16 x i24>* %q, i32 0, i32 3
> > store i24 %s, i24* %0, align 4
> > ```
> > 
> > this should be expected result. Or do we need to ensure scalar type size of load equals to store's value type size?
> Yes this was what I meant. The created pointer looks OK, but I think this is an area where it's worth to be extra careful. I am not too familiar with the potential problems, but I think @bjope has experience with such targets. @bjope WDYT, do you think this would work as expected?
> 
> (my preference would be to start with dis-allowing the transform if the sizes don't match initially and enable it once we got confirmation that it  is safe)
Yes, I think this would be a bit more complicated if trying to support types for with `DL.typeSizeEqualsStoreSize(SI->getType())` isn't true. Since the vectors are bit-packed you can't simply address a single vector element using a GEP otherwise. You wouldn't end up clobbering adjacent elements when doing the store (unless doing some kind of read-modify-write operation). 

Maybe you even need to look at the typeAllocSize (comparing it with the storeSize) if alignment is important (to avoid misaligned stores).

But I also wonder if using GEP:s to address individual vector elements is something we do elsewhere. I know that https://llvm.org/docs/GetElementPtr.html#can-gep-index-into-vector-elements still says that "In the future, it will probably be outright disallowed.". Has there been discussions somewhere about opening up "pandoras box" (?) and start doing such things? If so, is that well tested somewhere?

For example something like this compiles without any complaints:

```
; RUN: llc -O3 -mtriple x86_64-- -o - %s
target datalayout = "E"
define void @foo(<8 x i4>* %q, i4 %s) {
  %p = getelementptr inbounds <8 x i4>, <8 x i4>* %q, i32 0, i32 3
  store i4 %s, i4* %p
  ret void
}
```
but it ends up writing to `3(%rdi)` while the third element in that vector actually is at half of the byte at `1(%rdi)`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98240/new/

https://reviews.llvm.org/D98240