[PATCH] D109445: [SVE][LoopVectorize] Optimise code generated by widenPHIInstruction

Fri Sep 10 03:52:57 PDT 2021

sdesmalen accepted this revision.
sdesmalen added inline comments.
This revision is now accepted and ready to land.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4769-4771
+          // extraction of any lane. However, to generate better code, we still
+          // need to calculate values for the first n lanes since these could be
+          // required later (e.g. by a load instruction).
----------------
SjoerdMeijer wrote:
> sdesmalen wrote:
> > RosieSumpter wrote:
> > > SjoerdMeijer wrote:
> > > > sdesmalen wrote:
> > > > > SjoerdMeijer wrote:
> > > > > > sdesmalen wrote:
> > > > > > > Hi @RosieSumpter, I think it's worth elaborating a little bit more on the 'generate better code' in the comment.
> > > > > > > 
> > > > > > > [(too) long explanation here]
> > > > > > > 
> > > > > > > From what I understand, the code is better because the `extractelement` instruction that is otherwise generated (for scalar uses of this vector) may not always be folded away if the stepvector has multiple uses, leading to a redundant move (in case of element 0 for the vector-element-0 -> gpr move) or possibly expensive extractelement instructions (to extract a fixed-width lane from a scalable vector) for element > 0.
> > > > > > > 
> > > > > > > In the former case, the value for element 0 is freely available because it is the start value of the stepvector.
> > > > > > > In the latter case, there will be a cost regardless. Either the additional `add/gep` generated below to offset from the start value of the stepvector, or the extract from the stepvector itself. It's just expected that the scalar code will be cheaper.
> > > > > > > 
> > > > > > > Can you maybe capture some of that in the comment? (albeit more succinctly)
> > > > > > I had exactly the same questions as Sander. The main question I think is indeed why this is better, which it's not (that) obvious from the test changes. Thus, I was wondering, does this deserve adding some CodeGen tests?
> > > > > IMO an improved description should be sufficient.
> > > > Ok, but to be more explicit: this shows that the IR -> asm step isn't tested, is it? Why would we not test this? I think it would help too with explaining why this is better.
> > > For now I have just updated the comment (hopefully it makes sense - I have tried to add some detail but keep it concise, but am happy to change it!) 
> > > 
> > > Also happy to add a codegen test if it's decided that it's necessary.
> > > Ok, but to be more explicit: this shows that the IR -> asm step isn't tested, is it? Why would we not test this?
> > Correct, it normally doesn't happen for individual IR passes that the resulting asm for a particular target is tested for a change to the transformation, right? IMO that should be avoided whenever possible, because it defeats the purpose of a unit test.
> > 
> > In this case it should be sufficient to test explicitly that the `extractelement` does not exist after loop-vectorize. That happens with the extra `CHECK-NOT` line  in sve-widen-phi.ll. The fmov is the code-generated equivalent to the extract element of element 0, so would there be any value in also code-generating this for AArch64 to show that the fmov is removed?
> > 
> > Perhaps @RosieSumpter can add a similar comment to clarify the `CHECK-NOT` line to clarify this a bit more in the test itself?
> I was proposing an additional test, thus I was not proposing that a codegen test should replace an IR test, which would indeed be a bad idea.
> 
> This change is/was talking about codegen improvements. The IR changes here don't make that very obvious IMO. So if I was doing this work, I would add additional codegen tests with the before/after IR as input, and check its codegen to make sure these improvements are there.
> 
> Not sure if I am missing something or should be discussing adding tests here. Don't have strong opinions either, so will leave finishing the review up to you.
Thanks, I understood what you meant about adding an additional test. In this case it's more about InstCombine not folding it away because the vector has multiple uses than it is about the target not being able to do something special with it, so from my perspective adding a codegen test for this would be a bit out of place.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109445/new/

https://reviews.llvm.org/D109445