[PATCH] D32422: LoopVectorizer: let target prefer scalar addressing computations (+ minor improvements in SystemZTTI)

Sat May 20 23:28:01 PDT 2017

jonpa added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:7301
+
+  // Start with all scalar pointer uses.
+  SmallPtrSet<Instruction *, 8> AddrDefs;
----------------
jonpa wrote:
> delena wrote:
> > jonpa wrote:
> > > delena wrote:
> > > > But interleave loads are also here. You don't need to scalarize their pointers.
> > > > attempt to scalarize the Ptr should be done if the taken decision is "Scalarize".
> > > I tried
> > > 
> > > ```
> > >       if (PtrDef && TheLoop->contains(PtrDef) &&
> > >           getWideningDecision(&I, VF) == CM_Scalarize)
> > >         AddrDefs.insert(PtrDef);
> > > ```
> > > 
> > > This was a minor change: only 17 loops became different. This seems to relate to the cases of mixed context (used both as address and otherwise), which means that we are instead of extracting an address now do inserts instead.
> > > 
> > > Not sure exactly if this matters, but for what I can see, it seemed slightly better in terms of final MIs to scalarize "anything" (not just scalarized case), perhaps because the big benefit of LSR working on scalar addresses?
> > > 
> > > If you don't have a strong opinion, I guess I would be most happy not to change anything here.
> > > 
> > I just do not understand  why do you need to scalarize addressing if the instruction itself is not scalarized.
> > Can you give me an IR example?
> Again, "This seems to relate to the cases of mixed context (used both as address and otherwise)"...
> 
> For example:
> 
> ```
> for.body.i:                                       ; preds = %for.body.i, %for.body.preheader.i
>   %indvars.iv241.i = phi i64 [ %47, %for.body.preheader.i ], [ %indvars.iv.next242.i, %for.body.i ]
>   %indvars.iv.next242.i = add nsw i64 %indvars.iv241.i, -1
>   %arrayidx.i.i = getelementptr inbounds i32, i32* %42, i64 %indvars.iv.next242.i
>   %48 = trunc i64 %indvars.iv.next242.i to i32
>   store i32 %48, i32* %arrayidx.i.i, align 4, !tbaa !47
>   %tobool.i = icmp eq i32 %48, 0
>   br i1 %tobool.i, label %for.end.i.loopexit, label %for.body.i
> ```
> 
> This is a mixed-use where the IV is stored truncated at the address indexed with IV. Due to the non-address use, it will be contained in a vector.
In this example, it is the same instruction (store) using the value both as addressing and as a general value. There are also other cases where it may be used in a store, but used in a different instruction also, like an arithmetic one...

I would like to point out that this actually wanted right now since performance measurements show better results on SystemZ at least on one benchmark by doing this "as much as possible", meaning that any mixed use like this gets included in this transformation.  (I have actually tried this version of the patch and compared to a less aggressive version, which do not touch mixed-uses, and it was found that more aggressive is better...)

https://reviews.llvm.org/D32422