[PATCH] D108699: [LAA] Analyze pointers forked by a select

Tue Oct 12 02:10:01 PDT 2021

huntergr marked 4 inline comments as done.
huntergr added inline comments.

================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:698
+  if (!AR) {
+    if (RtCheck.AllowForkedPtrs) {
+      // If we can't find a single SCEVAddRecExpr, then maybe we can find two.
----------------
david-arm wrote:
> nit: This is just a suggestion, but you could restructure this a little to remove the extra indentation I think here:
> 
>   if (!AR) {
>     if (!RtCheck.AllowForkedPtrs)
>       return false;
>     ...
> 
> Not sure if this is better?
We still needed to bail out if AR was false and forked pointers weren't allowed... so I rewrote it to check for the positive case for AR first and then proceed with the forked pointers check and default false afterwards.

================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:714
+        RtCheck.ForkedPtrs[Ptr] = *FPtr;
+        if (isa<SCEVAddRecExpr>(A) && cast<SCEVAddRecExpr>(A)->isAffine() &&
+            isa<SCEVAddRecExpr>(B) && cast<SCEVAddRecExpr>(B)->isAffine())
----------------
david-arm wrote:
> Is it better to just make a recursive call to hasComputableBounds instead of calling `isAffine` here? We're potentially missing out on future improvements to hasComputableBounds here, and also we're ignoring the possibility of a forked pointer being loop invariant I think.
> 
We can't do that right now -- hasComputableBounds takes a Value* rather than a SCEV* so it can (potentially) be added to the stride map in replaceSymbolicStrideSCEV. We'd just end up looking at the same Value and splitting again.

This is something the SCEVForkedExpr would make cleaner, since we would only need to evaluate a single SCEV. But it'll take a bit of refactoring to do that, which is why I wanted some feedback on the whole idea first.

We could also separate out parts of these functions to make it recursive, but I'll need to be careful since replaceSymbolicStrideSCEV has other users.

================
Comment at: llvm/test/Transforms/LoopVectorize/forked-pointers.ll:1
+; RUN: opt -loop-accesses -analyze -enable-new-pm=0 %s 2>&1 | FileCheck %s --check-prefix=NO-FORKED-PTRS
+; RUN: opt -disable-output -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' %s 2>&1 | FileCheck %s --check-prefix=NO-FORKED-PTRS
----------------
david-arm wrote:
> huntergr wrote:
> > david-arm wrote:
> > > Can we also have a test where at least one of the forked pointers is loop-invariant?
> > We already do; see forked_ptrs_uniform_and_contiguous_forks
> > 
> > 
> > We could properly analyze and vectorize that case as well, but I haven't implemented that yet so I'm just testing that it gets rejected for now.
> Ah ok, sorry I missed that. I guess what I meant was that this should be trivial to implement, particularly if we can find a way of making calls to hasComputableBounds recursive and re-use the existing code that checks for loop-invariants and affine pointers.
Sadly, it'll require a bit more work to support invariant addresses. My original downstream code allowed them, but we ran into a bug with it and disabled them.

My plan is to get the base functionality committed, then go back and add an interface so that LoopVectorize (or other LAA consumers) can query the type of the forks in order to generate correct (and hopefully more optimal) IR for various cases -- the current contiguous-only SCEVs, strides of >1, loop-invariant but unknown strides, uniform/invariant addresses, indexed gather/scatter, etc.

If both forks have a stride of 1, or are invariant, then we could potentially plant two masked load instructions (or load + broadcast) instead of a gather, for instance. But that's future work until this part is completed.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108699/new/

https://reviews.llvm.org/D108699