[PATCH] D122857: [LoopCacheAnalysis] Enable delinearization of fixed sized arrays

Thu Apr 7 14:37:44 PDT 2022

congzhe added inline comments.

================
Comment at: llvm/lib/Analysis/LoopCacheAnalysis.cpp:335
+  SmallVector<int, 4> SrcSizes;
+  getIndexExpressionsFromGEP(*SE, SrcGEP, SrcSubscripts, SrcSizes);
+
----------------
nikic wrote:
> congzhe wrote:
> > congzhe wrote:
> > > nikic wrote:
> > > > Meinersbur wrote:
> > > > > nikic wrote:
> > > > > > So, I guess this is something of a pre-existing issue, but this API looks completely broken to me. Deriving information from a GEP source element type is blatantly illegal under IR semantics -- in particular, the array bounds specified therein may be completely arbitrary.
> > > > > > 
> > > > > > The correct way to derive such information is from SCEV addrec expressions. For example in your first test you have `%arrayidx10` with SCEV `{{(8 + %a)<nuw>,+,8192}<nuw><%for.body>,+,4}<nuw><%for.body4>`, and the steps of those addrecs //are// semantically meaningful.
> > > > > It should be only a heuristic, using the fact that these indices are usually something sensible from the source code. In case of DependenceInfo, there is an additional check whether the derived indices are actually legal (unless `--da-disable-delinearization-checks=true`). LoopCacheAnalysis is an heuristic analysis entirely, deriving and heuristic cost function without a claim of semantics.
> > > > > 
> > > > > The SCEV addrec approach is already used by LoopCacheAnalysis in the form of `llvm::delinearize`.
> > > > > 
> > > > Okay, this is slightly less bad than I though then -- but GEP type structure should still not be used for heuristic purposes either.
> > > > 
> > > > If the SCEV addrec approach is already used in llvm::delinearize(), then why is this additional code needed? The addrecs involved in the motivational test cases contain all the necessary information, so there should be no need to look at GEP structure.
> > > Thanks for the comment! The current SCEV addrec approach that is already used in llvm::delinearize() could not handle fixed-size arrays, it could only handle parametric-size arrays. That is why we treat delinearization of fixed-size and parametric-size arrays separately in `DependenceAnalysis`, and I followed the same fashion here in `LoopCacheAnalysis`. 
> > Could you please elaborate a bit more on the reason that "GEP type structure should still not be used for heuristic purposes"? Is it because in the longer term we envision GEP instructions will not have type information anymore?
> > Could you please elaborate a bit more on the reason that "GEP type structure should still not be used for heuristic purposes"? Is it because in the longer term we envision GEP instructions will not have type information anymore?
> 
> That's part of it, but also more generally that you're relying on the frontend encoding things in a specific way here. Maybe the array is actually wrapped inside a struct. Maybe the array decays into a slice (`[0 x ...]` style) or a pointer. Maybe the user is an old-school C programmer and uses `ary[y * SIZE + x]` instead of `ary[y][x]`. Matching GEP structure locks you into matching a very specific code pattern, and does not support code that is semantically equivalent, but represented slightly differently in IR.
> 
> You mention that the SCEV-based approach cannot handle fixed-sized arrays. Could you please explain in more detail why? It's not really obvious to me.
Thanks for your explanation on GEP type structures.

Currently the SCEV-based approach is more like a symbolic technique and if it does not find any parameters (symbols) in the SCEV expression, it just bails out. I'm looking into whether we could incorporate fixed-size array delinearization in the the SCEV-based approach and trying to prototype a solution if possible. I will get back to this thread once I get some results.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122857/new/

https://reviews.llvm.org/D122857