[PATCH] D119078: [LAA, LV] Add initial support for pointer-diff memory checks.

Thu May 26 06:41:03 PDT 2022

fhahn marked an inline comment as done.
fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3090
+  // across all iterations.
+  if (!Legal->getLAI()->getRuntimePointerChecking()->getDiffChecks()) {
+    //  We currently don't use LoopVersioning for the actual loop cloning but we
----------------
bjope wrote:
> Hi @fhahn and @dmgreen (and well, anyone else who might happen to read this).
> 
> I've been trying to understand some regressions that we've seen downstream after this patch.
> It seems to be related to this part, where the new memchecks somehow are weaker(?) in the sense that we can't deduce NoAlias across all iterations (well, that is how I've interpreted the diff here).
> 
> What we've seen happening for a couple of benchmarks is that in our OOT backend some vectorized loops aren't software pipelined any longer. And the SWP scheduler is bailing out since noalias isn't guaranteed. No SWP => quite huge regressions. No idea if this could be a problem for other targets as well.
> 
> So far I haven't figured out what to do downstream in this situation.
> 
> Maybe we should look into the SWP scheduler to see if it can deduce "no alias" in some other way (I'm not sure, but I figure SWP isn't requiring no overlap across all iterations, but depending on how much pipelining it might require no overlap across iteration N and N+1 etc.).
> 
> Maybe we should add some heuristic already in the LoopVectorizer to not use the new kind of memory checks when we think that it would block SWP (an initial heuristic would probably be to use the old kind of checks for our target).
> Here I'm not quite sure about the plans in-tree for this. Are the new memory checks supposed to replace the old checks in the future?
> 
> If anyone has some insight/ideas here, then I'd be happy to read your comments on this.
> It seems to be related to this part, where the new memchecks somehow are weaker(?) in the sense that we can't deduce NoAlias across all iterations (well, that is how I've interpreted the diff here).

Yes, exactly: the cheaper check only rules out dependences within the selected VF * IC, so no-alias is not guaranteed for all accesses in the loop.
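
To illustrate the difference, here is a simplified model of the two kinds of checks. It is only a sketch and not the code LV/LAA actually emits: the function names, the flat `uint64_t` addresses, and the assumption of a unit-stride loop with a single access size are all made up for the example.

```cpp
#include <cstdint>

// Simplified model only; names and parameters are hypothetical.

// Range-overlap style check: returns true if the byte ranges touched by the
// source and the sink over the *whole* loop may overlap.
bool rangesMayOverlap(uint64_t SrcStart, uint64_t SinkStart,
                      uint64_t TripCount, uint64_t AccessSize) {
  uint64_t Len = TripCount * AccessSize;
  return SrcStart < SinkStart + Len && SinkStart < SrcStart + Len;
}

// Pointer-difference style check: returns true if the sink is fewer than
// VF * IC * AccessSize bytes ahead of the source, i.e. if one vector
// iteration could observe a conflict. The subtraction is unsigned, so a sink
// that lies *before* the source wraps to a large value and passes.
bool diffCheckFails(uint64_t SrcStart, uint64_t SinkStart, uint64_t VF,
                    uint64_t IC, uint64_t AccessSize) {
  return SinkStart - SrcStart < VF * IC * AccessSize;
}
```

With VF = 4, IC = 2 and 4-byte accesses, a sink 64 bytes ahead of the source passes the difference check (64 >= 32) even though the two ranges overlap over a 100-iteration loop. That is the case where the vector loop is entered but no-alias cannot be claimed for all iterations.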

> What we've seen happening for a couple of benchmarks is that in our OOT backend some vectorized loops aren't software pipelined any longer. And the SWP scheduler is bailing out since noalias isn't guaranteed. No SWP => quite huge regressions. No idea if this could be a problem for other targets as well.

That seems like an unfortunate side effect of this patch, but in a way SWP got 'lucky' earlier because the dependence checks by LV were checking more than is required for vectorization.

If pipelining is profitable but requires runtime checks, then ideally the software pipeliner would emit them (and replace the LV checks with the stricter checks). If you are talking about `MachinePipeliner`, which runs on machine functions, this is unfortunately likely to be very difficult.

A more crude solution would be to introduce a TTI hook to opt-out of the more lightweight checks. The drawback here is that the backend misses out on cases where the lightweight checks are sufficient because no pipelining is happening.

If it is enough to rule out aliasing for the pipelined iterations, another option might be for LV to emit the difference checks with a slightly larger distance.
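
As a rough sketch of what "a slightly larger distance" could mean: the check below scales the required distance by the number of additional vector iterations the pipeliner might have in flight. `ExtraItersInFlight` is a hypothetical parameter invented for this example (nothing like it exists in LV today), and the addresses are again modelled as plain `uint64_t` values.

```cpp
#include <cstdint>

// Hypothetical widened difference check; not an existing LLVM API.
// Returns true if the sink is closer to the source than the bytes covered by
// one vector iteration (VF * IC elements) plus ExtraItersInFlight further
// vector iterations that a software pipeliner may overlap with it.
bool widenedDiffCheckFails(uint64_t SrcStart, uint64_t SinkStart, uint64_t VF,
                           uint64_t IC, uint64_t AccessSize,
                           uint64_t ExtraItersInFlight) {
  uint64_t Distance = VF * IC * AccessSize * (1 + ExtraItersInFlight);
  return SinkStart - SrcStart < Distance;
}
```

With `ExtraItersInFlight = 0` this is the same as the plain difference check. With VF = 4, IC = 2, 4-byte accesses and two extra iterations in flight, the required distance grows from 32 to 96 bytes, so a sink 64 bytes ahead of the source would now take the scalar fallback.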

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119078/new/

https://reviews.llvm.org/D119078