[PATCH] D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.

Fri Aug 3 19:47:20 PDT 2018

Meinersbur added a comment.

In https://reviews.llvm.org/D49281#1187583, @dmgreen wrote:

> Some extra tests for nonforced + a pragma would be good to see.

Any transformations in particular?

In https://reviews.llvm.org/D49281#1188026, @hsaito wrote:

> In https://reviews.llvm.org/D49281#1187583, @dmgreen wrote:
>
> > I'm not much of an expert on the vectoriser changes here.
>
>
> There is a call for a vectorizer person, and here I am,

Thank you!!!

> but before going there, ever since this patch was uploaded, I've been thinking whether the original intent of this patch can be really accomplished ---- especially so with the set of "hints" being used here. So, I'd like to go back and start there if that's okay. I think this patch is in some sense based on the optimism of "programmer won't abuse this and transformation will happen". Reality is more like programmer will try using those pragmas to arm-twist the compiler to get the set of transformations he/she wants w/o thinking deep enough about what happens at each step of the way. As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens. From what I've read --- granted that I haven't gone through this very deep, the fall back aspect isn't handled well. If we don't start from "programmer specified transformation may fail to kick-in", providing this feature to the programmers would quickly backfire and we'll get tons of this doesn't work that doesn't work problem reports ---- which is a big mess/disaster. If we are doing this for research purposes, that may be fine. I'm looking at this from a production compiler development perspective.
> 
> There are two different approaches we can think of.
> 
> 1. Start from defining the transformation directives that actually transforms in most cases. For example, Intel compiler's implementation of OpenMP SIMD is in this position, and we are trying to bring the same position to LLVM LV. Then, failing to transform is a compiler bug (or a programmer has a bug in the source code to fix). OpenMP SIMD is defined in such a way for compilers to be able to take this positioning. In this approach, we can stop processing transformation directives after the first failed transformation. Even in this case, for example, programmer probably won't know under certain circumstances, vectorizer won't produce remainder loop, or under certain cases, vectorizer completely unrolls the loop so that there aren't any loops left after vectorization. So, controlling a single transformation in a programmer predictable manner --- enough to describe what should happen next --- is a big task.
> 2. Based on transformation hint directives. Since what the programmer is using is a hint, for each transformation, programmer needs to tell the next step 1) if the transformation kicks in and also 2) if the transformation does not kick-in. This will be very messy and I'm not sure how practical it would be for the programmer to specify the behaviors for all situations.
> 
>   If these kinds of stuff have been already discussed, please give me the pointers and I'll try to digest. Else, we can talk about this in this review or in llvm-dev. Hope my argument makes sense to you/others, but please feel free to ask for clarification. I've been working on making SIMD directive programmer predictable for many years. So, I may be skipping some explanations that became natural enough to me over the years.

All these issues apply to the current loop metadata/#pragma clang loop as well. A user can specify `#pragma clang loop distribute(enable) vectorize_width(4)` with the expectation that these will be carried out. If not, this can be a bug (or semantically incorrect) just as if the distribution/vecorization was explicitly orderder using followup-attributes.
In contrast, currently there is no way to even express that the transformations should be carried out in there reverse order (first vectorization, then distribution; let's ignore for the moment whether this makes sense, it definitely makes sense with loop transformations other than those for which LLVM currently has metadata). Our longer-term plans are to support this in LLVM and the first step is to make sequences of transformations expressible in IR. Of course our goal is also to make transformations more applicable/robust. I see these as two orthogonal problems.

I don't think we need a fallback loop transformation. If a transformation cannot be applied, the user's reactions is probably not "let's do a different transformation then" (which will in most cases also fail for the same reasons the primary transformation failed), but "unfortunately the compiler cannot do my transformation I need to get the best performance, will implement it by hand then." Even if the first option is what the programmer wantsm they will implement it using preprocessor switches as it is common today with different compilers that support/do not support specific pragmas.

I had already some discussions on the reliability of transformations in the compiler. Some groups do not want to rely at all on the compiler being able to do a specific optimization and user libraries instead. Those libraries will 'miscompile' the input if certain preconditions are not met which is the desired outcome where slow execution just is no option. With incorrect results it at least becomes obvious that there is a problem. However, this is not an option for a compiler where correctness is the most important aspect (and only relaxed by attributes such as `assume_safety`). For some use cases, such as autotuning, being able to rely on the compiler producing correct output makes it possible in first place.
We could argue about whether we want high-level transformations in a low-level compiler in the first place. I think this has been answered a long time ago: LoopVectorizer, LoopUnroll, LoopInterchange, LoopDistribution, LoopUnswitch, etc. Since we have this kind of transformation, why not mkaing them as good as possible?

> In short, doing this even for one transformation (in my case SIMD) is difficult enough.

I'd be happy to extract-out attributes for specific transformations in separate reviews. However, to be come useful, I think we need the entire set.

> If we are trying to expand to multiple transformations, we should try doing so in baby steps.

I understood that you are working on making the loop vectorizer predictable, i.e. we are working towards the same goal.

> The idea behind this is so powerful such that even if we start from "best effort basis", programmers will quickly jump on and say make this more robust/predictable. We'd rather spend time to design this as a robust/predictable feature from the beginning than having to work on it under the customer pressure.

I think the current one-pass-per-transformation is indeed very fragile and I am working on something that should be more robust.

================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:753
+  if (!PragmaEnableUnroll && hasDisableAllTransformsHint(L))
+    return false;
   bool ExplicitUnroll = PragmaCount > 0 || PragmaFullUnroll ||
----------------
dmgreen wrote:
> What if it has PragmaCount and hasDisableAllTransformsHint? Should that not enable too?
If `llvm.loop.unroll.enable` is not set, interpretation here is that the transformation is 'non-forced', that is, `llvm.loop.unroll.count` is a hint to the compiler that if it unrolls, then it should unroll by that amount. `llvm.loop.disable_nonforced` overrride the decision whether to unroll, i.e. the unroll factor does not matter.

I am aware that the concept of 'forced' transformations is not consistent between passes, but I am trying to give it some consistency. Passes could query shared code such as `hasUnrollTransformation` in `LoopUtils`. `hasUnrollTransformation` currently follows your interpretation of `llvm.loop.unroll.count` / `llvm.loop.disable_nonforced`. I am happy to implement either definition, as long as we find a consistent rule.

================
Comment at: test/Transforms/LoopUnroll/unroll-pragmas_transform.ll:1
+; RUN: opt < %s -loop-unroll -pragma-unroll-threshold=1024 -S | FileCheck -check-prefixes=CHECK,REM %s
+; RUN: opt < %s -loop-unroll -loop-unroll -pragma-unroll-threshold=1024 -S | FileCheck -check-prefixes=CHECK,REM %s
----------------
dmgreen wrote:
> This file look sensible on it's own and I think looks OK to be committed separately. (Apart from the nit below)
This is a copy of `unroll-pragmas.ll` and any ambiguous metadata replaced by follow-up attributes.

An hope is to generally make 'multiple transformation attributes on the same loop' illegal and rejected by the IR verifier (since the result depends on an implementation detail: the order in the pass manager). In this case this file would replace `unroll-pragmas.ll`.

But my expectation is that we cannot break backwards-compatibility this way.

================
Comment at: test/Transforms/LoopUnroll/unroll-pragmas_transform.ll:5
+;
+; Run loop unrolling twice to verify that loop unrolling metadata is properly
+; removed and further unrolling is disabled after the pass is run once.
----------------
dmgreen wrote:
> Nit: Is this sentence still true?
Yes: When no follow-up attributes are specified, the default ones are added (here: `llvm.loop.unroll.disable` to disable further  unrolling). In case there are follow-up attribute lists, there is no default and the transformation-disabling must be added explicitly (MDNode `!18`) and of course added after unrolling and recognized by the second LoopUnroll.

Repository:
  rL LLVM

https://reviews.llvm.org/D49281