[llvm-dev] Loop vectorization and unsafe floating point math

Thu Jun 25 05:28:56 PDT 2020

> -----Original Message-----
> From: Hal Finkel <hfinkel at anl.gov>
> Sent: 25 June 2020 00:27
> To: Björn Pettersson A <bjorn.a.pettersson at ericsson.com>;
> llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Loop vectorization and unsafe floating point math
> 
> 
> On 6/24/20 10:21 AM, Björn Pettersson A via llvm-dev wrote:
> > Hi llvm-dev!
> >
> > We are doing some fuzz testing using C program generators,
> > and one question that came up when generating a program with
> > both floating-point arithmetic and loop pragmas was:
> > Is the loop vectorizer really allowed to vectorize a loop when
> > it can't prove that it is safe to reorder FP math, even if
> > there is a loop pragma that hints at a preferred width?
> >
> >
> > When reading here
> >
> >    http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
> >
> > it says "Loop hints can be specified before any loop and
> > will be ignored if the optimization is not safe to apply.".
> 
> 
> This is a good question. The statement above was written with memory
> dependence checks in mind. In this case, the lack of safety comes from
> the floating-point reassociation. Part of the problem here is the
> translation of the behavior of the compiler to the language in the
> documentation. When we say that the pragma "will be ignored", we don't
> literally mean that the compiler necessarily ignores it *statically*, we
> mean that the effect of the vectorization might be ignored *dynamically*
> in cases where vectorization might be unsafe. We do this, as you likely
> know, by multiversioning the loop, and using a memory-dependence check
> to select, during program execution, which to run.
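
As an illustration of the multiversioning described above, here is a minimal hand-written sketch in C. It is not what LLVM emits; the function names add_one and ranges_disjoint are hypothetical, and the "wide" path only imitates a width-4 vector body by loading four elements before storing four.

```c
#include <stddef.h>
#include <stdint.h>

/* Runtime memory check: returns 1 when [p, p+n) and [q, q+n) do not overlap.
   Hypothetical helper, written for this illustration only. */
static int ranges_disjoint(const float *p, const float *q, size_t n) {
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return a + bytes <= b || b + bytes <= a;
}

/* Computes dst[i] = src[i] + 1.0f for i in [0, n).
   Returns 1 if the wide version ran, 0 if the scalar fallback ran. */
int add_one(float *dst, const float *src, size_t n) {
    if (ranges_disjoint(dst, src, n)) {
        size_t i = 0;
        /* "Vector" body: four loads, then four stores. Only equivalent to
           the scalar loop when dst and src do not overlap. */
        for (; i + 4 <= n; i += 4) {
            float t0 = src[i];
            float t1 = src[i + 1];
            float t2 = src[i + 2];
            float t3 = src[i + 3];
            dst[i] = t0 + 1.0f;
            dst[i + 1] = t1 + 1.0f;
            dst[i + 2] = t2 + 1.0f;
            dst[i + 3] = t3 + 1.0f;
        }
        /* Scalar remainder for the last n % 4 elements. */
        for (; i < n; ++i)
            dst[i] = src[i] + 1.0f;
        return 1;
    }
    /* Scalar fallback: keeps the original element-by-element order. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] + 1.0f;
    return 0;
}
```

The point of the sketch is that the hint does not remove the check: the wide loop exists in the binary, but whether it runs is decided per call.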

Sure, but it won't use a vectorization factor of 543 if that can't be
applied either (it will see vectorize_width(543) as a hint and use
a different one if it can't be applied). So in some sense the pragma
is a hint (and the documentation describes them as "loop hints").

> 
> Regarding the effect of reassociation, I don't know of any efficient way
> that we might check ahead of time whether the reassociation would
> produce a different runtime result from the scalar loop. We're relying
> on the user's directive to tell the compiler that the reassociation is
> safe. An alternative design would require in the pragma some explicit
> acknowledgement of the reduction (e.g., what happens, at least in the
> specification, for OpenMP SIMD). We would want a different notation from
> the existing vectorize(assume_safety) used to disable the dependence
> checks. I'm highly sympathetic to your use case, in part because I do
> the same thing, and in part because I also work on autotuning systems
> that need the same property. However, in this case, our systems need to
> keep track of the presence of reductions. I think it's reasonable to say
> that the pragma is working as designed and we should update the
> documentation. If there's consensus here to require some kind of
> reduction acknowledgement, I'm fine with that too (although we need to
> realize that's going to cause significant regressions for existing
> users).
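
For reference, this is roughly what an explicit reduction acknowledgement looks like with OpenMP SIMD. It is only a sketch of the existing OpenMP notation, not a proposal for the clang loop pragma; the function name scaled_product is hypothetical. With Clang or GCC the pragma takes effect under -fopenmp-simd (or -fopenmp); without those flags it is ignored and the loop stays scalar.

```c
/* Multiplies v by factor n times. The reduction clause is the user's
   explicit statement that the partial products of acc may be computed
   independently and combined afterwards. */
double scaled_product(double v, double factor, int n) {
    double acc = v;
    #pragma omp simd reduction(*:acc)
    for (int i = 0; i < n; ++i)
        acc *= factor;
    return acc;
}
```

Here the permission to reassociate is tied to a named variable and operator, rather than being implied by a width hint.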

Maybe it is unlikely that someone wants to vectorize a loop with
floating-point math without using -ffast-math. But in this case the loop
vectorizer does not auto-vectorize the code unless -ffast-math is used.
So the legality checks are there (maybe they are pessimistic,
but nevertheless legality is checked).

The problem I see is that the loop hint pragmas have the side effect
of turning on -ffast-math for the loop. Either we need to
document that, or one would expect that the whole program would
be compiled with -ffast-math.
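
To make the reordering concrete, here is a minimal sketch that emulates by hand what a width-2 vectorized product reduction does to the order of operations. It assumes the vectorized loop keeps two partial products and combines them once at the end; the function names are hypothetical and this is not the code LLVM generates.

```c
/* Scalar order: one accumulator, n sequential multiplications. */
double product_in_order(double v, double factor, int n) {
    for (int i = 0; i < n; ++i)
        v = v * factor;
    return v;
}

/* Hand-written stand-in for a width-2 reduction (n is assumed to be even):
   two independent partial products, combined once after the loop. */
double product_two_lanes(double v, double factor, int n) {
    double lane0 = v;   /* lane 0 starts from the incoming value */
    double lane1 = 1.0; /* lane 1 starts from the multiplicative identity */
    for (int i = 0; i < n; i += 2) {
        lane0 = lane0 * factor;
        lane1 = lane1 * factor;
    }
    return lane0 * lane1;
}

/* Relative difference between two nonzero doubles, without <math.h>. */
double relative_difference(double a, double b) {
    double d = (a - b) / a;
    return d < 0 ? -d : d;
}
```

Both functions compute the same mathematical product, but the intermediate roundings differ, so with inputs such as -902.30847021, 430.33975544 and n = 16 the results may disagree in the last bits. Whether they do, and by how much, depends on the target and compiler.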

I did not explicitly mention -O0 in my earlier examples, but doesn't
it feel weird that when compiling a program with vectorization hints
and -fno-fast-math, I might get different results when executing the
program depending on whether I compiled with -O0 or -O3?
That is actually what our test framework was doing (comparing results
when using "-O0 -fno-fast-math" and "-O3 -fno-fast-math"), and it
ended up with failures due to loop pragmas being present in the code.

I also noticed that there are some TTI hooks that seem to be somewhat
related to this. But since both LoopVectorizeHints::allowReordering()
and LoopVectorizeHints::isPotentiallyUnsafe() are overridden by the
FK_Enabled hint, it doesn't matter what the TTI hooks say.

> 
>   -Hal
> 
> 
> >
> >
> > But given this example (see also https://godbolt.org/z/fzRHsp )
> >
> > //------------------------------------------------------------------
> > //
> > //  clang -O3 -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize
> >
> > #include <stdio.h>
> > #include <stdint.h>
> >
> > double v_1 = -902.30847021;
> > double v_2 = -902.30847021;
> >
> > int main()
> > {
> >
> >    #pragma clang loop vectorize_width(2) unroll(disable)
> >    for (int i = 0; i < 16; ++i) {
> >      v_1 = v_1 * 430.33975544;
> >    }
> >
> >    #pragma clang loop unroll(disable)
> >    for (int i = 0; i < 16; ++i) {
> >      v_2 = v_2 * 430.33975544;
> >    }
> >
> >    printf("v_1: %f\n", v_1);
> >    printf("v_2: %f\n", v_2);
> > }
> >
> > //
> > //------------------------------------------------------------------
> >
> >
> > we get these remarks:
> >
> >    <source>:11:3: remark: the cost-model indicates that interleaving is not beneficial [-Rpass-analysis=loop-vectorize]
> >    <source>:11:3: remark: vectorized loop (vectorization width: 2, interleaved count: 1) [-Rpass=loop-vectorize]
> >    <source>:17:15: remark: loop not vectorized: cannot prove it is safe to reorder floating-point operations; allow reordering by specifying '#pragma clang loop vectorize(enable)'
> >
> > and the result:
> >
> >    v_1: -1248356232174473978185211891975727638059679744.000000
> >    v_2: -1248356232174473819728886863447052450971779072.000000
> >
> >
> > So the second loop isn't vectorized due to unsafe reordering of FP math.
> > But the first loop is vectorized, even though the optimization isn't
> > safe to apply. This is also reflected in the different results we get
> > for v_1 and v_2.
> >
> >
> > Is this correct behavior? Should the pragma result in vectorization here?
> >
> > Note that we get vectorization even with "vectorize_width(3)". So despite
> > the fact that LV ignores the bad vectorization factor, it considers
> > vectorization to be "forced".
> >
> > (I also wonder if "forced" is bad terminology here, if the pragma
> > should be considered a hint.)
> >
> > Regards,
> > Björn Pettersson
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> 
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory