<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi,</p>
<p>The LLVM-VP extension (<a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D57504">https://reviews.llvm.org/D57504</a>)
generalizes PatternMatch.h to match FP intrinsics as well as
regular fp (vector) instructions with the same pattern. We use
this to lift the pattern rewrites in InstSimplify and InstCombine
to predicated vector instructions. The same logic could be applied
to "scalar" constrained FP intrinsics. Hal has requested that the
VP intrinsics model fp exception/rounding too.</p>
<p>So the suggestion is to keep using fp exception/rounding mode
arguments but teach LLVM to handle them in its optimizations
and analyses.</p>
<tt>Example</tt><br>
<tt>-----------</tt><tt><br>
</tt><br>
<tt>PatternMatch.h changes:
<a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D57504#change-cWgJ3XBlLNvs">https://reviews.llvm.org/D57504#change-cWgJ3XBlLNvs</a></tt><br>
<tt>AddSub code in InstCombine:
<a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D57504#change-24P4gqRF9sNj">https://reviews.llvm.org/D57504#change-24P4gqRF9sNj</a></tt><br>
<tt>Note that "visitPredicatedFSub" will match either the regular
FSub instruction or the llvm.vp.fsub intrinsic.</tt><tt><br>
</tt>
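<p>To illustrate the idea behind that note, here is a toy sketch in C of a
single matcher that accepts both forms. It is not the PatternMatch.h code
from D57504: the Inst struct, the Opcode enum, match_fsub and
is_self_subtraction are hypothetical names, and the operand layout assumed
for llvm.vp.fsub (lhs, rhs, then mask and vector length) is an assumption
of this sketch.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy IR node: NOT LLVM's data structures, just enough to show the idea. */
enum Opcode { OP_FSUB, OP_CALL };

struct Inst {
    enum Opcode op;
    const char *callee; /* intrinsic name when op == OP_CALL, else NULL */
    int operands[4];    /* value ids; assumed vp layout: lhs, rhs, mask, length */
};

/* Matches "lhs - rhs" written either as a plain fsub or as a call to
   llvm.vp.fsub. On success stores the two arithmetic operands; the mask
   and vector length of the predicated form are ignored here. */
static bool match_fsub(const struct Inst *i, int *lhs, int *rhs) {
    bool plain = i->op == OP_FSUB;
    bool predicated = i->op == OP_CALL && i->callee != NULL &&
                      strcmp(i->callee, "llvm.vp.fsub") == 0;
    if (!plain && !predicated)
        return false;
    *lhs = i->operands[0];
    *rhs = i->operands[1];
    return true;
}

/* A pattern check written once against the matcher: detects "x - x".
   This only detects the shape; it does not decide whether a rewrite
   is legal. */
static bool is_self_subtraction(const struct Inst *i) {
    int lhs, rhs;
    return match_fsub(i, &lhs, &rhs) && lhs == rhs;
}
```

<p>A check written against match_fsub then applies to both forms without
being duplicated. Whether a given rewrite is legal still depends on the fp
flags and, for the predicated form, on the mask, which this sketch
ignores.</p>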
<p><br>
</p>
<p>- Simon</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 8/20/19 7:00 PM, Serge Pavlov via
llvm-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACOhrX5geDxVPnNtX-kQZB5UgDKS1bS=fdpbfa80ek_bHpV9AA@mail.gmail.com">
<div dir="ltr">Hi all,<br>
<br>
During the review of <a href="https://reviews.llvm.org/D65997"
moz-do-not-send="true">https://reviews.llvm.org/D65997</a> an
issue was revealed that relates to the decision of how the
compiler should represent constrained floating point
operations.<br>
<br>
If a floating point operation requires a rounding mode or
exception behavior different from the default, it should be
represented by a constrained intrinsic (<a
href="http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics"
moz-do-not-send="true">http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics</a>).
An important point is that, according to the current design
decision, if any part of a function contains such an intrinsic,
all floating point operations in the function must be
represented by constrained intrinsics as well. This decision
is meant to prevent undesired movement of fp operations. The
discussion is in the thread <a
href="http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html"
moz-do-not-send="true">http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html</a>,
the relevant example is:<br>
<br>
<blockquote style="margin:0 0 0 40px;border:none;padding:0px">double
f(double a, double b, double c) {<br>
double d;<br>
{<br>
#pragma STDC FENV_ACCESS ON<br>
feenableexcept(FE_OVERFLOW);<br>
d = a * b;<br>
fedisableexcept(FE_OVERFLOW);<br>
}<br>
return c * d;<br>
}</blockquote>
<br>
The second fmul must not be hoisted up to before the
fedisableexcept. Using constrained intrinsics is expected to
help in this case as they are not handled by optimization
passes.<br>
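<br>
To see why the hoist matters, here is a toy model in C. It does not use
the real FP environment: overflow_trap_enabled, traps_taken, checked_mul
and the two driver functions are hypothetical stand-ins, and a "trap" is
only counted, not delivered. It is a sketch of the ordering problem, not
of how any pass behaves.<br>

```c
#include <float.h>
#include <stdbool.h>

/* Toy stand-ins for the FP environment; this is not the <fenv.h> API. */
static bool overflow_trap_enabled = false;
static int traps_taken = 0;

static double checked_mul(double x, double y) {
    double r = x * y;
    /* With finite inputs, a result beyond DBL_MAX means the multiply
       overflowed. */
    if (overflow_trap_enabled && (r > DBL_MAX || r < -DBL_MAX))
        traps_taken++;
    return r;
}

/* Program order from the example: the second multiply runs after the
   trap has been disabled. */
static int traps_in_source_order(double a, double b, double c) {
    traps_taken = 0;
    overflow_trap_enabled = true;
    double d = checked_mul(a, b);
    overflow_trap_enabled = false;
    (void)checked_mul(c, d);
    return traps_taken;
}

/* Same operations, but the second multiply is hoisted above the
   disable, as an optimizer unaware of the FP environment might do. */
static int traps_after_hoisting(double a, double b, double c) {
    traps_taken = 0;
    overflow_trap_enabled = true;
    double d = checked_mul(a, b);
    (void)checked_mul(c, d);
    overflow_trap_enabled = false;
    return traps_taken;
}
```

With inputs where a * b is finite but c * d overflows, the source order
counts no trap while the hoisted order counts one, so the two orders are
observably different.<br>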
<br>
The concern is that using constrained intrinsics in a small
region of a function forces their use everywhere in the
function, including in functions that inline it. Because
constrained intrinsics inhibit optimizations, this can result
in performance degradation.<br>
<br>
A couple of examples:<br>
1. A performance-critical function performs most of its
calculations in the default fp mode, but at some points it enables
fp exceptions and performs an operation that can trigger such an
exception. Using constrained intrinsics would result in
performance loss, although the code that actually needs them is
very compact.<br>
2. Cores that are used for machine learning usually work with
short data types (half, bfloat16 or even shorter). Rounding control in
this case is much more important than for big cores; using
proper rounding in different parts of an algorithm can improve
precision. Constrained intrinsics are the only way to enforce
a particular rounding mode. However, using them results in poor
optimization, which is intolerable. In such cores the rounding mode
may be encoded in instructions, so code movement cannot break
semantics.<br>
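<br>
As a small illustration of how much the rounding choice matters for short
types, the sketch below converts a float to bfloat16 precision in two
ways. It assumes float is IEEE-754 binary32 and treats bfloat16 as its
upper 16 bits; the function names are hypothetical, and NaN inputs and
overflow to infinity are not handled specially.<br>

```c
#include <stdint.h>
#include <string.h>

/* Bit-level views of a float (assumes IEEE-754 binary32). */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Round toward zero: drop the low 16 bits of the encoding. */
static float to_bf16_truncate(float f) {
    return bits_float(float_bits(f) & 0xFFFF0000u);
}

/* Round to nearest, ties to even: add a bias that depends on the lowest
   kept bit, then drop the low 16 bits. */
static float to_bf16_nearest_even(float f) {
    uint32_t u = float_bits(f);
    uint32_t bias = 0x7FFFu + ((u >> 16) & 1u);
    return bits_float((u + bias) & 0xFFFF0000u);
}
```

For the input 1.005859375f the truncating version returns 1.0f while the
nearest-even version returns 1.0078125f, a difference of one full
bfloat16 unit in the last place.<br>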
<br>
The representation of fp operations could be more flexible, so that
a user would not pay for rounding/exception control with
performance degradation. For that we need to be able to mix
constrained intrinsics and regular fp operations in a function.<br>
<br>
The question is: how can we prevent fp operations from moving
across the boundaries of a region where specific rounding and/or
exception behavior applies? Any ideas?
<div><br>
<div>
<div dir="ltr" class="gmail_signature"
data-smartmail="gmail_signature">Thanks,<br>
--Serge<br>
</div>
</div>
</div>
</div>
<br>
</blockquote>
<pre class="moz-signature" cols="72">--
Simon Moll
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : <a class="moz-txt-link-abbreviated" href="mailto:moll@cs.uni-saarland.de">moll@cs.uni-saarland.de</a>
Fax. +49 (0)681 302-3065 : <a class="moz-txt-link-freetext" href="http://compilers.cs.uni-saarland.de/people/moll">http://compilers.cs.uni-saarland.de/people/moll</a></pre>
</body>
</html>