<div dir="ltr">Is there anything that means, in particular, "go fast, even if it means not all bits are significant"?<div><br></div><div>I'm currently working on an LLVM-based compiler for a GPU that is optimised for OpenGL, where 16-bit FP may not be quite accurate enough (or may be in some cases), but 32-bit FP is overkill. A lot of the fast built-in operations end up with a few junk bits at the end (not add/sub/mul, but divide is available *only* via reciprocal).</div><div><br></div><div>When implementing OpenCL, the specs and conformance tests require full IEEE accuracy. In some cases this requires a round of Newton-Raphson to clean up the accuracy, which is a significant though maybe not crippling performance penalty. But in other cases we need to do a lot of range reduction, evaluate a polynomial, and then generalise the result again. This can be an order of magnitude or more slower than using the not-quite-accurate-enough built-in instruction.</div><div><br></div><div>The OpenCL spec defines a number of compile flags controlling optimizations. Some seem to map well onto the flags already discussed here:</div><div><br></div><div>-cl-mad-enable<br></div><div>-cl-no-signed-zeros<br></div><div>-cl-finite-math-only<br></div><div><br></div><div>However, it looks to me that the following ones don't presently map well to LLVM:</div><div><br></div><div><div>-cl-unsafe-math-optimizations</div><div>Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid, (b) may violate IEEE 754 standard and (c) may violate the OpenCL numerical compliance requirements as defined in the SPIR-V OpenCL environment specification for single precision and double precision floating-point, and edge case behavior in the SPIR-V OpenCL environment specification. 
This option includes the -cl-no-signed-zeros and -cl-mad-enable options.</div></div><div><br></div><div><div>-cl-fast-relaxed-math</div><div>Sets the optimization options -cl-finite-math-only and -cl-unsafe-math-optimizations. This allows optimizations for floating-point arithmetic that may violate the IEEE 754 standard and the OpenCL numerical compliance requirements for single precision and double precision floating-point, as well as floating point edge case behavior. This option also relaxes the precision of commonly used math functions. This option causes the preprocessor macro __FAST_RELAXED_MATH__ to be defined in the OpenCL program. The original and modified values are defined in the SPIR-V OpenCL environment specification.</div></div><div><br></div><div>I'd like to emphasise this sentence in the latter one: "This option also relaxes the precision of commonly used math functions."</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 2, 2017 at 4:45 PM, Ristow, Warren via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-US" link="blue" vlink="purple">
<div class="m_-6162699180708653109WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#44546a">I'm not aware of any additional bits needed. But putting us right at the edge leaves me uncomfortable. So an implementation that isn't limited by the 7 bits
in SubclassOptionalData seems sensible.</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#44546a"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#44546a">Thanks,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#44546a">-Warren<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#44546a"><u></u> <u></u></span></p>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Sanjay Patel [mailto:<a href="mailto:spatel@rotateright.com" target="_blank">spatel@rotateright.com</a><wbr>]
<br>
<b>Sent:</b> Monday, October 2, 2017 12:06 AM<br>
<b>To:</b> Ristow, Warren<br>
<b>Cc:</b> Hal Finkel; <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<b>Subject:</b> Re: [llvm-dev] Trouble when suppressing a portion of fast-math-transformations</span></p>
</div><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">Are we confident that we just need those 7 bits to represent all of the relaxed FP states that we need/want to support?</p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">I'm asking because FMF in IR is currently mapped onto the SubclassOptionalData of Value...and we have exactly 7 bits there. :)</p>
</div>
<p class="MsoNormal">If we're redoing the definitions, I'm wondering if we can share the struct with the backend's SDNodeFlags, but that already has one extra bit for vector reduction. Should we give up on SubclassOptionalData for FMF? We have an MD_fpmath enum value for metadata, so we could move things over there?</p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Fri, Sep 29, 2017 at 8:16 PM, Ristow, Warren via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:</p>
<p class="MsoNormal">Hi Hal,<br>
<br>
>> 4. To fix this, I think that additional fast-math-flags are likely<br>
>> needed in the IR. Instead of the following set:<br>
>><br>
>> 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'<br>
>><br>
>> something like this:<br>
>><br>
>> 'reassoc' + 'libm' + 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract'<br>
>><br>
>> would be more useful. Related to this, the current 'fast' flag which acts<br>
>> as an umbrella (enabling 'nnan' + 'ninf' + 'nsz' + 'arcp' + 'contract') may<br>
>> not be needed. A discussion on this point was raised last November on the<br>
>> mailing list:<br>
>><br>
>> <a href="http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html" target="_blank">
http://lists.llvm.org/<wbr>pipermail/llvm-dev/2016-<wbr>November/107104.html</a><br>
><br>
> I agree. I'm happy to help review the patches. It will be best to have<br>
> only the finer-grained flags where there's no "fast" flag that implies<br>
> all of the others.<br>
<br>
Thanks for the quick response, and for the willingness to review. I won't let<br>
this languish so long, like the post from last November.<br>
<br>
Happy to hear that you feel it's best not to have the umbrella "fast" flag.<br>
<br>
Thanks again,</p>
<div>
<div>
<p class="MsoNormal">-Warren<br>
______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a></p>
</div>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div></div></div>
</div>
<br></blockquote></div><br></div>
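<div>For readers unfamiliar with the Newton-Raphson cleanup mentioned at the top of this message, here is a minimal sketch in C++ of one refinement step applied to an imprecise reciprocal. The names approx_recip and refine_recip are hypothetical, and approx_recip only simulates a fast instruction with junk low bits by clearing the low 8 mantissa bits; it is an assumption for illustration, not how any particular GPU behaves. One step roughly squares the relative error, but it is not guaranteed to give a correctly rounded result for every input.</div>

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a fast hardware reciprocal whose low bits are
// not significant: compute 1/x, then clear the low 8 mantissa bits.
static float approx_recip(float x) {
    float r = 1.0f / x;
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    bits &= ~std::uint32_t{0xFF};
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// One Newton-Raphson step for f(r) = 1/r - x.
// With e = 1 - x*r (the residual), the update is r' = r + r*e.
// Using fused multiply-add keeps the residual from being lost to rounding.
static float refine_recip(float x, float r) {
    float e = std::fma(-x, r, 1.0f);  // e = 1 - x*r
    return std::fma(r, e, r);         // r' = r + r*e
}
```

<div>A caller would write refine_recip(x, approx_recip(x)) wherever the stricter accuracy is required, and use approx_recip(x) alone when the relaxed flags permit it. The extra two fused multiply-adds per divide are the kind of cost the message describes as significant but not crippling.</div>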