<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Sanjoy,<div class=""><br class=""></div><div class="">  The storage scheme by itself doesn’t. However, there is a similar size constraint in FastMathMap on the number of bits used/mapped for each IR, meaning the bitcode utilities currently read/write 8 FMF flags: the 7 FMF flags plus Unsafe. Adding new FMF flags at this point means a bigger IR in the bitcode and a bump in the IR version, which vendors will have to act on, as quite a few are still on current-1 IR.  We will need to communicate the divergence clearly when it happens.</div><div class=""><br class=""></div><div class="">  On another front, IMO we should leave <b class="">nan</b> and <b class="">inf</b> processing under their respective flags and module guards, keeping the contexts separate.  Joining them in a common context muddies too much functionality.</div><div class=""><br class=""></div><div class="">Regards,</div><div class="">Michael</div><div class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Mar 18, 2019, at 1:45 PM, Sanjoy Das <<a href="mailto:sanjoy@playingwithpointers.com" class="">sanjoy@playingwithpointers.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">Hi Michael,<br class=""><br class="">On Mon, Mar 18, 2019 at 12:23 PM Michael Berg <<a href="mailto:michael_c_berg@apple.com" class="">michael_c_berg@apple.com</a>> wrote:<br class=""><blockquote type="cite" class=""><br class="">Another thing to consider: The current bitcode reader/writer handles backwards compatibility with the previous IR version with some mapping done to preserve context.  
If we change the bitcode layout we effectively have a new version of IR, bringing up the notion once more of compatibility with a prior version.<br class="">It is just another item to add to the work list...<br class=""></blockquote><br class="">That's good to keep in mind, though I don't quite understand why this<br class="">would be non-trivial.  It seems like we already have a split between<br class="">bitc::FastMathMap and FastMathFlags with an explicit encode/decode<br class="">step.  Why would a different storage scheme for FastMathFlags<br class="">influence reading/writing bitcode?<br class=""><br class="">-- Sanjoy<br class=""><br class=""><blockquote type="cite" class=""><br class="">Regards,<br class="">Michael<br class=""><br class=""><blockquote type="cite" class="">On Mar 18, 2019, at 11:02 AM, Sanjoy Das via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class=""><br class="">On Sun, Mar 17, 2019 at 1:47 PM Craig Topper <<a href="mailto:craig.topper@gmail.com" class="">craig.topper@gmail.com</a>> wrote:<br class=""><blockquote type="cite" class="">Can we move HasValueHandle out of the byte used for SubClassOptionalData and move it to the flags at the bottom of value by shrinking NumUserOperands to 27?<br class=""></blockquote><br class="">I like this approach because it is less work for me. :)<br class=""><br class="">But I agree with Sanjay below that this only kicks the can slightly<br class="">further down the road (solutions (2) and (3) also have the same<br class="">problem).  
Let's see if we can agree on a more future proof solution.<br class=""><br class="">-- Sanjoy<br class=""><br class=""><blockquote type="cite" class=""><br class="">~Craig<br class=""><br class=""><br class="">On Sat, Mar 16, 2019 at 12:51 PM Sanjoy Das via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class=""><blockquote type="cite" class=""><br class="">Hi,<br class=""><br class="">I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags,<br class="">but  we've already used up the 7 bits available in<br class="">Value::SubclassOptionalData (the "backing storage" for<br class="">FPMathOperator::getFastMathFlags()).  These are the possibilities I<br class="">can think of:<br class=""><br class="">1. Increase the size of FPMathOperator.  This gives us some additional<br class="">bits for FTZ and other fastmath flags we'd want to add in the future.<br class="">Obvious downside is that it increases LLVM's memory footprint.<br class=""><br class="">2. Steal some low bits from pointers already present in Value and<br class="">expose them as part of SubclassOptionalData.  We can at least steal 3<br class="">bits from the first two words in Value which are both pointers.  The<br class="">LSB of the first pointer needs to be 0, otherwise we could steal 4<br class="">bits.<br class=""><br class="">3. Allow only specific combinations in FastMathFlags.  In practice, I<br class="">don't think folks are equally interested in all the 2^N combinations<br class="">present in FastMathFlags, so we could compromise and allow only the<br class="">most "typical" 2^7 combinations (e.g. we could nonan and noinf into a<br class="">single bit, under the assumption that users want to enable-disable<br class="">them as a unit).  I'm unsure if establishing the most typical 2^7<br class="">combinations will be straightforward though.<br class=""><br class="">4. Function level attributes.  
Instead of wasting precious<br class="">instruction-level space, we could move all FP math attributes on the<br class="">containing function.  I'm not sure if this will work for all frontends<br class="">and it also raises annoying tradeoffs around inlining and other<br class="">inter-procedural passes.<br class=""><br class=""><br class="">My gut feeling is to go with (2).  It should be semantically<br class="">invisible, have no impact on memory usage, and the ugly bit<br class="">manipulation can be abstracted away.  What do you think?  Any other<br class="">possibilities I missed?<br class=""><br class=""><br class="">Why I need an FTZ flag:  some ARM Neon vector instructions have FTZ<br class="">semantics, which means we can't vectorize instructions when compiling<br class="">for Neon unless we know the user is okay with FTZ.  Today we pretend<br class="">that the "fast" variant of FastMathFlags implies FTZ<br class="">(<a href="https://reviews.llvm.org/rL266363" class="">https://reviews.llvm.org/rL266363</a>), which is not ideal.  Moreover<br class="">(this is the immediate reason), for XLA CPU I'm trying to generate FP<br class="">instructions without nonan and noinf, which breaks vectorization on<br class="">ARM Neon for this reason.  
An explicit bit for FTZ will let me<br class="">generate FP operations tagged with FTZ and all fast math flags except<br class="">nonan and noinf, and still have them vectorize on Neon.<br class=""><br class="">-- Sanjoy<br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<br class=""></blockquote></blockquote>_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a><br class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev<br class=""></blockquote><br class=""></blockquote></div></div></blockquote></div><br class=""></div></body></html>