[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags

Mon Mar 18 13:45:23 PDT 2019

Hi Michael,

On Mon, Mar 18, 2019 at 12:23 PM Michael Berg <michael_c_berg at apple.com> wrote:
>
> Another thing to consider: The current bitcode reader/writer handles backwards compatibility with the previous IR version with some mapping done to preserve context.  If we change the bitcode layout we effectively have a new version of IR, bringing up the notion once more of compatibility with a prior version.
> It is just another item to add to the work list...

That's good to keep in mind, though I don't quite understand why this
would be non-trivial.  It seems like we already have a split between
bitc::FastMathMap and FastMathFlags with an explicit encode/decode
step.  Why would a different storage scheme for FastMathFlags
influence reading/writing bitcode?
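
A minimal sketch of that decoupling, for illustration only.  Every name
below (disk::*, InMemoryFlags, encodeFlags, decodeFlags) is a
hypothetical stand-in, not the real bitc::FastMathMap or FastMathFlags
definitions, and the bit values are made up.  The point is only that the
serialized bit positions are fixed in one place, while the in-memory
representation can be packed however we like:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical on-disk bit positions.  Once written to a file format these
// values are frozen.  They are NOT the real bitc::FastMathMap values.
namespace disk {
constexpr uint64_t NoNaNs = 1u << 0;
constexpr uint64_t NoInfs = 1u << 1;
constexpr uint64_t FlushToZero = 1u << 2;
} // namespace disk

// Hypothetical in-memory representation.  How these are stored (bitfield,
// stolen pointer bits, side table) is invisible to the serializer.
struct InMemoryFlags {
  bool NoNaNs = false;
  bool NoInfs = false;
  bool FlushToZero = false;
};

// Writer side: map each in-memory flag to its frozen on-disk bit.
uint64_t encodeFlags(const InMemoryFlags &F) {
  uint64_t Bits = 0;
  if (F.NoNaNs)
    Bits |= disk::NoNaNs;
  if (F.NoInfs)
    Bits |= disk::NoInfs;
  if (F.FlushToZero)
    Bits |= disk::FlushToZero;
  return Bits;
}

// Reader side: map on-disk bits back; unknown bits are simply ignored.
InMemoryFlags decodeFlags(uint64_t Bits) {
  InMemoryFlags F;
  F.NoNaNs = (Bits & disk::NoNaNs) != 0;
  F.NoInfs = (Bits & disk::NoInfs) != 0;
  F.FlushToZero = (Bits & disk::FlushToZero) != 0;
  return F;
}

// Round trip: the on-disk word depends only on the mapping above.
void demoRoundTrip() {
  InMemoryFlags F;
  F.NoInfs = true;
  F.FlushToZero = true;
  uint64_t Bits = encodeFlags(F);
  assert(Bits == (disk::NoInfs | disk::FlushToZero));
  InMemoryFlags G = decodeFlags(Bits);
  assert(!G.NoNaNs && G.NoInfs && G.FlushToZero);
}
```

Under this assumption, changing how InMemoryFlags is stored would touch
neither encodeFlags' output nor decodeFlags' input.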

-- Sanjoy

>
> Regards,
> Michael
>
> > On Mar 18, 2019, at 11:02 AM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > On Sun, Mar 17, 2019 at 1:47 PM Craig Topper <craig.topper at gmail.com> wrote:
> >> Can we move HasValueHandle out of the byte used for SubclassOptionalData and move it to the flags at the bottom of Value by shrinking NumUserOperands to 27 bits?
> >
> > I like this approach because it is less work for me. :)
> >
> > But I agree with Sanjay below that this only kicks the can slightly
> > further down the road (solutions (2) and (3) also have the same
> > problem).  Let's see if we can agree on a more future proof solution.
> >
> > -- Sanjoy
> >
> >>
> >> ~Craig
> >>
> >>
> >> On Sat, Mar 16, 2019 at 12:51 PM Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags,
> >>> but we've already used up the 7 bits available in
> >>> Value::SubclassOptionalData (the "backing storage" for
> >>> FPMathOperator::getFastMathFlags()).  These are the possibilities I
> >>> can think of:
> >>>
> >>> 1. Increase the size of FPMathOperator.  This gives us some additional
> >>> bits for FTZ and other fastmath flags we'd want to add in the future.
> >>> Obvious downside is that it increases LLVM's memory footprint.
> >>>
> >>> 2. Steal some low bits from pointers already present in Value and
> >>> expose them as part of SubclassOptionalData.  We can at least steal 3
> >>> bits from the first two words in Value which are both pointers.  The
> >>> LSB of the first pointer needs to be 0, otherwise we could steal 4
> >>> bits.
> >>>
> >>> 3. Allow only specific combinations in FastMathFlags.  In practice, I
> >>> don't think folks are equally interested in all the 2^N combinations
> >>> present in FastMathFlags, so we could compromise and allow only the
> >>> most "typical" 2^7 combinations (e.g. we could fold nonan and noinf into a
> >>> single bit, under the assumption that users want to enable-disable
> >>> them as a unit).  I'm unsure if establishing the most typical 2^7
> >>> combinations will be straightforward though.
> >>>
> >>> 4. Function level attributes.  Instead of wasting precious
> >>> instruction-level space, we could move all FP math attributes onto the
> >>> containing function.  I'm not sure if this will work for all frontends
> >>> and it also raises annoying tradeoffs around inlining and other
> >>> inter-procedural passes.
> >>>
> >>>
> >>> My gut feeling is to go with (2).  It should be semantically
> >>> invisible, have no impact on memory usage, and the ugly bit
> >>> manipulation can be abstracted away.  What do you think?  Any other
> >>> possibilities I missed?
> >>>
> >>>
> >>> Why I need an FTZ flag:  some ARM Neon vector instructions have FTZ
> >>> semantics, which means we can't vectorize instructions when compiling
> >>> for Neon unless we know the user is okay with FTZ.  Today we pretend
> >>> that the "fast" variant of FastMathFlags implies FTZ
> >>> (https://reviews.llvm.org/rL266363), which is not ideal.  Moreover
> >>> (this is the immediate reason), for XLA CPU I'm trying to generate FP
> >>> instructions without nonan and noinf, which breaks vectorization on
> >>> ARM Neon for this reason.  An explicit bit for FTZ will let me
> >>> generate FP operations tagged with FTZ and all fast math flags except
> >>> nonan and noinf, and still have them vectorize on Neon.
> >>>
> >>> -- Sanjoy
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
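
For reference, the pointer bit-stealing idea in option (2) of the
original RFC can be sketched roughly as below.  This is an illustrative
sketch under stated assumptions, not a proposed patch: TaggedPtr, Node
and demoTaggedPtr are hypothetical names, and the sketch assumes the
pointee is at least 8-byte aligned so that the low 3 bits of any valid
pointer are zero.  LLVM's existing PointerIntPair (llvm/ADT) provides
this kind of abstraction; the code here only shows the underlying trick.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs a small integer into the low bits of a pointer
// whose alignment guarantees that those bits are always zero.
template <typename T, unsigned NumBits> class TaggedPtr {
  static_assert((1u << NumBits) <= alignof(T),
                "not enough alignment bits to steal");
  static constexpr uintptr_t Mask = (uintptr_t(1) << NumBits) - 1;
  uintptr_t Raw = 0;

public:
  TaggedPtr() = default;
  explicit TaggedPtr(T *P) { setPointer(P); }

  // Strip the stolen bits to recover the real pointer.
  T *getPointer() const { return reinterpret_cast<T *>(Raw & ~Mask); }

  // Read back the small integer stored in the low bits.
  unsigned getBits() const { return static_cast<unsigned>(Raw & Mask); }

  // Replace the pointer while preserving the stolen bits.
  void setPointer(T *P) {
    uintptr_t V = reinterpret_cast<uintptr_t>(P);
    assert((V & Mask) == 0 && "pointer is under-aligned");
    Raw = V | (Raw & Mask);
  }

  // Replace the stolen bits while preserving the pointer.
  void setBits(unsigned B) {
    assert((B & ~Mask) == 0 && "value does not fit in the stolen bits");
    Raw = (Raw & ~Mask) | B;
  }
};

// Hypothetical pointee with 8-byte alignment, i.e. 3 free low bits.
struct alignas(8) Node {
  int Payload;
};

void demoTaggedPtr() {
  Node N{42};
  TaggedPtr<Node, 3> P(&N);
  P.setBits(5); // e.g. three extra flag bits
  assert(P.getPointer() == &N);
  assert(P.getBits() == 5);
  assert(P.getPointer()->Payload == 42);
}
```

The object stays one pointer wide, which is why this approach would not
change memory usage; the cost is a mask on every pointer access.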