[llvm-dev] [RFC] Making space for a flush-to-zero flag in FastMathFlags

Sat Mar 16 12:51:25 PDT 2019

Hi,

I need to add a flush-denormals-to-zero (FTZ) flag to FastMathFlags,
but we've already used up the 7 bits available in
Value::SubclassOptionalData (the "backing storage" for
FPMathOperator::getFastMathFlags()).  These are the possibilities I
can think of:

1. Increase the size of FPMathOperator.  This gives us some additional
bits for FTZ and other fastmath flags we'd want to add in the future.
The obvious downside is that it increases LLVM's memory footprint.

2. Steal some low bits from pointers already present in Value and
expose them as part of SubclassOptionalData.  We can at least steal 3
bits from the first two words in Value, which are both pointers.  The
LSB of the first pointer needs to be 0; otherwise we could steal 4
bits.

3. Allow only specific combinations in FastMathFlags.  In practice, I
don't think folks are equally interested in all the 2^N combinations
present in FastMathFlags, so we could compromise and allow only the
most "typical" 2^7 combinations (e.g. we could fold nonan and noinf
into a single bit, under the assumption that users want to enable or
disable them as a unit).  I'm unsure if establishing the most typical
2^7 combinations will be straightforward though.

4. Function level attributes.  Instead of wasting precious
instruction-level space, we could move all FP math attributes onto the
containing function.  I'm not sure if this will work for all frontends
and it also raises annoying tradeoffs around inlining and other
inter-procedural passes.
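
To make (3) a bit more concrete, here is a rough sketch of folding
nonan and noinf into one bit so that an FTZ flag fits in the same 7
bits.  The Flags struct, encode(), decode() and the bit layout are all
made up for illustration; none of this is existing LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only: the fields mirror the seven existing fast-math flags
// plus a hypothetical FlushToZero flag. The bit layout below is invented.
struct Flags {
  bool AllowReassoc = false;
  bool NoNaNs = false;
  bool NoInfs = false;
  bool NoSignedZeros = false;
  bool AllowReciprocal = false;
  bool AllowContract = false;
  bool ApproxFunc = false;
  bool FlushToZero = false; // hypothetical new flag
};

// Packs F into the low 7 bits of Out. NoNaNs and NoInfs share bit 1, so a
// combination where they differ is not representable; returns false then.
inline bool encode(const Flags &F, std::uint8_t &Out) {
  if (F.NoNaNs != F.NoInfs)
    return false;
  unsigned Bits = 0;
  Bits |= unsigned(F.AllowReassoc) << 0;
  Bits |= unsigned(F.NoNaNs) << 1; // stands for both NoNaNs and NoInfs
  Bits |= unsigned(F.NoSignedZeros) << 2;
  Bits |= unsigned(F.AllowReciprocal) << 3;
  Bits |= unsigned(F.AllowContract) << 4;
  Bits |= unsigned(F.ApproxFunc) << 5;
  Bits |= unsigned(F.FlushToZero) << 6;
  Out = static_cast<std::uint8_t>(Bits);
  return true;
}

// Inverse of encode(): expands the shared bit back into both flags.
inline Flags decode(std::uint8_t Bits) {
  Flags F;
  F.AllowReassoc = ((Bits >> 0) & 1u) != 0;
  F.NoNaNs = F.NoInfs = ((Bits >> 1) & 1u) != 0;
  F.NoSignedZeros = ((Bits >> 2) & 1u) != 0;
  F.AllowReciprocal = ((Bits >> 3) & 1u) != 0;
  F.AllowContract = ((Bits >> 4) & 1u) != 0;
  F.ApproxFunc = ((Bits >> 5) & 1u) != 0;
  F.FlushToZero = ((Bits >> 6) & 1u) != 0;
  return F;
}
```

The cost shows up in encode() returning false: a frontend that wants
nonan without noinf has no way to say so, which is exactly the kind of
combination we'd have to agree nobody needs.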

My gut feeling is to go with (2).  It should be semantically
invisible, have no impact on memory usage, and the ugly bit
manipulation can be abstracted away.  What do you think?  Any other
possibilities I missed?
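
For reference, the bit manipulation behind (2) could be hidden behind
something like the wrapper below.  This is only a standalone sketch:
TaggedPtr is a hypothetical name, and it assumes the pointee's
alignment guarantees that the low NumBits bits of the address are zero.
(LLVM's existing PointerIntPair does the same kind of packing; this is
not a proposal for how Value itself would be changed.)

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not LLVM code: stores a T* together with NumBits of
// flag data in the low bits of the pointer, which are zero for any object
// whose alignment is at least 2^NumBits.
template <typename T, unsigned NumBits> class TaggedPtr {
  static_assert((1u << NumBits) <= alignof(T),
                "alignment of T does not leave NumBits free low bits");

  static constexpr std::uintptr_t mask() {
    return (std::uintptr_t(1) << NumBits) - 1;
  }

  std::uintptr_t Word = 0; // pointer bits and stolen flag bits combined

public:
  TaggedPtr() = default;
  explicit TaggedPtr(T *P, unsigned Bits = 0) {
    setPointer(P);
    setBits(Bits);
  }

  // Masks the stolen bits off before handing the pointer back.
  T *getPointer() const { return reinterpret_cast<T *>(Word & ~mask()); }
  unsigned getBits() const { return static_cast<unsigned>(Word & mask()); }

  // Replaces the pointer while keeping the current flag bits.
  void setPointer(T *P) {
    std::uintptr_t Raw = reinterpret_cast<std::uintptr_t>(P);
    assert((Raw & mask()) == 0 && "pointer is not sufficiently aligned");
    Word = Raw | (Word & mask());
  }

  // Replaces the flag bits while keeping the current pointer.
  void setBits(unsigned Bits) {
    assert((Bits & ~mask()) == 0 && "value does not fit in the stolen bits");
    Word = (Word & ~mask()) | Bits;
  }
};

// Example pointee with 8-byte alignment, so 3 low bits are available.
struct alignas(8) Node {
  int V;
};
```

A TaggedPtr<Node, 3> is still one pointer wide, which is why this
approach costs no memory; the price is the extra mask on every pointer
read.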

Why I need an FTZ flag:  some ARM Neon vector instructions have FTZ
semantics, which means we can't vectorize instructions when compiling
for Neon unless we know the user is okay with FTZ.  Today we pretend
that the "fast" variant of FastMathFlags implies FTZ
(https://reviews.llvm.org/rL266363), which is not ideal.  Moreover
(this is the immediate reason), for XLA CPU I'm trying to generate FP
instructions without nonan and noinf.  Those instructions are no
longer "fast", so they don't vectorize on ARM Neon.  An explicit bit
for FTZ will let me
generate FP operations tagged with FTZ and all fast math flags except
nonan and noinf, and still have them vectorize on Neon.
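
In case the semantics are unclear, below is a small software model of
what I mean by FTZ.  flushToZero() and addFTZ() are hypothetical
helpers written for this mail, and keeping the sign of the flushed
zero is an assumption of the sketch rather than a statement about any
particular hardware.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Software model of flush-to-zero: a subnormal value is replaced by a zero
// of the same sign (the sign handling is an assumption of this sketch);
// every other value passes through unchanged.
inline float flushToZero(float X) {
  if (std::fpclassify(X) == FP_SUBNORMAL)
    return std::copysign(0.0f, X);
  return X;
}

// Adds two floats the way an FTZ unit might: flush both inputs, add, then
// flush the result.
inline float addFTZ(float A, float B) {
  return flushToZero(flushToZero(A) + flushToZero(B));
}
```

With IEEE 754 arithmetic, adding the smallest subnormal float to itself
gives a nonzero subnormal, whereas addFTZ() on the same inputs gives
zero.  That difference is observable, so a vectorizer can't swap one
for the other unless the IR says FTZ is acceptable.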

-- Sanjoy