[PATCH] D19178: Broaden FoldItoFPtoI to try and establish whether the integer value fits into the float type

Fri Jul 29 12:05:52 PDT 2016

aarzee added inline comments.

================
Comment at: lib/Transforms/InstCombine/InstCombineCasts.cpp:1496
@@ +1495,3 @@
+                MostSignificantPossiblySetBit = std::max(HighestNotKnownZero,
+                                                         LowestNotKnownOne);
+            }
----------------
eli.friedman wrote:
> aarzee wrote:
> > eli.friedman wrote:
> > > aarzee wrote:
> > > > eli.friedman wrote:
> > > > > aarzee wrote:
> > > > > > eli.friedman wrote:
> > > > > > > aarzee wrote:
> > > > > > > > eli.friedman wrote:
> > > > > > > > > aarzee wrote:
> > > > > > > > > > eli.friedman wrote:
> > > > > > > > > > > If a value has unknown sign, isn't KnownZero.countLeadingOnes() always going to zero?
> > > > > > > > > > I believe that's correct, yes. Thanks for catching that.
> > > > > > > > > > 
> > > > > > > > > > This means:
> > > > > > > > > >   - The absolute value of an integer with unknown sign and bitwidth X has a maximum bitwidth X;
> > > > > > > > > >   - We know that we already checked whether the width of the integer fits into the significand earlier in the function, and if we reached this point in the function that comparison was false;
> > > > > > > > > >   - So if we reach this point in the function with unknown sign, we can immediately return nullptr.
> > > > > > > > > > 
> > > > > > > > > > We can add this line after the definitions of KnownNegative and PossiblyNegative, and remove the respective else blocks later in the function:
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > if (PossiblyNegative && !KnownNegative)
> > > > > > > > > >   // If the sign of the number is unknown, the bit range will be the width of the integer type,
> > > > > > > > > >   // and we already checked whether that will fit into the fp type earlier in the function, so
> > > > > > > > > >   // we can exit early.
> > > > > > > > > >   return nullptr;
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > I'll do this in the next revision.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > I think you can still optimize based on trailing zero bits even if the sign isn't known.
> > > > > > > > Again, you're right. I wasn't thinking about the least significant possibly set bit. How about this:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > // If the number is known to be negative, the highest possibly set bit
> > > > > > > > // is the lowest bit not known to be one.
> > > > > > > > // Otherwise, it's the highest bit not known to be zero.
> > > > > > > > 
> > > > > > > > if (KnownNegative)
> > > > > > > >   MostSignificantPossiblySetBit = int(KnownOne.countTrailingOnes()); 
> > > > > > > > else
> > > > > > > >   MostSignificantPossiblySetBit = int(BitWidth - KnownZero.countLeadingOnes()) - 1;
> > > > > > > > ```
> > > > > > > I assume you meant to use countLeadingOnes for both?  Looks roughly right.
> > > > > > Nope. Lowest bit not known to be one is KnownOne.countTrailingOnes, and highest bit not known to be zero is BitWidth - KnownZero.countLeadingOnes() - 1.
> > > > > What do you need the lowest bit not known to be one for?  If the input is known to be an even number, that will always be zero.  (If your testing script didn't catch this, you need a more thorough testing script.)
> > > > You're right again. This is pretty intellectually trying for me, so apologies for those errors.
> > > > 
> > > > The problem with the testing script is that it is currently exhaustive and runs in exponential time, so running it with floats and 32-bit integers is impractical, and I have no hardware that supports half-precision.
> > > Can you use some sort of software half-precision floating-point library?  numpy has a "float16" type which should support the relevant conversion operations.
> > I've tried it and it's still impractical at ~43 million tests.
> Brute-forcing 43 million tests in an hour on a single-core 2GHz desktop computer comes out to around 200,000 cycles per test.  If you're trying to run `opt -instcombine` 43 million times, that might be a problem, but you should be able to easily fit into a 200,000 cycle budget if you copy-paste the computation from this patch into a C++ test harness.  (That wouldn't be a complete end-to-end test, but it would be enough to show the tricky bits work correctly.)
I managed to pull out the relevant code and call it from Python, and it is significantly faster. So now running 43m tests shouldn't be an issue. And indeed, the lack of a specific condition for unknown sign led to issues. So I added a condition that if the sign is unknown, the highest possibly set bit is the bit to the right of the sign. Small caveat here: if the sign bit is unknown and the rest of the bits are all known zero, the highest possibly set bit will be one to the right of the sign bit, while the lowest possibly set bit is the sign bit, leading to a -1 range. It's not necessary to correct this to 1, because 1 will always fit into any significand, and -1 is less than any significand.

https://reviews.llvm.org/D19178