[llvm-commits] Patch: Floating point optimizations for SimplifyInstruction

Fri Dec 7 16:18:30 PST 2012

On Dec 7, 2012, at 8:42 AM, Duncan Sands <baldrick at free.fr> wrote:

> Hi Michael,
> 
> On 06/12/12 01:34, Michael Ilseman wrote:
>> Here's a second round of patches, incorporating the review feedback Duncan had given me earlier but I had spaced. The biggest difference is an extension to PatternMatch.h for pattern matching ConstantFP values of interest.
>> 
>> Patch1: Pattern matching for more floating point values: -0, +/-0, 1.0, and ConstantFPs
>> 
>> Patch2: Added a slew of SimplifyInstruction floating-point optimizations, many of which take advantage of fast-math flags. Test cases included.
>>   fsub X, 0 ==> X
>>   fsub 0, (fsub 0, X) ==> X
>>   fsub X, -0 ==> X
>>     when X is known non-negative-zero, e.g. the fsub is nsz.
>>   fsub nnan ninf x, x ==> 0.0
>>   fadd X, -0 ==> X
>>   fadd X, 0 ==> X
>>     when X is known non-negative-zero, e.g. the fadd is nsz.
>>   fadd [nnan ninf] X, (fsub [nnan ninf] 0, X) ==> 0
>>     where nnan and ninf have to occur at least once somewhere in this expression
>>   fmul X, 1.0 ==> X
> 
> 
>> --- a/lib/Analysis/InstructionSimplify.cpp
>> +++ b/lib/Analysis/InstructionSimplify.cpp
>> @@ -887,6 +887,82 @@ Value *llvm::SimplifySubInst(Value *Op0, Value *Op1, bool isNSW, bool isNUW,
>>                            RecursionLimit);
>> }
>> 
>> +/// Given operands for an FAdd, see if we can fold the result.  If not, this
>> +/// returns null.
>> +static Value *SimplifyFAddInst(Value *Op0, Value *Op1, FastMathFlags FMF,
>> +                              const Query &Q, unsigned MaxRecurse) {
>> +  if (Constant *CLHS = dyn_cast<Constant>(Op0)) {
>> +    if (Constant *CRHS = dyn_cast<Constant>(Op1)) {
>> +      Constant *Ops[] = { CLHS, CRHS };
>> +      return ConstantFoldInstOperands(Instruction::FAdd, CLHS->getType(),
>> +                                      Ops, Q.TD, Q.TLI);
>> +    }
>> +
>> +    // Canonicalize the constant to the RHS.
>> +    std::swap(Op0, Op1);
>> +  }
>> +
>> +  // fadd X, -0 ==> X
>> +  if (match(Op1, m_NegZero()))
>> +    return Op0;
>> +
>> +  // fadd X, 0 ==> X, when we know X is not -0
>> +  if (match(Op1, m_Zero()) &&
>> +      (FMF.NoSignedZeros || CannotBeNegativeZero(Op0)))
>> +    return Op0;
>> +
>> +  // fadd [nnan ninf] X, (fsub [nnan ninf] 0, X) ==> 0
>> +  //   where nnan and ninf have to occur at least once somewhere in this
>> +  //   expression
>> +  Value *X = 0;
> 
> (Bike shed) You don't need to initialize X.  Actually you don't need X at all
> (see next comment).
> 
>> +  Value *SubOp = 0;
>> +  if (match(Op1, m_FSub(m_AnyZero(), m_Value(X))) && X == Op0)
> 
> Rather than
>  m_Value(X))) && X == Op0
> you can do
>  m_Specific(Op0)
> Same comment applies below too.
> 

Good catch!

>> +    SubOp = Op1;
>> +  else if (match(Op0, m_FSub(m_AnyZero(), m_Value(X))) && X == Op1)
>> +    SubOp = Op0;
>> +  if (SubOp) {
>> +    Instruction *FSub = cast<Instruction>(SubOp);
>> +    if ((FMF.NoNaNs || FSub->hasNoNaNs()) &&
>> +        (FMF.NoInfs || FSub->hasNoInfs()))
>> +      return Constant::getNullValue(Op0->getType());
>> +  }
>> +
>> +  return 0;
>> +}
>> +
>> +/// Given operands for an FSub, see if we can fold the result.  If not, this
>> +/// returns null.
>> +static Value *SimplifyFSubInst(Value *Op0, Value *Op1, FastMathFlags FMF,
>> +                              const Query &Q, unsigned MaxRecurse) {
>> +  if (Constant *CLHS = dyn_cast<Constant>(Op0)) {
>> +    if (Constant *CRHS = dyn_cast<Constant>(Op1)) {
>> +      Constant *Ops[] = { CLHS, CRHS };
>> +      return ConstantFoldInstOperands(Instruction::FSub, CLHS->getType(),
>> +                                      Ops, Q.TD, Q.TLI);
>> +    }
>> +  }
>> +
>> +  // fsub X, 0 ==> X
>> +  if (match(Op1, m_Zero()))
>> +    return Op0;
>> +
>> +  // fsub X, -0 ==> X, when we know X is not -0
>> +  if (match(Op1, m_NegZero()) &&
>> +      (FMF.NoSignedZeros || CannotBeNegativeZero(Op0)))
>> +    return Op0;
>> +
>> +  // fsub 0, (fsub 0, X) ==> X
>> +  Value *X = 0;
>> +  if (match(Op0, m_Zero()) && match(Op1, m_FSub(m_Zero(), m_Value(X))))
>> +    return X;
> 
> Does this also work for -0.0?  With fast-math it should work with mixes
> of +0.0 and -0.0 too.

I'm still wrapping my head around all the nuances of floating-point negative zero. I'm going to try to crawl through this for my own benefit, feel free to skip straight to the punchline :)
  1)  x - (-x) shall retain the same sign as x when the result is 0
  2)  x - x shall be positive when resulting 0 (default rounding mode)

So, for the inner (fsub 0, X), where X can be +/-0:
(+0 - (-0) ==> +0 (Rule #1)
(+0 - (+0) ==> +0 (Rule #2)
(-0 - (-0) ==> +0 (Rule #2)
(-0 - (+0) ==> -0 (Rule #2)

Let's enumerate the outcomes, where e.g. the (o, i) pattern means the sign of the outer and inner zero constants. The resulting sign should be the same as x's for the transformation to be valid:
x | (o, i) | Result
+ | (+, +) | +
- | (+, +) | +    <--- bad!
+ | (+, -) | +
- | (+, -) | +    <--- bad!
+ | (-, +) | -    <--- bad!
- | (-, +) | -
+ | (-, -) | +
- | (-, -) | -

So, from this it seems that the below is the correct pattern:
  fsub(anyZero, fsub(negZero, X)) ==> X

And of course the inner zero can be any zero when no-signed-zeros fast-math flag is set.

> 
> 
>> --- a/include/llvm/Support/PatternMatch.h
>> +++ b/include/llvm/Support/PatternMatch.h
>> @@ -87,8 +87,36 @@ struct match_zero {
>> /// m_Zero() - Match an arbitrary zero/null constant.  This includes
>> /// zero_initializer for vectors and ConstantPointerNull for pointers.
>> inline match_zero m_Zero() { return match_zero(); }
>> -
>> -
>> +
>> +struct match_neg_zero {
>> +  template<typename ITy>
>> +  bool match(ITy *V) {
>> +    if (const Constant *C = dyn_cast<Constant>(V))
>> +      return C->isNegativeZeroValue();
>> +    return false;
>> +  }
>> +};
>> +
>> +/// m_NegZero() - Match an arbitrary zero/null constant.  This includes
>> +/// zero_initializer for vectors and ConstantPointerNull for pointers. For
>> +/// floating point constants, this will match negative zero but not positive
>> +/// zero
>> +inline match_neg_zero m_NegZero() { return match_neg_zero(); }
>> +
>> +struct match_any_zero {
>> +  template<typename ITy>
>> +  bool match(ITy *V) {
>> +    if (const Constant *C = dyn_cast<Constant>(V))
>> +      return C->isNullValue() || C->isNegativeZeroValue();
>> +    return false;
>> +  }
>> +};
>> +
>> +/// m_AnyZero() - Match an arbitrary zero/null constant.  This includes
>> +/// zero_initializer for vectors and ConstantPointerNull for pointers. For
>> +/// floating point constants, this will match negative zero and positive zero
>> +inline match_any_zero m_AnyZero() { return match_any_zero(); }
>> +
> 
> While I think m_AnyZero is OK as it is, it could probably have been implemented
> as some kind of m_Or(m_Zero(...), m_NegZero(...)) thing.
> 

Yeah, unfortunately m_Or is for matching an LLVM Or instruction, and not an 'or' pattern combinator. I'll keep this for now, and add in simple pattern combinators to my TODO list.

>> struct apint_match {
>>   const APInt *&Res;
>>   apint_match(const APInt *&R) : Res(R) {}
>> @@ -161,7 +189,27 @@ struct cst_pred_ty : public Predicate {
>>     return false;
>>   }
>> };
>> -
>> +
>> +/// cst_fp_pred_ty - This helper class is used to match scalar and vector
>> +/// floating point constants that satisfy a specified predicate.
>> +template<typename Predicate>
>> +struct cst_fp_pred_ty : public Predicate {
>> +  template<typename ITy>
>> +  bool match(ITy *V) {
>> +    if (const ConstantFP *CFP = dyn_cast<ConstantFP>(V))
>> +      return this->isValue(CFP->getValueAPF());
>> +    // FIXME: Remove this.
> 
> Remove this?
> 

I'd be happy to. Why is that there for the other patterns? I didn't want to muck around with stuff that I didn't understand.

>> +    if (const ConstantVector *CV = dyn_cast<ConstantVector>(V))
>> +      if (ConstantFP *CFP = dyn_cast_or_null<ConstantFP>(CV->getSplatValue()))
>> +        return this->isValue(CFP->getValueAPF());
>> +    if (const ConstantDataVector *CV = dyn_cast<ConstantDataVector>(V))
>> +      if (ConstantFP *CFP = dyn_cast_or_null<ConstantFP>(CV->getSplatValue()))
>> +        return this->isValue(CFP->getValueAPF());
>> +    return false;
>> +  }
>> +};
>> +
>> +
>> /// api_pred_ty - This helper class is used to match scalar and vector constants
>> /// that satisfy a specified predicate, and bind them to an APInt.
>> template<typename Predicate>
>> @@ -198,12 +246,22 @@ struct api_pred_ty : public Predicate {
>> 
>> struct is_one {
>>   bool isValue(const APInt &C) { return C == 1; }
>> +  bool isValue(const APFloat &C) {
>> +    bool ignored;
>> +    APFloat F(1.0);
>> +    F.convert(C.getSemantics(), APFloat::rmNearestTiesToEven, &ignored);
>> +    return C.bitwiseIsEqual(F);
>> +  }
> 
> This seems to be doing some heavy operations, is there no cheaper trick?
> 

I've been trying to find a better way, but unfortunately I haven't. APFloat is built around performing operations with consistent fp_semantics, and the way to construct a specific value is to call convert, unless you happen to be float or double. Since I'm comparing against a statically known value (1.0), maybe I can precompute the "integer" part and construct the APFloat directly. This feels dirty and hard to understand, but would be faster. I could also fast path float and doubles, and leave convert for half, 80-bit float, etc.

I'm also considering doing an arbitrary value comparison, and not just 1.0. For that, fast path is the only thing I can think of to help.

> Otherwise LGTM.
> 
> Ciao, Duncan.

Thanks for the feedback!

> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits