[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations

Mon Dec 17 16:34:00 PST 2012

On 12/17/12 3:49 PM, Eli Friedman wrote:
> On Fri, Dec 14, 2012 at 10:45 AM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>    I'd like to resurrect my previous change. See the attachment. It was
>> revised per
>> your requests. In addition to that, I change the Coeffficient class a bit
>> such
>> that the cost of default  constructor is merely a 4-byte-store-zero. This
>> cost
>> renders it possible to "abuse" this data structure.
>>
>>    As to the determinstic sorting, I have not yet figure out a cost effective
>> way
>> for that matter. Note that stable_sort() won't help as all Addends sharing
>> the same symbolic value will be folded, we don't care their relative
>> posittion
>> before folding.
> We discussed this in person; it sounded like you have a viable solution.
Yes

> +    bool IsFp;
> +
> +    // True iff FpValBuf contains a instance of APFloat.
> +    bool BufHasFpVal;
>
> How are these different?
"BufHasFPVal" indicate that char-buff is not an array of garbage, there 
is a APFloat instance over there,
it needs to be destructed finally.

Once BufHasFPVal is set, it won't be reset.

IsFP could be flipped back and forth.

>
> It doesn't look like you addressed my earlier comment about
> FAddend::drillDownOneStep.
Sorry. I will change the name.
> +    Value *V0 = I->getOperand(0);
> +    Value *V1 = I->getOperand(1);
> +    InstQuota = ((!isa<Constant>(V0) && V0->hasOneUse()) &&
> +                 (!isa<Constant>(V1) && V1->hasOneUse())) ? 2 : 1;
>
> I'm not sure what your logic is here; can you add a comment to explain?
Suppose the input instruction sequence is:

   V0 = ...
   V1 = ...
   this-instruction-to-be-simplified = V0 +/- V1 (or other variant).

The output instruction sequence is:
    New1 =
    New2 = ..

If the output is better than the input, it should save at least one 
instruction.
In light of that rule, if both V0 and V1 have multiple use, the output 
sequence should have
no more than one instruction.

If neither V0 nor V1 has multi-use, we can afford two instructions in 
the simplfied instruction sequence.

If only of the V0 and V1 has multi-use,  the quota is one.

If both V0 and V1 have multi-uses. the quota is one. In this case, we 
may not save instruction,
but it doesn't make things worse.

e.g.
     V0 = x + y (mult-use)
     V1 = x - y  (multi-use)
    if "this instruction" is "R = V0 - V1", the quota for the output 
instruction is 1 instruction. So, we
get the result "Y + Y", which doesn't reduce the overall instruction #, 
it does increase the # of instruction
either.

>
> +unsigned FAddend::drillDownOneStep
> +  (Value *Val, FAddend &Addend0, FAddend &Addend1) {
> +  Instruction *I = 0;
> +  if (Val == 0 || !(I = dyn_cast<Instruction>(Val)))
> +    return 0;
>
> I'm not sure what you're trying to do with this check.
>
> +  // step 4: Try to optimize Opnd0 + Opnd1_0 [+ Opnd1_1]
> +  if (Opnd1_ExpNum) {
> +    AddendVect AllOpnds;
> +    AllOpnds.push_back(&Opnd0);
> +    AllOpnds.push_back(&Opnd1_0);
> +    if (Opnd1_ExpNum == 2)
> +      AllOpnds.push_back(&Opnd1_1);
> +
> +    if (Value *R = simplifyFAdd(AllOpnds, 1))
> +      return R;
> +  }
>
> This seems redundant: why wouldn't it be handled in "step 3"?  Maybe
> the way drillDownOneStep works should be changed a little?
>

Nope.
Consider instruction sequence:
    V0 = X + Y
    V1 = m - V0
    V2  = V1 + V0

  Step 3 tries to optimize : X + Y + m + (-1)*V0, which lead to nothing.
  Step 4 tires to optimize  V0       + m + (-1)*V0, and get simplified 
expr "m".