[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations

Mon Dec 17 16:49:32 PST 2012

On Mon, Dec 17, 2012 at 4:34 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
> On 12/17/12 3:49 PM, Eli Friedman wrote:
>>
>> On Fri, Dec 14, 2012 at 10:45 AM, Shuxin Yang <shuxin.llvm at gmail.com>
>> wrote:
>>>
>>>    I'd like to resurrect my previous change. See the attachment. It was
>>> revised per
>>> your requests. In addition to that, I change the Coeffficient class a bit
>>> such
>>> that the cost of default  constructor is merely a 4-byte-store-zero. This
>>> cost
>>> renders it possible to "abuse" this data structure.
>>>
>>>    As to the determinstic sorting, I have not yet figure out a cost
>>> effective
>>> way
>>> for that matter. Note that stable_sort() won't help as all Addends
>>> sharing
>>> the same symbolic value will be folded, we don't care their relative
>>> posittion
>>> before folding.
>>
>> We discussed this in person; it sounded like you have a viable solution.
>
> Yes
>
>
>
>> +    bool IsFp;
>> +
>> +    // True iff FpValBuf contains a instance of APFloat.
>> +    bool BufHasFpVal;
>>
>> How are these different?
>
> "BufHasFPVal" indicate that char-buff is not an array of garbage, there is a
> APFloat instance over there,
> it needs to be destructed finally.
>
> Once BufHasFPVal is set, it won't be reset.
>
> IsFP could be flipped back and forth.

OK.

>> It doesn't look like you addressed my earlier comment about
>> FAddend::drillDownOneStep.
>
> Sorry. I will change the name.
>
>> +    Value *V0 = I->getOperand(0);
>> +    Value *V1 = I->getOperand(1);
>> +    InstQuota = ((!isa<Constant>(V0) && V0->hasOneUse()) &&
>> +                 (!isa<Constant>(V1) && V1->hasOneUse())) ? 2 : 1;
>>
>> I'm not sure what your logic is here; can you add a comment to explain?
>
> Suppose the input instruction sequence is:
>
>   V0 = ...
>   V1 = ...
>   this-instruction-to-be-simplified = V0 +/- V1 (or other variant).
>
> The output instruction sequence is:
>    New1 =
>    New2 = ..
>
> If the output is better than the input, it should save at least one
> instruction.
> In light of that rule, if both V0 and V1 have multiple use, the output
> sequence should have
> no more than one instruction.
>
> If neither V0 nor V1 has multi-use, we can afford two instructions in the
> simplfied instruction sequence.
>
> If only of the V0 and V1 has multi-use,  the quota is one.
>
> If both V0 and V1 have multi-uses. the quota is one. In this case, we may
> not save instruction,
> but it doesn't make things worse.
>
> e.g.
>     V0 = x + y (mult-use)
>     V1 = x - y  (multi-use)
>    if "this instruction" is "R = V0 - V1", the quota for the output
> instruction is 1 instruction. So, we
> get the result "Y + Y", which doesn't reduce the overall instruction #, it
> does increase the # of instruction
> either.

By the same logic, why isn't the quota 3 for the case where nothing is
multi-use?  Or are you concerned about the transformation falling into
an infinite loop?

>> +unsigned FAddend::drillDownOneStep
>> +  (Value *Val, FAddend &Addend0, FAddend &Addend1) {
>> +  Instruction *I = 0;
>> +  if (Val == 0 || !(I = dyn_cast<Instruction>(Val)))
>> +    return 0;
>>
>> I'm not sure what you're trying to do with this check.
>>
>> +  // step 4: Try to optimize Opnd0 + Opnd1_0 [+ Opnd1_1]
>> +  if (Opnd1_ExpNum) {
>> +    AddendVect AllOpnds;
>> +    AllOpnds.push_back(&Opnd0);
>> +    AllOpnds.push_back(&Opnd1_0);
>> +    if (Opnd1_ExpNum == 2)
>> +      AllOpnds.push_back(&Opnd1_1);
>> +
>> +    if (Value *R = simplifyFAdd(AllOpnds, 1))
>> +      return R;
>> +  }
>>
>> This seems redundant: why wouldn't it be handled in "step 3"?  Maybe
>> the way drillDownOneStep works should be changed a little?
>>
>
> Nope.
> Consider instruction sequence:
>    V0 = X + Y
>    V1 = m - V0
>    V2  = V1 + V0
>
>  Step 3 tries to optimize : X + Y + m + (-1)*V0, which lead to nothing.
>  Step 4 tires to optimize  V0       + m + (-1)*V0, and get simplified expr
> "m".

Ah.

-Eli