[llvm-dev] RFC: Moving DAG heuristic-based transforms to MI passes

Sat Jan 28 07:21:30 PST 2017

On 01/28/2017 04:19 AM, Andrew V. Tischenko wrote:
> In fact to commit the change before dealing with worst-case 
> performance is a good idea because here we have 2 different issues. 
> But the main idea of this RFC is an attempt to show the better 
> approach to to these kinds of transformations and to suggest to use 
> this approach in the future.
>
> At the same time, I'm trying to explain that this patch is not the 
> performance one because the generated code is almost identical to what 
> we have just now. It is a suggestion to change the strategy in such 
> transformations elaborating. If the community accept this new strategy 
> we're ready to introduce new similar transformations, automate the 
> framework, etc. But of course it will be themes for new RFCs and 
> discussions.

I'm not sure that your RFC is detailed enough to really get feedback 
from those not already attuned to the situation. As someone who is, I'm 
in favor of moving forward with this. We're already using the 
MachineCombiner in many targets (AArch64, PowerPC, and already X86) to 
perform these kinds of transformations, and while we need work on the 
infrastructure in the near term (fixing MachineCombiner worst-case 
performance issues, TableGen integration, etc.), this is important for 
increasing code quality in cases where there are non-trivial 
latency/throughput tradeoffs.

  -Hal

>
> On 1/27/2017 11:56 PM, Hal Finkel wrote:
>> On 01/27/2017 10:30 AM, Andrew V. Tischenko via llvm-dev wrote:
>>
>>> All llvm-devs,
>>>
>>> We're going to introduce the new possible implementation for such 
>>> optimizations as reciprocal estimation instead of fdiv. In short 
>>> it's a replacement of fdiv instruction (which is very expensive in 
>>> most of CPUs) with alternative sequence of instructions which is 
>>> usually cheaper but has appropriate precision (see genReciprocalDiv 
>>> in lib/Target/X86/X86InstrInfo.cpp for details). There are other 
>>> similar optimizations like usage of rsqrt, etc. but at the moment 
>>> we're dealing with recip estimation only - see 
>>> https://reviews.llvm.org/D26855 for details.
>>>
>>> The current version of optimization is done at DAG Combiner level 
>>> when we don't know the exact target instructions which will be used 
>>> by CodeGen. As result we don't know the real cost of the alternative 
>>> sequence and can't compare that cost with the cost of the single 
>>> fdiv. As result the decision to select an alternative sequence (made 
>>> on compiler options only) could be wrong because modern CPUs 
>>> introduce very cheap fdiv and we should use it directly.
>>>
>>> We suggest to move the implementation from DAG heuristics to 
>>> MI-scheduler-based transformations (Machine Combiner). At that time 
>>> we know exact target instructions and are able to use 
>>> scheduler-based cost model. This knowledge allows as to select 
>>> proper code sequence for final target code generation.
>>>
>>> A possible disadvantage of the new implementation is compile time 
>>> increasing (as discussed in D26855), but we expect to make 
>>> improvements in that area. For the initial change (reciprocal 
>>> transform), any difference is limited to fast-math compilations.
>>>
>>> Any objections, suggestion, comments?
>>>
>>
>> Are you asking whether is okay to commit the change first and then 
>> look at the MachineCombiner's worst-case performance in followup? In 
>> general, I think that moving to using the MachineCombiner for these 
>> kinds of transformations, where there are complex tradeoffs between 
>> latency, throughput, etc., is the right direction.
>>
>>  -Hal
>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory