[PATCH] D12882: [SimplifyCFG] do not speculate fdiv by default; it's expensive (PR24818)

Tue Sep 22 16:46:03 PDT 2015


On 09/17/2015 08:20 AM, Sanjay Patel wrote:
> spatel added a comment.
>
> In http://reviews.llvm.org/D12882#247604, @llvm-commits wrote:
>
>> Can you give some context on specifically which speculative
>>   optimizations you're hoping to avoid here?  Are you only worried about
>>   if-then-else-to-select from SimplifyCFG?  What about cases like LICM?
>>   Do you think those should hoist or not?
>
> I was only considering the if-then case in SimplifyCFG. I'm not familiar with LICM, but I don't see it using TTI. Maybe that's just an oversight though?
My impression is that our heuristics around when to speculate are fairly 
haphazard.  Even just within SimplifyCFG, we appear to have several 
different policies.  And don't even get me started on the sinking done 
by InstCombine...
>
>> My general feeling is that we *should* speculate these instructions if
>>   is legal and we have reason to predict it as being profitable (e.g. in
>>   LICM).  We do not currently have a global scheduler (or even a global
>>   schedule fixer-uper beyond some hacks in CodeGenPrep), but I've been
>>   thinking about this for other reasons.  In cases where the only use is
>>   down a particular path, adding a "fixup" which undoes hoisting seems
>>   plausible and reasonable.
>> For this particular case, it really seems like this should be handled
>>   via a select-to-control flow conversion for expensive operations done
>>   late in the pipeline.  IMO, selects is a better intermediate form for
>>   optimization, and we should undo the transformation late when heading
>>   into the backend.  In fact, we already have the framework for this in
>>   CodeGenPrep::OptimizeSelectInst.  Have you considered tweaking that instead?
>
> I was hoping there was some later hook in place where we could do this, but I hadn't looked further yet, so I'm glad to see that CGP has it.
>
> But if that is the right spot to do (undo) this transform, that implies that using TTI here in SimplifyCFG was the wrong choice, doesn't it? Ie, the TTI cost model cites this exact case as a reason for its existence:
>
>    TCC_Expensive = 4 ///< The cost of a 'div' instruction on x86.
>
> If we're not going to honor the TTI cost model here in SimplifyCFG, we should go back to whatever logic was used before r228826. (Note: I'm not advocating one way or the other...yet; I'm just looking for the right answer.)
That would be my temptation, but without evidence and time to look at 
this in detail, I'm leery of proposing that seriously.

(As Hal said, the patch currently under discussion can go in.  It works 
within the line of existing code; if we were to revisit that, we can do 
everything at once.)
>
> This is another way of asking the general question I've run into several times lately ( PR24818 <https://llvm.org/bugs/show_bug.cgi?id=24818>, PR24766 <https://llvm.org/bugs/show_bug.cgi?id=24766>, PR22723 <https://llvm.org/bugs/show_bug.cgi?id=22723>, PR24743 <https://llvm.org/bugs/show_bug.cgi?id=24743> ):
I've been running into this a lot as well.  In particular, I've been 
seeing cases where removing data dependencies in the IR result in 
inferior scheduling code generation and performance due to our 
scheduling heuristics.  I haven't yet reached the point where a) I have 
enough time and b) I feel like I understand the problem well enough, to 
do anything about it.
> What defines the "canonical" or "optimal" IR? Less instructions? Less basic blocks? Less register pressure? Is it art or science?
Hal answered this fairly well.
>
>
> http://reviews.llvm.org/D12882
>
>
>