[llvm] r207301 - [X86] Implement TargetLowering::getScalingFactorCost hook.

Mon Apr 28 05:48:52 PDT 2014

On 4/27/2014 7:09 AM, Hal Finkel wrote:
>
> How do you propose we change this? We often already check operation legality, etc., and I agree this is fairly crude, but I don't recall anyone proposing a more-sophisticated framework.

My (crude?) idea would be to perform target-specific combining first, 
and if nothing new was generated, then perform the generic one.  Or, if 
not that, then perform the generic one always, but after the 
target-specific one.
In my case, it is not so much about legality as about catching cases 
that can easily be obscured by the generic transformations. Recently I 
added generation of "bit extract" instructions for our target (just 
before someone else worked on it in ToT), and the problem I ran into was 
that various shifts were combined together into one, and the "extract" 
pattern was not visible to us (at least at the time when we could 
examine the graph).  From time to time we run into situations like this 
where we want certain patterns to remain in a certain form, but the 
target-independent combining changes them.
For the "extract" problem I ended up adding a pre-lowering pass that 
matched the patterns directly on the IR, and generated builtins for the 
extract instructions.
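
To make the extract case concrete, here is a toy sketch in plain C++ (not 
LLVM or Hexagon code; all names are made up for illustration, and the 
shl/srl pair is just one assumed shape of the pattern).  It shows that a 
shift pair and the merged shift-and-mask form compute the same bit field, 
even though a matcher written for the first shape will not see the second:

```cpp
#include <cassert>
#include <cstdint>

// Shift-pair form: the shape a target's "extract" pattern might look for.
// Requires Shl < 32 and Srl < 32.
uint32_t extractViaShifts(uint32_t X, unsigned Shl, unsigned Srl) {
  return (X << Shl) >> Srl;
}

// Shift-and-mask form: the same value once the two shifts are merged.
// Requires 1 <= Width <= 32 and Offset < 32.
uint32_t extractViaMask(uint32_t X, unsigned Width, unsigned Offset) {
  uint32_t Mask = Width >= 32 ? 0xFFFFFFFFu : ((1u << Width) - 1u);
  return (X >> Offset) & Mask;
}

// Hypothetical helper: the (width, offset) of the field selected by a
// shl/srl pair on a 32-bit value.  Requires Shl <= Srl < 32.
struct BitField {
  unsigned Width;
  unsigned Offset;
};

BitField fieldFromShifts(unsigned Shl, unsigned Srl) {
  BitField F = {32 - Srl, Srl - Shl};
  return F;
}
```

For example, shl by 8 followed by srl by 24 selects an 8-bit field at 
offset 16, so both forms pick out the same byte of the input.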

>> Currently, the target-specific combining only happens
>> when the generic one (visit) does nothing.
>
> This is a separate problem, and I would certainly be in favor of a patch that fixes it!

To me they were really two aspects of the same problem: insufficient 
target-specific control over what happens to the DAG.  My proposal is 
probably crude in the sense that the generic transformations would still 
be done without any additional guidance from the targets.  We even have 
our own case for it---narrowing of loads.  Hexagon has separate load 
instructions to access data from small data sections, and due to various 
encoding/data-layout considerations, they cannot be narrowed.  This 
would be a good example of a generic optimization that would be 
"parametrized" by the target-specific input.
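
A minimal sketch of what such "parametrization" could look like, as a toy 
model in plain C++.  The hook name mayNarrowLoad and the Load struct are 
hypothetical and are not part of LLVM; the point is only that the generic 
code asks the target before narrowing:

```cpp
#include <cassert>

// Toy load: its width in bits, and whether it reads from small data.
struct Load {
  unsigned Bits;
  bool FromSmallData;
};

// Hypothetical target interface; by default every load may be narrowed.
struct TargetHooks {
  virtual ~TargetHooks() {}
  virtual bool mayNarrowLoad(const Load & /*L*/, unsigned /*NewBits*/) const {
    return true;
  }
};

// Toy target that refuses to narrow loads from small data sections.
struct SmallDataTarget : TargetHooks {
  bool mayNarrowLoad(const Load &L, unsigned /*NewBits*/) const override {
    return !L.FromSmallData;
  }
};

// Generic narrowing, guarded by the target's answer.
Load narrowLoad(const Load &L, unsigned NewBits, const TargetHooks &T) {
  if (NewBits >= L.Bits || !T.mayNarrowLoad(L, NewBits))
    return L; // leave the load as it is
  Load Narrowed = {NewBits, L.FromSmallData};
  return Narrowed;
}
```

With the default hooks a 32-bit small-data load is narrowed to 8 bits; with 
SmallDataTarget the same load is left alone, while other loads are still 
narrowed.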

The problem in coming up with an elegant solution is knowing what needs 
to be controllable by the targets.  Given that there can be an infinite 
variety of architectures, the simplest approach would be to just let the 
target do its own work first, and then proceed with the 
target-independent optimizations.
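
As a rough illustration of the two orderings, here is a toy model in plain 
C++.  Nodes are just strings, and both combine functions are made-up 
stand-ins rather than the real DAGCombiner or any target's hook:

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a DAG node: the textual form of an expression.
typedef std::string Node;

// Each toy combine returns true and sets Out if it rewrote N.

// Hypothetical generic combine: merges a shl/srl pair into shift-and-mask.
bool genericCombine(const Node &N, Node &Out) {
  if (N == "(srl (shl x, 8), 24)") {
    Out = "(and (srl x, 16), 255)";
    return true;
  }
  return false;
}

// Hypothetical target combine: only recognizes the shift-pair shape.
bool targetCombine(const Node &N, Node &Out) {
  if (N == "(srl (shl x, 8), 24)") {
    Out = "(extract x, 8, 16)";
    return true;
  }
  return false;
}

// Generic first; the target combine runs only if the generic one did nothing.
Node combineGenericFirst(const Node &N) {
  Node Out;
  if (genericCombine(N, Out) || targetCombine(N, Out))
    return Out;
  return N;
}

// Target first; the generic combine runs only if the target did nothing.
Node combineTargetFirst(const Node &N) {
  Node Out;
  if (targetCombine(N, Out) || genericCombine(N, Out))
    return Out;
  return N;
}
```

In this model the generic-first order never gives the target a chance to 
see the shift pair, while the target-first order produces the extract and 
still leaves unrelated nodes to the generic code.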

My experiences with this are mostly from the point of view of Hexagon, 
and I'd be interested in what issues (if any) other architectures are 
running into.

You stated that the second part is a different problem---what did you 
have in mind?

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation