[PATCH][instcombine]: Slice a big load in two loads when the elements are next to each other in memory.

Quentin Colombet qcolombet at apple.com
Thu Sep 19 22:49:39 PDT 2013


Here is the new patch.
The optimization is now done as part of DAG combine.

The patch looks pretty much the same as its InstCombine counterpart (SDNodes instead of Instructions), except that it introduces a new profitability model. When the transformation was done in InstCombine, the idea was to perform a canonicalization; now it is an optimization.

For this purpose, the patch introduces a cost model (represented by the helper class LoadedSlice::Cost), which accounts for the instructions that are saved by slicing a load compared to keeping the original big load.
It also introduces a new target hook to query the target's load pairing capabilities (supported types and required alignment).
Basically, this cost model counts how many truncates, shifts, etc., are saved by slicing, compared to the number of loads that slicing introduces.
It takes advantage of information such as isTruncateFree.

Currently the cost model is used in a very specific case: only two slices that are next to each other in memory.
This limitation can be lifted once the cost model is considered mature enough and the machine passes that deal with load pairing can handle more slices.

Thanks for your reviews!


On Sep 17, 2013, at 3:05 PM, Quentin Colombet <qcolombet at apple.com> wrote:

> After discussing with Chandler off-line, we decided to perform the load slicing at isel time.
> Thus, this patch has been reverted in r190891.
> -Quentin
> On Sep 17, 2013, at 2:10 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> On Mon, Aug 26, 2013 at 1:08 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> After looking into SROA, Nadav and I agreed that it does the right thing.
>> Therefore, the initial proposed patch is still the one to be reviewed.
>> Very, very sorry for the late comment here. I misunderstood this comment and the result.
>> I agree that SROA is correct here, and I also think that this should be the canonical form. With the LLVM memory model, it is very hard to merge two smaller loads back together if that is ever profitable. It is essentially impossible in many cases to merge two smaller stores back together if that is ever profitable. As such, it is very useful to preserve the widest known-safe load and store size as far as possible in the optimizer. At least getting it to the backend where the cost factor for various load and store operations is known is essential.
>> Can we canonicalize toward the wide loads and stores with appropriate masking to extract narrow values, and then match these back to the small stores in the backend?
>> Both SROA and optimal bitfield code generation rely on this (at least for x86) so changing it will regress some things.
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAGCombineLoadSlicing.svndiff
Type: application/octet-stream
Size: 31759 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130919/f657f371/attachment.obj>