[PATCH][instcombine]: Slice a big load in two loads when the element are next to each other in memory.

Tue Oct 8 09:30:10 PDT 2013

On Oct 7, 2013, at 10:39 PM, Owen Anderson <resistor at mac.com> wrote:

> 
> On Oct 7, 2013, at 4:04 PM, Quentin Colombet <qcolombet at apple.com> wrote:
> 
>> Hi Owen,
>> 
>> Thanks for the review.
>> See my comments inlined with yours.
>> 
>> On Oct 7, 2013, at 3:47 PM, Owen Anderson <resistor at mac.com> wrote:
>> 
>>> 
>>> On Oct 7, 2013, at 1:24 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>>> 
>>>> Hi Owen, hi Chandler,
>>>> 
>>>> Owen, you may be a good match to review this patch, which is SDSel related and not instcombine related (as the subject says… it changes :)).
>>> 
>>> I’m not sure I like the cost model this represents.  It seems very x86-centric.  For example, some RISCs (PowerPC and AArch64, offhand) can do single-instruction paired-register bitfield extracts, which means that original input sequence was better for them (1 load + 1 extract, instead of 2 loads).
>> This shouldn’t happen unless the target specified that it supports paired load (the new target hook that this patch adds), which is false by default.
>> In that case, you end up with 1 paired load instead of 1 load + 1 extract.
>> 
>> Does it make sense?
> 
> The criteria for when a target should/should not implement the has-paired-load target hook.  How is it different from supporting vector loads?
To me, vector registers hold both values within the same register, whereas a paired load defines two different registers.
Assuming the vector registers and the regular registers are on the same bank, i.e., extracting both values from the vector register is free, it may be still interesting to use the paired load because it may be less constrained by register allocation.
Now, if the registers are not on the same bank, we may definitely not want to generate the vector load, but still be able to catch the paired load.

I am not sure if it is doable to check the same bank property without the sub register index in SelectionDAG. Assuming I can have this information, I may be able to generate a vector load when the registers are on the same bank.
However, in that case, I wonder if it wouldn’t have been better to do that as an IR pass. More over I am a bit reluctant to introduce vector code that late. See hereafter.

> 
>>> Even on x86, you’re probably best off trying to turn this into a <2 x i32> vector load + lane copy rather than splitting the load.
>> My plan here was to turn this into two split loads, then having another dag combine for merging those loads into a vector load. I though maybe target supporting paired load already do this kind of target specific combine.
> 
> You certainly can go that way, but it sounds challenging to implement.  My point was that it seems like you could achieve most of the same benefit with a DAG combine that turned this example into a <2 x float> vector load, with much less concern about the cost model, etc.
My concern then is that we may introduce cross register bank copies (e.g., NEON to regular register) and in that case, the new code would be more expensive.
My approach here was to not insert vector code. I believe the same approach was used in SROA: do not insert vector code, unless the code has already been vectorized.
Do we want to change that here?

-Quentin

> 
> --Owen
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131008/204db740/attachment.html>