[PATCH][DAGCombiner] insert_vector_elt: Avoid to create multiple uses of the same load.

Fri Jul 26 16:29:45 PDT 2013

Hi Tom,

The attached patch for R600 should do the trick to avoid the regression.

If you agree, I will commit it right after the first proposed patch has landed.

Cheers,

-Quentin

On Jul 26, 2013, at 12:37 PM, Quentin Colombet <qcolombet at apple.com> wrote:

> On Jul 26, 2013, at 11:39 AM, Tom Stellard <tom at stellard.net> wrote:
> 
>> On Thu, Jul 25, 2013 at 05:31:15PM -0700, Quentin Colombet wrote:
>>> Hi,
>>> 
>>> After discussing with Nadav offline, we decide to block the optimization when the input vector is used more than once. Indeed, building a vector may be expensive and we think it should be profitable to reuse part of an existing vector instead of creating a new one.
>>> 
>>> I have updated a bunch of tests to reflect this. We went through the test cases that were regressing and fixed them:
>>> - Dead-code: CodeGen/X86/vshift-1.ll to CodeGen/X86/vshift-4.ll.
>>> - Use of insertps instead of pshufd, which is good because pshufd is expensive: test/CodeGen/X86/fold-load-vec.ll.
>>> 
>>> It remains one “regression”:
>>> - Tons of move inserted: test/CodeGen/R600/swizzle-export.ll.
>>> 
>>> I have marked this test as XFAIL at this point.
>>> 
>>> Tom, I believe the lowering of R600 can be improved here. One easy fix could be to replicate the old behavior of DAGCombine on insert_vector_elt nodes in a target specific dag combine if you think it is the best lowering for you.
>>> If you have other suggestions, let me know.
>>> 
>> 
>> On R600, build_vector is lowered to REG_SEQUENCE, which is something the
>> register allocator does a good job at optimizing.  INSERT_VECTOR
>> instructions tend to be a problem, because the DAG Scheduler considers
>> writes to different lanes of the same vector to be a write conflict,
>> which means they can't be scheduled in the same VLIW packet.  Vincent
>> was working on a patch to fix this, but I'm not sure how much it will
>> help in this particular situation.
>> 
>> I think the best think to do for R600 would be to try to replicate the
>> old functionality in a target-specific combine.
> Based on your comment, I agree.
> 
>>  Is this something you
>> can do easily?
> That should be fairly easy.
> I will see what I can do.
> 
> -Quentin
> 
>> If not, I can give it a shot.
>> 
>> -Tom
>> 
>> 
>>> Thanks for your reviews.
>>> 
>>> Cheers,
>>> 
>>> -Quentin
>>> 
>>> 
>>> On Jul 25, 2013, at 11:15 AM, Quentin Colombet <qcolombet at apple.com> wrote:
>>> 
>>>> Hi Nadav,
>>>> 
>>>> Thanks for the feedback.
>>>> 
>>>> Well, I did not want to go in that direction because I think that the instruction selection of all LLVM targets knows how to deal efficiently with build_vector nodes.
>>>> Moreover, in some cases, the BUILD_VECTOR we want to optimize has several uses but the final vector has not.
>>>> E.g.,
>>>> V1 = insert_vector undef, cst, 0 // create a new vector from a constant.
>>>> V2 = insert_vector V1, b, 1    // create a new vector V1 | b
>>>> V3 = insert_vector V1, c, 1    // create a new vector V1 | c
>>>> 
>>>> With current behavior:
>>>> V2 = build_vector cst, b    // create a new vector cst | b
>>>> V3 = build_vector cst, c    // create a new vector cst | c
>>>> 
>>>> With proposed behavior:
>>>> V1 = insert_vector undef, cst, 0 // create a new vector from a constant.
>>>> V2 = build_vector V1, b    // create a new vector V1 | b
>>>> V3 = build_vector V1, c    // create a new vector V1 | c
>>>> 
>>>> I do not know if the code resulting from what you proposed will cause trouble but it is more likely to impact all the target than what I have proposed.
>>>> 
>>>> An alternative approach could be to do something between what we both proposed, in other words, checking if the build_vector we optimize has several uses and if yes, checking if the uses that we are about to duplicate are not constant.
>>>> However, I am not sure it is a good idea to allow constant duplication and prevent uses duplication generally.
>>>> Again, I think, but I may be wrong, most back-ends are optimized to deal with build_vector properly. The load case I addressed in that patch is something, I believe, that affects all targets.
>>>> 
>>>> What do you think?
>>>> 
>>>> Cheers,
>>>> 
>>>> -Quentin
>>>> 
>>>> On Jul 24, 2013, at 10:16 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>>> 
>>>>> Hi Quentin, 
>>>>> 
>>>>> Thanks for working on this. I think that the problem of folding the load is secondary compared to the problem of inefficient insert sequence because of this optimization.  In your example below the code for creating V2 and V3 is inefficient because you need to build the vector twice (think about 8-wide vectors), even if “a" is not a load.  I would solve this problem by checking if the BUILD_VECTOR that we want to optimize hasOneUse(). 
>>>>> 
>>>>> Thanks,
>>>>> Nadav
>>>>> 
>>>>> On Jul 24, 2013, at 5:35 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Here is a patch that prevents DAGCombiner to combine insert_vector_elt and build_vector nodes into build_vector nodes when it creates multiple uses of the same load.
>>>>>> 
>>>>>> 
>>>>>> ** Context **
>>>>>> 
>>>>>> Prior to this patch DAGCombine combines insert_vector_elt nodes with constant indexes and build_vector nodes into build_vector nodes.
>>>>>> I.e.,
>>>>>> insert_vector_elt (build_vector elt0, …, eltN), NewEltIdx, idx
>>>>>> =>
>>>>>> build_vector elt0, …, NewEltIdx, …, eltN
>>>>>> 
>>>>>> This has the advantage of flattening the DAG and making the vectors more obvious.
>>>>>> However, if both build_vector nodes are meant to exist side by side, elt0 to eltN now have two users instead of one.
>>>>>> 
>>>>>> If one of this element is a load, this will prevent all folding opportunities.
>>>>>> 
>>>>>> 
>>>>>> ** Example **
>>>>>> 
>>>>>> a = load
>>>>>> b = load
>>>>>> c = load
>>>>>> V1 = insert_vector undef, a, 0 // create a new vector form a
>>>>>> V2 = insert_vector V1, b, 1    // create a new vector V1 | b
>>>>>> V3 = insert_vector V1, c, 1    // create a new vector V1 | c
>>>>>> 
>>>>>> During DAG combine, this is optimized into:
>>>>>> a = load
>>>>>> b = load
>>>>>> c = load
>>>>>> V2 = build_vector a, b    // create a new vector a | b
>>>>>> V3 = build_vector a, c    // create a new vector a | c
>>>>>> 
>>>>>> Now the load of ‘a’ has two users. This is what it is exposed in the test case of the patch.
>>>>>> 
>>>>>> 
>>>>>> ** Proposed Solution **
>>>>>> 
>>>>>> When combining an insert_vector_elt with a build_vector, we check if the vectors (the one issued from the insert_vector_elt and the one issued from the initial build_vector) are meant to exist side by side.
>>>>>> If they are, we check all the sources that will be duplicated from the input vector (i.e., all but eltIdx in the argument of the input build_vector) and when one of them is a load then do not combine.
>>>>>> 
>>>>>> To sum up:
>>>>>> - check if the resulting vector will live with the input vector.
>>>>>> - if yes, check the arguments of the input vector.
>>>>>>   - if one of the argument is a load used only once: do not combine as it will create an unfoldable load.
>>>>>> 
>>>>>> 
>>>>>> Thanks for the reviews.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> -Quentin
>>>>>> <InsertVectorCombine.svndiff>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130726/d07f9085/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: R600Lowering.svndiff
Type: application/octet-stream
Size: 2609 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130726/d07f9085/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130726/d07f9085/attachment-0001.html>