[PATCH] D31710: [AMDGPU] Fix for issue in alloca to vector promotion pass

Mon Apr 10 11:05:36 PDT 2017

dstuttard added a comment.

In https://reviews.llvm.org/D31710#720328, @arsenm wrote:

> In https://reviews.llvm.org/D31710#719896, @dstuttard wrote:
>
> > As you say, the issue actually is that non-canonical IR was being presented to the pass - the front-end has since been updated to fix this (although possibly not in the way it should).
>
>
> The frontend emitting it seems fine if convenient, but instcombine will decompose it for you. If you pass pipeline somehow isn't running instcombine, you have a problem.

Yes. This is now fixed.

>> The question is, what should this pass do when valid, but non-canonical IR is passed in. My inclination is to strip out the code to transform to vector in this case and instead just more gracefully handle instead (the tryPromoteAlloca function will just return false - as with other cases it can't handle).
>> 
>> Not sure what you mean with your last comment
>> 
>>> Similarly we give up on array allocas in the form where they are allocating N items of the element type.
>> 
>> If you mean that the tryPromoteAlloca gives up for N<2 or N>4 as an example of what to do - then yes, that's what I'm proposing for this case as well.
> 
> No, I mean the I.isArrayAllocation() early exit. An alloca with a constant number of elements is turned into an alloca of 1 element of array type by instcombine.
> 
> FYI Changpeng has also been looking at this pass to handle more cases.

Ok - I see what you mean. Hopefully this minor fix will not be an issue for Changpeng.

> 
> 
> In https://reviews.llvm.org/D31710#719896, @dstuttard wrote:
> 
>> As you say, the issue actually is that non-canonical IR was being presented to the pass - the front-end has since been updated to fix this (although possibly not in the way it should).
>>
>> The question is, what should this pass do when valid, but non-canonical IR is passed in. My inclination is to strip out the code to transform to vector in this case and instead just more gracefully handle instead (the tryPromoteAlloca function will just return false - as with other cases it can't handle).
>>
>> Not sure what you mean with your last comment
> 
> 
> Non-canonical IR are constructs that the basic canonicalization passes like instcombine produce. This is supposed to make later optimizations lives easier since they don't have to worry about every possible way of representing something. It is sufficient to just not crash or miscompile on the non-canonical inputs.

Agreed. That's what it now does.

================
Comment at: test/CodeGen/AMDGPU/promote-alloca-array-aggregate.ll:103
+; Function Attrs: nounwind
+define amdgpu_vs void @promote_matrix_aggr() #0 {
+  %f4 = alloca <4 x float>
----------------
arsenm wrote:
> Can this case be reduced?
I removed this one as I don't think it really added much. Particularly with the subsequent changes.

https://reviews.llvm.org/D31710