[llvm-dev] InstCombine question on combineLoadToOperationType

Thu Nov 17 14:10:23 PST 2016

> On Nov 16, 2016, at 11:23 AM, Friedman, Eli via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On 11/15/2016 4:22 PM, Pete Couperus via llvm-dev wrote:
>> Hello,
>>  
>> Context: We have a backend where v32i1 is a Legal type, but the storage for v32i1 is not 32-bits/uses a different instruction sequence.
>> We ran into an issue because combineLoadToOperationType changed v32i1 loads into i32 loads, so a sequence like:
>> define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
>>   %a = load <32 x i1>, <32 x i1>* %A
>>   store <32 x i1> %a, <32 x i1>* %B
>>   ret void
>> }
>>  
>> Is transformed to:
>> define void @bits(<32 x i1>* %A, <32 x i1>* %B) {
>>   %1 = bitcast <32 x i1>* %A to i32*
>>   %a1 = load i32, i32* %1, align 4
>>   %2 = bitcast <32 x i1>* %B to i32*
>>   store i32 %a1, i32* %2, align 4
>>   ret void
>> }
>>  
>> This looks to be intentional. 
>> Is there a way to specify in the data-layout that v32i1 storage is not 32-bits?
> 
> No, not at the moment.  You could propose something, but you'd probably have a hard time convincing anyone it's necessary; nobody has cared about this for a very long time.
> 
>> Absent that, is there any other reliable way to retain the original vector loads/store without just disabling this part of InstCombine?
> 
> No, and you'll run into other problems (e.g. alias analysis) if the data layout lies about the size of a load or store.
> 
>> Or is it the backend’s responsibility to try and work with this?
> 
> Where are these loads coming from?  x86 without AVX512 doesn't have any convenient way to generate code for a <32 x i1> store, but it doesn't matter because frontends don't generate <N x i1> loads and stores.
> 
> If you have a frontend which is generating loads and stores like this, you could probably change it to use some other sequence (like a platform-specific intrinsic, or some sequence involving sext/trunc).

Why not just generate the code with the proper storage type? If <32 x i1> values are used where the storage is actually <32 x i8> (for example), it seems like a bad idea to lie in the IR and hide that behind a platform-specific intrinsic, right? I fear this would cause other problems down the line in the optimizer.
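
As a rough sketch of what that could look like, assume (hypothetically) that the target keeps each mask lane in one byte, so the in-memory type is <32 x i8>. That layout and the value names below are illustrative assumptions, not a description of the backend in question. The frontend would then load and store the storage type and convert explicitly with trunc/zext:

```llvm
; Sketch only: assumes the in-memory form of the mask is <32 x i8>.
define void @bits(<32 x i8>* %A, <32 x i8>* %B) {
  ; Load using the type that matches the real storage size.
  %a.mem = load <32 x i8>, <32 x i8>* %A
  ; Narrow to the legal mask type for any computation on the mask.
  %a = trunc <32 x i8> %a.mem to <32 x i1>
  ; Widen back to the storage type before writing to memory.
  %a.ext = zext <32 x i1> %a to <32 x i8>
  store <32 x i8> %a.ext, <32 x i8>* %B
  ret void
}
```

With this shape, the sizes the optimizer sees for the load and store agree with what the hardware actually touches, so nothing in the IR misstates the memory footprint.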

-- 
Mehdi
