[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen

Tue Jan 24 17:38:18 PST 2017

What is a good way to collect test cases for GISel expectations (in this case handle i1 efficiently)? It would be great to build up a repository of tests as opportunities/potentials pop up. 

Thanks
Gerolf

> On Jan 24, 2017, at 7:34 AM, Hal Finkel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> 
> On 01/24/2017 05:54 AM, Blank, Guy via llvm-dev wrote:
>> Hi All,
>>  
>> AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1’s.
>> For example,
>> 
>> define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) {
>>   %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)
>>   ret 8 x i32>%r
>> }
>>  
>> Can be lowered to
>>  
>> # BB#0:
>> kxnorw    %k0, %k0, %k1
>> vpgatherqd    (,%zmm1), %ymm0 {%k1}
>> retq
>>  
>>  
>> Legal vectors of i1’s require support for BUILD_VECTOR(i1, i1, .., i1), i1 EXTRACT_VEC_ELEMENT (…) and INSERT_VEC_ELEMENT(i1, …) , so making i1 legal seemed like a sensible decision, and this is the current state in the top of trunk.
>>  
>> However, making i1 legal affected instruction selection of scalar code as well. Currently, there are cases where operations producing or consuming i1’s are selected (sub-optimally) to instructions that act on K-regs.
>> PR28650 <https://llvm.org/bugs/show_bug.cgi?id=28650> is an example showing that i1’s live-in or live-out of basic-blocks are being selected to K register classes, even though we don’t want this to happen. This problem does not happen on subtargets without the AVX-512 feature enabled.
>> The following is the AVX-512 code from the bug report:
>>  
>> # BB#0:                                 # %entry
>> testb        $1, %dil
>> je        .LBB0_1
>> # BB#2:                                 # %if
>> pushq        %rax
>> callq        bar
>>                                         # kill: %AL<def> %AL<kill> %EAX<def>
>> andl        $1, %eax
>> kmovw        %eax, %k0
>> addq        $8, %rsp
>> jmp        .LBB0_3
>> .LBB0_1:
>> kxnorw        %k0, %k0, %k0
>> kshiftrw        $15, %k0, %k0
>> .LBB0_3:                                # %else
>> kmovw        %k0, %eax
>>                                         # kill: %AL<def> %AL<kill> %EAX<kill>
>> Retq
>>  
>> The kmov,kxnor,kshiftr instructions here are the instructions operating on K registers. These are undesirable in the purely scalar input code.
>>  
>>  
>> Having a type that can be possibly legalized to two different register classes exposes a fundamental limitation of the current instruction selection framework, and that is we cannot always make the right decision about live-in/live-out i1’s because we cannot see beyond the boundary of the current basic-block we are visiting. As a side-note, with GlobalISel this can be solved, since we see the entire use-def chain at the function level.
> 
> Exactly. I certainly hope we'll be able to address this sensibly with GlobalISel.
> 
>>  
>> Our initial thought was to write a pass that will be run after ISel to correct bad selections. The pass would examine the use-def chains containing values that were selected to K-regsiter classes, and, when profitable, re-assign the values to GPR register classes (and replace the producing/consuming instructions accordingly). But even with this fix-up pass, we would still be losing many ISel pattern-matching rules that will be missed because the instruction set acting on GPR is richer than the instruction set acting on K-regs. For example, a test trying to match the sbb instruction:
> 
> I think you'd want to do the fixup for these before/during isel, not afterward. PowerPC does some of this (see lib/Target/PowerPC/PPCBoolRetToInt.cpp and DAGCombineTruncBoolExt/DAGCombineExtBoolTrunc in lib/Target/PowerPC/PPCISelLowering.cpp). That code should trivially generalize to other targets.
> 
> There are some places where we do this kind of thing after isel as well (e.g. lib/Target/AArch64/AArch64AdvSIMDScalarPass.cpp).
> 
> That having been said, if you don't have actual i1 registers in which you'd like to keep and manipulate boolean values, marking i1 as illegal makes sense to me.
> 
>  -Hal
> 
>>  
>> define i32 @test2(i32 %x, i32 %y, i32 %res) nounwind uwtable readnone ssp {
>> entry:
>>   %cmp = icmp ugt i32 %x, %y
>>   %dec = sext i1 %cmp to i32
>>   %dec.res = add nsw i32 %dec, %res
>>   ret i32 %dec.res
>> }
>>  
>> Generates the following with AVX2:
>> # BB#0:                                 # %entry
>> cmpl        %edi, %esi
>> sbbl        $0, %edx
>> movl        %edx, %eax
>> retq
>>  
>> While AVX512 produces:
>> # BB#0:                                 # %entry
>> xorl        %ecx, %ecx
>> cmpl        %esi, %edi
>> movl        $-1, %eax
>> cmovbel        %ecx, %eax
>> addl        %edx, %eax
>> retq
>>  
>> So we would still end-up with cases where when the AVX-512 feature is enabled, instruction selection for scalar code becomes inferior.
>>  
>> Finally, we suggest to undo the above issues cause by legalizing i1, by making i1 illegal. This would make instruction selection of scalar code identical for both cases when the AVX-512 feature is on and off. As for supporting BUILD_VECTOR, EXTRACT_VEC_ELEMENT and INSERT_VEC_ELEMENT, we believe we can support these operations even when i1 is illegal and the vectors of i1 *are* legal by using the i8 type instead of i1, as it should be implicitly truncated/extended to the element type of the vNi1 vectors. 
>> I am now working on a patch that will implement this approach.
>>  
>> Would appreciate to get feedback and comments.
>>  
>> Thanks,
>> Guy
>>  
>>  
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>> 
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>> 
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
> 
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170124/1dbbe22c/attachment.html>