[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Tue Jan 24 07:34:58 PST 2017
On 01/24/2017 05:54 AM, Blank, Guy via llvm-dev wrote:
>
> Hi All,
>
> AVX-512 introduced the K mask registers and masked operations which
> make a natural choice for legalizing vectors of i1’s.
>
> For example,
>
> define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) {
>
> %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8
> x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true,
> i1 true>, <8 x i32> undef)
>
> ret 8 x i32>%r
>
> }
>
> Can be lowered to
>
> # BB#0:
>
> kxnorw %k0, %k0, %k1
>
> vpgatherqd (,%zmm1), %ymm0 {%k1}
>
> retq
>
> Legal vectors of i1’s require support for BUILD_VECTOR(i1, i1, ..,
> i1), i1 EXTRACT_VEC_ELEMENT (…) and INSERT_VEC_ELEMENT(i1, …) , so
> making i1 legal seemed like a sensible decision, and this is the
> current state in the top of trunk.
>
> However, making i1 legal affected instruction selection of scalar code
> as well. Currently, there are cases where operations producing or
> consuming i1’s are selected (sub-optimally) to instructions that act
> on K-regs.
>
> PR28650 <https://llvm.org/bugs/show_bug.cgi?id=28650> is an example
> showing that i1’s live-in or live-out of basic-blocks are being
> selected to K register classes, even though we don’t want this to
> happen. This problem does not happen on subtargets without the AVX-512
> feature enabled.
> The following is the AVX-512 code from the bug report:
>
> # BB#0: # %entry
>
> testb $1, %dil
>
> je .LBB0_1
>
> # BB#2: # %if
>
> pushq %rax
>
> callq bar
>
> # kill: %AL<def> %AL<kill> %EAX<def>
>
> andl $1, %eax
>
> kmovw %eax, %k0
>
> addq $8, %rsp
>
> jmp .LBB0_3
>
> .LBB0_1:
>
> kxnorw %k0, %k0, %k0
>
> kshiftrw $15, %k0, %k0
>
> .LBB0_3: # %else
>
> kmovw %k0, %eax
>
> # kill: %AL<def> %AL<kill> %EAX<kill>
>
> Retq
>
> The kmov,kxnor,kshiftr instructions here are the instructions
> operating on K registers. These are undesirable in the purely scalar
> input code.
>
> Having a type that can be possibly legalized to two different register
> classes exposes a fundamental limitation of the current instruction
> selection framework, and that is we cannot always make the right
> decision about live-in/live-out i1’s because we cannot see beyond the
> boundary of the current basic-block we are visiting. As a side-note,
> with GlobalISel this can be solved, since we see the entire use-def
> chain at the function level.
>
Exactly. I certainly hope we'll be able to address this sensibly with
GlobalISel.
> Our initial thought was to write a pass that will be run after ISel to
> correct bad selections. The pass would examine the use-def chains
> containing values that were selected to K-regsiter classes, and, when
> profitable, re-assign the values to GPR register classes (and replace
> the producing/consuming instructions accordingly). But even with this
> fix-up pass, we would still be losing many ISel pattern-matching rules
> that will be missed because the instruction set acting on GPR is
> richer than the instruction set acting on K-regs. For example, a test
> trying to match the sbb instruction:
>
I think you'd want to do the fixup for these before/during isel, not
afterward. PowerPC does some of this (see
lib/Target/PowerPC/PPCBoolRetToInt.cpp and
DAGCombineTruncBoolExt/DAGCombineExtBoolTrunc in
lib/Target/PowerPC/PPCISelLowering.cpp). That code should trivially
generalize to other targets.
There are some places where we do this kind of thing after isel as well
(e.g. lib/Target/AArch64/AArch64AdvSIMDScalarPass.cpp).
That having been said, if you don't have actual i1 registers in which
you'd like to keep and manipulate boolean values, marking i1 as illegal
makes sense to me.
-Hal
> define i32 @test2(i32 %x, i32 %y, i32 %res) nounwind uwtable readnone
> ssp {
>
> entry:
>
> %cmp = icmp ugt i32 %x, %y
>
> %dec = sext i1 %cmp to i32
>
> %dec.res = add nsw i32 %dec, %res
>
> ret i32 %dec.res
>
> }
>
> Generates the following with AVX2:
>
> # BB#0: # %entry
>
> cmpl %edi, %esi
>
> sbbl $0, %edx
>
> movl %edx, %eax
>
> retq
>
> While AVX512 produces:
>
> # BB#0: # %entry
>
> xorl %ecx, %ecx
>
> cmpl %esi, %edi
>
> movl $-1, %eax
>
> cmovbel %ecx, %eax
>
> addl %edx, %eax
>
> retq
>
> So we would still end-up with cases where when the AVX-512 feature is
> enabled, instruction selection for scalar code becomes inferior.
>
> Finally, we suggest to undo the above issues cause by legalizing i1,
> by making i1 illegal. This would make instruction selection of scalar
> code identical for both cases when the AVX-512 feature is on and off.
> As for supporting BUILD_VECTOR, EXTRACT_VEC_ELEMENT and
> INSERT_VEC_ELEMENT, we believe we can support these operations even
> when i1 is illegal and the vectors of i1 **are** legal by using the i8
> type instead of i1, as it should be implicitly truncated/extended to
> the element type of the vNi1 vectors.
> I am now working on a patch that will implement this approach.
>
> Would appreciate to get feedback and comments.
>
> Thanks,
>
> Guy
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170124/23cb8f68/attachment.html>
More information about the llvm-dev
mailing list