[llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen

Blank, Guy via llvm-dev llvm-dev at lists.llvm.org
Sun Feb 5 08:51:46 PST 2017


Actually the K registers are currently represented by vectors of i1, except for the VK1 register class which is used with scalar i1 – that is the only one I intend to change (to be a vector of i1 as well).

From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Thursday, February 02, 2017 20:50
To: Blank, Guy <guy.blank at intel.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [X86][AVX512] RFC: make i1 illegal in the Codegen


On Jan 24, 2017, at 3:54 AM, Blank, Guy via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi All,

AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1’s.
For example,


define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) {
  %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)
  ret <8 x i32> %r
}

Can be lowered to

# BB#0:
kxnorw    %k0, %k0, %k1
vpgatherqd    (,%zmm1), %ymm0 {%k1}
retq


Legal vectors of i1’s require support for BUILD_VECTOR(i1, i1, .., i1), i1 EXTRACT_VEC_ELEMENT(…) and INSERT_VEC_ELEMENT(i1, …), so making i1 legal seemed like a sensible decision, and this is the current state at the top of trunk.

However, making i1 legal affected instruction selection of scalar code as well. Currently, there are cases where operations producing or consuming i1’s are selected (sub-optimally) to instructions that act on K-regs.
PR28650<https://llvm.org/bugs/show_bug.cgi?id=28650> is an example showing that i1’s live-in or live-out of basic-blocks are being selected to K register classes, even though we don’t want this to happen. This problem does not happen on subtargets without the AVX-512 feature enabled.
The following is the AVX-512 code from the bug report:

# BB#0:                                 # %entry
testb        $1, %dil
je        .LBB0_1
# BB#2:                                 # %if
pushq        %rax
callq        bar
                                        # kill: %AL<def> %AL<kill> %EAX<def>
andl        $1, %eax
kmovw        %eax, %k0
addq        $8, %rsp
jmp        .LBB0_3
.LBB0_1:
kxnorw        %k0, %k0, %k0
kshiftrw        $15, %k0, %k0
.LBB0_3:                                # %else
kmovw        %k0, %eax
                                        # kill: %AL<def> %AL<kill> %EAX<kill>
retq

The kmov, kxnor and kshiftr instructions here operate on K registers. They are undesirable in purely scalar input code.
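
To make the cost concrete, here is a minimal Python sketch of what the two mask instructions in .LBB0_1 compute. It models a K register as a 16-bit integer; the function names are made up for illustration and are not LLVM or assembler APIs.

```python
MASK16 = 0xFFFF  # the "w" forms of the mask instructions work on 16-bit values


def kxnorw(a: int, b: int) -> int:
    """Bitwise XNOR of two 16-bit masks."""
    return ~(a ^ b) & MASK16


def kshiftrw(k: int, imm: int) -> int:
    """Logical right shift of a 16-bit mask; counts above 15 give zero."""
    return (k & MASK16) >> imm if imm < 16 else 0


def materialize_true(k0: int) -> int:
    """Model of: kxnorw %k0, %k0, %k0 ; kshiftrw $15, %k0, %k0"""
    all_ones = kxnorw(k0, k0)      # x XNOR x is all ones for any x
    return kshiftrw(all_ones, 15)  # keep only the top bit, moved to bit 0
```

Whatever %k0 held on entry, the pair leaves the value 1 in it, i.e. a scalar "true". The same kxnorw idiom without the shift yields the all-true mask used in the gather example.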


Having a type that can be legalized to two different register classes exposes a fundamental limitation of the current instruction selection framework: we cannot always make the right decision about live-in/live-out i1’s, because we cannot see beyond the boundary of the basic block we are currently visiting. As a side note, GlobalISel can solve this, since it sees the entire use-def chain at the function level.

Our initial thought was to write a pass that would run after ISel to correct bad selections. The pass would examine the use-def chains containing values that were selected to K-register classes and, when profitable, re-assign the values to GPR register classes (and replace the producing/consuming instructions accordingly). But even with this fix-up pass, we would still miss many ISel pattern-matching rules, because the instruction set acting on GPRs is richer than the instruction set acting on K-regs. For example, a test trying to match the sbb instruction:

define i32 @test2(i32 %x, i32 %y, i32 %res) nounwind uwtable readnone ssp {
entry:
  %cmp = icmp ugt i32 %x, %y
  %dec = sext i1 %cmp to i32
  %dec.res = add nsw i32 %dec, %res
  ret i32 %dec.res
}

Generates the following with AVX2:
# BB#0:                                 # %entry
cmpl        %edi, %esi
sbbl        $0, %edx
movl        %edx, %eax
retq

While AVX512 produces:
# BB#0:                                 # %entry
xorl        %ecx, %ecx
cmpl        %esi, %edi
movl        $-1, %eax
cmovbel        %ecx, %eax
addl        %edx, %eax
retq

So we would still end up with cases where, when the AVX-512 feature is enabled, instruction selection for scalar code becomes inferior.
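
As a sanity check that the two listings compute the same value, here is a small Python model of both sequences against the IR semantics of test2. It is only a sketch: it assumes the usual x86-64 SysV argument registers (x in %edi, y in %esi, res in %edx) and models registers as 32-bit wrapped integers. The function names are illustrative.

```python
M32 = 0xFFFFFFFF  # 32-bit wraparound


def reference(x: int, y: int, res: int) -> int:
    """IR semantics: sext(icmp ugt x, y) + res."""
    dec = M32 if (x & M32) > (y & M32) else 0  # sext i1 gives 0 or -1
    return (dec + res) & M32


def sbb_sequence(x: int, y: int, res: int) -> int:
    """cmpl %edi, %esi ; sbbl $0, %edx"""
    cf = 1 if (y & M32) < (x & M32) else 0  # cmpl computes y - x; CF is the borrow
    return (res - 0 - cf) & M32


def cmov_sequence(x: int, y: int, res: int) -> int:
    """xorl ; cmpl %esi, %edi ; movl $-1 ; cmovbel ; addl"""
    ecx = 0
    eax = M32
    if (x & M32) <= (y & M32):  # "below or equal" after computing x - y
        eax = ecx
    return (eax + res) & M32
```

Both sequences agree with the IR on every input; the difference is only in instruction count, since the sbb form folds the compare result into the subtraction through the carry flag.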

Finally, we suggest resolving the above issues, which are caused by legalizing i1, by making i1 illegal. This would make instruction selection of scalar code identical whether the AVX-512 feature is on or off. As for supporting BUILD_VECTOR, EXTRACT_VEC_ELEMENT and INSERT_VEC_ELEMENT, we believe we can support these operations even when i1 is illegal and the vectors of i1 *are* legal, by using the i8 type instead of i1, as it should be implicitly truncated/extended to the element type of the vNi1 vectors.
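
The implicit truncation/extension between i8 scalars and vNi1 elements can be pictured with a small Python model. This is a sketch of the intended semantics only, not of the patch: a vNi1 vector is modeled as an integer bitmask, the helper names are hypothetical, and zero-extending on extract is an assumption made here for simplicity.

```python
def build_vector(elts: list[int]) -> int:
    """Pack i8 operands into a vNi1 mask, keeping only bit 0 of each operand."""
    mask = 0
    for i, e in enumerate(elts):
        mask |= (e & 1) << i  # implicit truncation i8 -> i1
    return mask


def extract_elt(mask: int, idx: int) -> int:
    """Return element idx as an i8-like value (zero-extended here, by assumption)."""
    return (mask >> idx) & 1


def insert_elt(mask: int, value: int, idx: int) -> int:
    """Replace element idx with bit 0 of an i8 operand."""
    return (mask & ~(1 << idx)) | ((value & 1) << idx)
```

With this model, scalar code never needs an i1 value of its own: the scalar side always sees an i8, and only the vector side carries one bit per element.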


FWIW this makes sense to me: using vector of i8 to represent the boolean values and making sure to select the right pattern to use the K register seems reasonable.
How are you planning to implement the selection?

Thanks,

—
Mehdi



I am now working on a patch that will implement this approach.

I would appreciate feedback and comments.

Thanks,
Guy


---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
