<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
span.EmailStyle18
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;
        font-family:"Calibri",sans-serif;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.25in 1.0in 1.25in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Hi All,<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1’s.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">For example,<br>
<br>
<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) {<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">  %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1
 true, i1 true>, <8 x i32> undef)<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">  ret 8 x i32>%r<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">}<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Can be lowered to<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"># BB#0:<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">kxnorw    %k0, %k0, %k1<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">vpgatherqd    (,%zmm1), %ymm0 {%k1}<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">retq<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"> <o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Legal vectors of i1’s require support for BUILD_VECTOR(i1, i1, .., i1), i1 EXTRACT_VEC_ELEMENT (…) and INSERT_VEC_ELEMENT(i1, …) , so making
 i1 legal seemed like a sensible decision, and this is the current state in the top of trunk.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> <o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">However, making i1 legal affected instruction selection of scalar code as well. Currently, there are cases where operations producing or
 consuming i1’s are selected (sub-optimally) to instructions that act on K-regs.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><a href="https://llvm.org/bugs/show_bug.cgi?id=28650">PR28650</a> is an example showing that i1’s live-in or live-out of basic-blocks are
 being selected to K register classes, even though we don’t want this to happen. This problem does not happen on subtargets without the AVX-512 feature enabled.<br>
The following is the AVX-512 code from the bug report:<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"># BB#0:                                 # %entry<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">testb        $1, %dil<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">je        .LBB0_1<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"># BB#2:                                 # %if<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">pushq        %rax<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">callq        bar<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">                                        # kill: %AL<def> %AL<kill> %EAX<def><o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">andl        $1, %eax<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">kmovw        %eax, %k0<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">addq        $8, %rsp<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">jmp        .LBB0_3<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">.LBB0_1:<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">kxnorw        %k0, %k0, %k0<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">kshiftrw        $15, %k0, %k0<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">.LBB0_3:                                # %else<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">kmovw        %k0, %eax<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">                                        # kill: %AL<def> %AL<kill> %EAX<kill><o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Retq<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">The kmov,kxnor,kshiftr instructions here are the instructions operating on K registers. These are undesirable in the purely scalar input
 code.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"> <o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Having a type that can be possibly legalized to two different register classes exposes a fundamental limitation of the current instruction
 selection framework, and that is we cannot always make the right decision about live-in/live-out i1’s because we cannot see beyond the boundary of the current basic-block we are visiting. As a side-note, with GlobalISel this can be solved, since we see the
 entire use-def chain at the function level.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"> <o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Our initial thought was to write a pass that will be run after ISel to correct bad selections. The pass would examine the use-def chains
 containing values that were selected to K-regsiter classes, and, when profitable, re-assign the values to GPR register classes (and replace the producing/consuming instructions accordingly). But even with this fix-up pass, we would still be losing many ISel
 pattern-matching rules that will be missed because the instruction set acting on GPR is richer than the instruction set acting on K-regs. For example, a test trying to match the sbb instruction:<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">define i32 @test2(i32 %x, i32 %y, i32 %res) nounwind uwtable readnone ssp {<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">entry:<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">  %cmp = icmp ugt i32 %x, %y<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">  %dec = sext i1 %cmp to i32<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">  %dec.res = add nsw i32 %dec, %res<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">  ret i32 %dec.res<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">}<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Generates the following with AVX2:<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"># BB#0:                                 # %entry<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">cmpl        %edi, %esi<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">sbbl        $0, %edx<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">movl        %edx, %eax<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">retq<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">While AVX512 produces:<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"># BB#0:                                 # %entry<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">xorl        %ecx, %ecx<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">cmpl        %esi, %edi<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">movl        $-1, %eax<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">cmovbel        %ecx, %eax<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">addl        %edx, %eax<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">retq<o:p></o:p></span></p>
<p style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:0in;margin-left:27.0pt;margin-bottom:.0001pt">
<span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">So we would still end-up with cases where when the AVX-512 feature is enabled, instruction selection for scalar code becomes inferior.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"> <o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Finally, we suggest to undo the above issues cause by legalizing i1, by making i1 illegal. This would make instruction selection of scalar
 code identical for both cases when the AVX-512 feature is on and off. As for supporting BUILD_VECTOR, EXTRACT_VEC_ELEMENT and INSERT_VEC_ELEMENT, we believe we can support these operations even when i1 is illegal and the vectors of i1 *<b>are</b>* legal by
 using the i8 type instead of i1, as it should be implicitly truncated/extended to the element type of the vNi1 vectors.
<br>
I am now working on a patch that will implement this approach.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"> <o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Would appreciate to get feedback and comments.<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Thanks,<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black">Guy<o:p></o:p></span></p>
<p style="margin:0in;margin-bottom:.0001pt"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p>---------------------------------------------------------------------<br>
Intel Israel (74) Limited</p>

<p>This e-mail and any attachments may contain confidential material for<br>
the sole use of the intended recipient(s). Any review or distribution<br>
by others is strictly prohibited. If you are not the intended<br>
recipient, please contact the sender and delete all copies.</p></body>
</html>