<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - [AVX512DQ] v2i1/v4i1 stores should zero top bits"

   href="https://llvm.org/bugs/show_bug.cgi?id=30888">30888</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[AVX512DQ] v2i1/v4i1 stores should zero top bits

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>cameron.mcinally@nyu.edu

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>elena.demikhovsky@intel.com, llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=17543" name="attach_17543" title="IR test case">attachment 17543</a> <a href="attachment.cgi?id=17543&amp;action=edit" title="IR test case">[details]</a></span>

IR test case

Filing a bug at Elena's request.

We're running into a problem with undefined top bits when masks narrower than

8 bits are held in 8-bit mask registers. With AVX512DQ, the smallest mask

register is 8 bits wide, and the smallest mask load/store instruction moves

8 bits. Masks narrower than 8 bits (i.e. v2i1 and v4i1) are producing incorrect

results because their top bits are currently undefined.
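
As a rough illustration of the failure mode, here is a scalar C++ model. It is
not LLVM code, the function names are hypothetical, and the "any lane set"
check is only a simplified stand-in for reloading the byte and testing it:

```cpp
// Scalar model only; not LLVM code. All names here are hypothetical.
// A v4i1 mask occupies bits 0..3 of an 8-bit mask register.

// Models the narrowest mask store: all 8 bits are written, including
// bits 4..7, which do not belong to any v4i1 lane.
constexpr unsigned storeMask8(unsigned kreg) {
  return kreg & 0xFFu;
}

// Simplified stand-in for reloading the stored byte and asking
// "is any lane set?" by testing the whole byte against zero.
constexpr bool anyLaneSet(unsigned stored) {
  return stored != 0;
}

// All four lanes are false, but bits 4..7 hold leftover garbage (0xA0),
// so the whole-byte test wrongly reports a set lane.
static_assert(anyLaneSet(storeMask8(0xA0u)),
              "garbage top bits look like set lanes");

// With the top bits zeroed, the same test gives the expected answer.
static_assert(!anyLaneSet(storeMask8(0x00u)),
              "zeroed top bits test as empty");
```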

From what I’ve learnt on llvm-dev, LLVM expects the top bits of v2i1/v4i1

registers to be zeroed on stores. I’ve locally added custom lowering in

ISelLowering to achieve this. It works, but it produces a lot of KSHIFTs to

clear the top bits of the mask registers when the mask is narrower than 8 bits.

  // Handle v2i1/v4i1 stores. LangRef assumes that

  // the undefined bits are zeroed. 

  EVT MemVT = St->getMemoryVT();

  SDValue Op = St->getValue();

  MVT OpVT = Op.getValueType().getSimpleVT();

  unsigned NumElts = OpVT.getVectorNumElements();

  if (MemVT.isVector() &&

      MemVT.getVectorElementType() == MVT::i1 &&

      NumElts <= 4) {

    Op = DAG.getNode(ISD::INSERT_SUBVECTOR, dl, MVT::v8i1,

                     getZeroVector(MVT::v8i1, Subtarget, DAG, dl),

                     Op, DAG.getIntPtrConstant(0, dl));

    return DAG.getStore(St->getChain(), dl, Op, St->getBasePtr(),

                        St->getMemOperand());

  }

And this produces code like:

        knotw   %k0, %k0                

        kshiftlb        $4, %k0, %k0    <-- These KSHIFTs are

        kshiftrb        $4, %k0, %k0    <-- zeroing the top bits.

        kmovb   %k0, 6(%rsp)            

        movzbl  6(%rsp), %ecx           

        kmovw   %ecx, %k0               

        kortestw        %k0, %k0        
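
To spell out what the shift pair in the listing above does, here is a minimal
scalar sketch in plain C++. It is not LLVM code and the function names are
hypothetical. The AND form is included only to check the result; whether a
single mask AND would actually be cheaper here is not something this sketch
answers:

```cpp
// Scalar model only; not LLVM code. All names here are hypothetical.

// Models kshiftlb $n followed by kshiftrb $n on an 8-bit mask value.
constexpr unsigned clearTopBitsByShift(unsigned kreg, unsigned n) {
  // The left shift pushes the top n bits out of the 8-bit value
  // (the 0xFF mask models the byte width); the right shift moves the
  // remaining bits back into place and fills the top with zeros.
  return ((kreg << n) & 0xFFu) >> n;
}

// Reference result: keep only the low (8 - n) bits with a single AND.
constexpr unsigned clearTopBitsByAnd(unsigned kreg, unsigned n) {
  return kreg & (0xFFu >> n);
}

// v4i1 case (n = 4): garbage in bits 4..7 is cleared, lanes 0..3 survive.
static_assert(clearTopBitsByShift(0xF5u, 4) == 0x05u,
              "top four bits cleared");

// v2i1 case (n = 6): only bits 0..1 survive.
static_assert(clearTopBitsByShift(0xFFu, 6) == 0x03u,
              "top six bits cleared");

// Both formulations agree.
static_assert(clearTopBitsByShift(0xF5u, 4) == clearTopBitsByAnd(0xF5u, 4),
              "shift pair matches AND with a low-bit mask");
```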

The new KSHIFT instructions produce functional code, but this seems like a big

hammer. Can we do better?</pre>

        </div>

      </p>


    </body>

</html>