<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - [x86, AVX] simplify masked memop's mask operand"
   href="https://llvm.org/bugs/show_bug.cgi?id=26697">26697</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[x86, AVX] simplify masked memop's mask operand
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>spatel+llvm@rotateright.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>I mentioned this in <a href="http://reviews.llvm.org/D17485">http://reviews.llvm.org/D17485</a> : the vector type of the
mask differs between the LLVM and x86 versions of masked memops. If we
recognize the pattern(s) needed for that conversion in the DAG, I think we can
eliminate the x86-specific masked memop intrinsic defs completely by converting
all of those to *LLVM* masked memop intrinsics:

declare void @llvm.masked.store.v4f32(<4 x float> %val, <4 x float>* %addr, i32, <4 x i1> %maskb2)

define void @one_mask_bit_set1(<4 x float>* %addr, <4 x float> %val, <4 x i32> %mask) {
  %mask_signbit = and <4 x i32> %mask, <i32 2147483648, i32 2147483648, i32 2147483648, i32 2147483648>
  %mask_bool = icmp ne <4 x i32> %mask_signbit, zeroinitializer
  call void @llvm.masked.store.v4f32(<4 x float> %val, <4 x float>* %addr, i32 1, <4 x i1> %mask_bool)
  ret void
}

This should compile to just:
    vmaskmovps    %xmm0, %xmm1, (%rdi)

I.e., the sign bit of each mask element is all that the x86 HW uses as the
selector. Currently, we don't know that, so we get this mess:

$ ./llc llvm.maskm.ll -o - -mattr=avx
...
LCPI0_0:
    .long    2147483648              ## 0x80000000
    .long    2147483648              ## 0x80000000
    .long    2147483648              ## 0x80000000
    .long    2147483648              ## 0x80000000
    .section    __TEXT,__text,regular,pure_instructions
    .globl    _one_mask_bit_set1
    .p2align    4, 0x90
_one_mask_bit_set1:                     ## @one_mask_bit_set1
    .cfi_startproc
## BB#0:
    vpand    LCPI0_0(%rip), %xmm1, %xmm1
    vpxor    %xmm2, %xmm2, %xmm2
    vpcmpeqd    %xmm2, %xmm1, %xmm1
    vpcmpeqd    %xmm2, %xmm2, %xmm2
    vpxor    %xmm2, %xmm1, %xmm1
    vmaskmovps    %xmm0, %xmm1, (%rdi)
    retq</pre>
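<pre>To make the sign-bit point concrete, here is a minimal scalar sketch in Python.
It is only a model, not LLVM code: the function names are made up for
illustration, lanes are assumed to be unsigned 32-bit integers, and the store is
modeled as returning the new memory contents. It shows that rebuilding the mask
via "and 0x80000000, icmp ne 0" and widening it back to all-ones/zero lanes gives
the same stored result as passing the original mask straight to vmaskmovps.

```python
SIGN_BIT = 0x80000000  # sign bit of a 32-bit lane


def mask_to_bool(mask):
    """IR pattern from the report: (lane 'and' 0x80000000) 'icmp ne' 0, per lane."""
    return [(lane & SIGN_BIT) != 0 for lane in mask]


def bool_to_x86_mask(bools):
    """Widen each boolean to a 32-bit lane: all-ones if true, zero if false.

    This models what the vpand/vpcmpeqd/vpxor sequence ends up computing.
    """
    return [0xFFFFFFFF if b else 0 for b in bools]


def vmaskmovps_store(memory, values, mask):
    """Scalar model of a masked store: a lane is written only if its mask sign bit is set."""
    return [v if (m >> 31) & 1 else old for old, v, m in zip(memory, values, mask)]


def store_via_ir_pattern(memory, values, mask):
    """Masked store using the mask rebuilt through the and/icmp pattern."""
    return vmaskmovps_store(memory, values, bool_to_x86_mask(mask_to_bool(mask)))
```

In this model both paths agree for every possible mask, because the store only
ever looks at bit 31 of each lane; that is why the extra mask instructions in
the output above are redundant.</pre>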
        </div>
      </p>
    </body>
</html>