<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - vectorized f64->i32 loses bits in f32 intermediate"
   href="https://bugs.llvm.org/show_bug.cgi?id=38342">38342</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>vectorized f64->i32 loses bits in f32 intermediate
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>6.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: PowerPC
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>jistone@redhat.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org, tstellar@redhat.com
          </td>
        </tr></table>
      <p>
        <div>
<pre>With VSX enabled (any ppc64le target, or ppc64 targeting POWER8), a vector
fptosi from f64 to i32 appears to round through an f32 intermediate, which
loses low-order bits because f32 has a smaller mantissa.

The following IR is produced by Rust:

; simd_cast::cast
; Function Attrs: uwtable
define internal void @_ZN9simd_cast4cast17h5261798b19538724E(<4 x i32>* noalias
nocapture sret dereferenceable(16), <4 x double>* noalias nocapture
dereferenceable(32) %v) unnamed_addr #0 {
start:
  %1 = load <4 x double>, <4 x double>* %v, align 32
  %2 = fptosi <4 x double> %1 to <4 x i32>
  store <4 x i32> %2, <4 x i32>* %0, align 16
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}


That results in this asm:

.section        .text._ZN9simd_cast4cast17h5261798b19538724E,"ax",@progbits
        .p2align        4
        .type   _ZN9simd_cast4cast17h5261798b19538724E,@function
_ZN9simd_cast4cast17h5261798b19538724E:
.Lfunc_begin14:
        .cfi_startproc
        li 5, 16
        lxvd2x 0, 4, 5
        xxswapd 0, 0
        lxvd2x 1, 0, 4
        xxswapd 1, 1
        xxmrgld 2, 0, 1
        xvcvdpsp 34, 2
        xxmrghd 0, 0, 1
        xvcvdpsp 35, 0
        vmrgew 2, 3, 2
        xvcvspsxws 34, 34
        stvx 2, 0, 3
        blr
        .long   0
        .quad   0
.Lfunc_end14:
        .size   _ZN9simd_cast4cast17h5261798b19538724E, .Lfunc_end14-.Lfunc_begin14
        .cfi_endproc


xvcvdpsp first rounds each f64 lane to f32, and xvcvspsxws then converts that
f32 to i32.  The rounding step is explicit in
PPCTargetLowering::combineElementTruncationToVectorTruncation, but it is an
incorrect optimization here: f32 has only a 24-bit significand, so it cannot
represent every value in the i32 range exactly.
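
A minimal Rust sketch of the arithmetic, assuming the default round-to-nearest
mode.  The helper names cast_direct and cast_via_f32 are hypothetical; they
only model the two conversion paths in scalar code and are not taken from
rustc or LLVM:

```rust
// Hypothetical helpers: they model the arithmetic only, not the PowerPC codegen.

/// Convert each f64 lane straight to i32 (truncating toward zero),
/// which is what the fptosi in the IR above asks for.
fn cast_direct(v: [f64; 4]) -> [i32; 4] {
    v.map(|x| x as i32)
}

/// Round each f64 lane to f32 first, then convert to i32, mimicking
/// the xvcvdpsp + xvcvspsxws sequence (assuming round-to-nearest).
fn cast_via_f32(v: [f64; 4]) -> [i32; 4] {
    v.map(|x| (x as f32) as i32)
}
```

With lanes [16777217.0, 123456789.0, -16777217.0, 1.5], cast_direct gives
[16777217, 123456789, -16777217, 1], while cast_via_f32 gives
[16777216, 123456792, -16777216, 1].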

Using xvcvdpsxws instead for f64->i32 would probably be better.</pre>
        </div>
      </p>


    </body>
</html>