[llvm-bugs] [Bug 38342] New: vectorized f64->i32 loses bits in f32 intermediate

Fri Jul 27 11:40:37 PDT 2018

https://bugs.llvm.org/show_bug.cgi?id=38342

            Bug ID: 38342
           Summary: vectorized f64->i32 loses bits in f32 intermediate
           Product: libraries
           Version: 6.0
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: PowerPC
          Assignee: unassignedbugs at nondot.org
          Reporter: jistone at redhat.com
                CC: llvm-bugs at lists.llvm.org, tstellar at redhat.com

When using VSX (any ppc64le, or ppc64 targeting POWER8), fptosi from f64 to i32
appears to round to an f32 intermediate first. Since f32 has only a 24-bit
significand, integer values with magnitude above 2^24 can lose their low bits.

The following IR is produced by Rust:

; simd_cast::cast
; Function Attrs: uwtable
define internal void @_ZN9simd_cast4cast17h5261798b19538724E(<4 x i32>* noalias
nocapture sret dereferenceable(16), <4 x double>* noalias nocapture
dereferenceable(32) %v) unnamed_addr #0 {
start:
  %1 = load <4 x double>, <4 x double>* %v, align 32
  %2 = fptosi <4 x double> %1 to <4 x i32>
  store <4 x i32> %2, <4 x i32>* %0, align 16
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}

That results in this asm:

        .section        .text._ZN9simd_cast4cast17h5261798b19538724E,"ax",@progbits
        .p2align        4
        .type   _ZN9simd_cast4cast17h5261798b19538724E,@function
_ZN9simd_cast4cast17h5261798b19538724E:
.Lfunc_begin14:
        .cfi_startproc
        li 5, 16
        lxvd2x 0, 4, 5
        xxswapd 0, 0
        lxvd2x 1, 0, 4
        xxswapd 1, 1
        xxmrgld 2, 0, 1
        xvcvdpsp 34, 2
        xxmrghd 0, 0, 1
        xvcvdpsp 35, 0
        vmrgew 2, 3, 2
        xvcvspsxws 34, 34
        stvx 2, 0, 3
        blr
        .long   0
        .quad   0
.Lfunc_end14:
        .size   _ZN9simd_cast4cast17h5261798b19538724E, .Lfunc_end14-.Lfunc_begin14
        .cfi_endproc

The xvcvdpsp rounds each f64 to f32, then xvcvspsxws converts that f32 to i32.
That rounding step is explicit in
PPCTargetLowering::combineElementTruncationToVectorTruncation, but it is not a
valid optimization: f32 has only a 24-bit significand, so it cannot represent
every i32 value exactly.

Using xvcvdpsxws instead for f64->i32 would probably be better, since it
converts directly and avoids the intermediate rounding.
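
For reference, here is a minimal Rust sketch that models the two conversion
paths in scalar code, so the difference can be seen without PowerPC hardware.
The function names (direct, via_f32, direct4, via_f32_4) are hypothetical and
only for illustration. It assumes round-to-nearest-even for the f64->f32 step
and in-range inputs; Rust's saturating `as` casts do not model what the VSX
instructions do on overflow or NaN.

```rust
/// Direct conversion: the result `fptosi double ... to i32` should give
/// for in-range inputs (truncation toward zero, no intermediate rounding).
fn direct(x: f64) -> i32 {
    x as i32
}

/// Conversion through an f32 intermediate. This is a scalar model of the
/// xvcvdpsp + xvcvspsxws sequence, not the actual lowering.
fn via_f32(x: f64) -> i32 {
    // The f64 -> f32 cast rounds to a 24-bit significand first.
    (x as f32) as i32
}

/// Lane-wise versions, mirroring the <4 x double> -> <4 x i32> cast.
fn direct4(v: [f64; 4]) -> [i32; 4] {
    v.map(direct)
}

fn via_f32_4(v: [f64; 4]) -> [i32; 4] {
    v.map(via_f32)
}
```

With the input [1.0, 16777217.0, -16777219.0, 123456789.0], direct4 returns
the same integers, while via_f32_4 returns [1, 16777216, -16777220, 123456792].
Only lanes with magnitude above 2^24 are affected.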
