[LLVMbugs] [Bug 19370] New: [X86] Non-temporal store from _mm_stream_ps is not mapped to movntps in some cases

bugzilla-daemon at llvm.org
Tue Apr 8 10:13:53 PDT 2014


http://llvm.org/bugs/show_bug.cgi?id=19370

            Bug ID: 19370
           Summary: [X86] Non-temporal store from _mm_stream_ps is not
                    mapped to movntps in some cases
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: dario.domizioli at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 12354
  --> http://llvm.org/bugs/attachment.cgi?id=12354&action=edit
Test case and assembly output

We have encountered a situation where non-temporal stores (coming from the
_mm_stream_ps intrinsic) are not compiled to movntps; a standard movaps is
emitted instead. In this example, this happens when the value being stored is
a constant.

I am attaching a tar.gz file containing:
- simplified.cpp
- simplified.ll
- simplified.s
which are, respectively:
- the source code
- the output of "clang -O2 -S -emit-llvm simplified.cpp -o simplified.ll"
- the output of "llc simplified.ll -o simplified.s"

The test case is reduced from a larger file. I have removed things like the
memory fences after the operations, since they don't seem to affect the
behavior. This is the smallest reproducer I could produce. I am using Ubuntu
Linux; the target triple is x86_64-unknown-linux-gnu.

There are two functions in the code. The first one uses a non-temporal store to
copy a value from one global to another. The second one uses a non-temporal
store to write a splat of zeroes (created with _mm_set1_ps) into the
destination.
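The attached simplified.cpp is not reproduced in this message. Below is a
hypothetical C++ reconstruction of the two functions, inferred from the
description above and the IR that follows: the globals src and dest are taken
to be float[4] aligned to 32 bytes (matching "align 32" in the IR), while the
function names and the initializer of src are assumptions, not the original
code.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_stream_ps, _mm_set1_ps

// Hypothetical globals standing in for @src and @dest (the IR shows align 32).
// The initializer of src is an assumption; the original value is unknown.
alignas(32) float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
alignas(32) float dest[4];

// Case 1 (hypothetical name): copy one global to another with a
// non-temporal store. Reported to lower to vmovaps (load) + vmovntps (store).
void copy_nontemporal() {
  __m128 v = _mm_load_ps(src);  // aligned 16-byte load from src
  _mm_stream_ps(dest, v);       // non-temporal store to dest
}

// Case 2 (hypothetical name): store a splat of zeroes non-temporally.
// Reported to lower to vxorps + vmovaps, i.e. the non-temporal hint is lost.
void zero_nontemporal() {
  _mm_stream_ps(dest, _mm_set1_ps(0.0f));
}
```

As in the reduced test case, the memory fences after the stores are omitted.
Code that publishes non-temporal stores to other threads would normally follow
them with _mm_sfence(), because these stores are weakly ordered.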

The first is compiled to this IR:
  %0 = load <4 x float>* bitcast ([4 x float]* @src to <4 x float>*), align 32, !tbaa !1
  store <4 x float> %0, <4 x float>* bitcast ([4 x float]* @dest to <4 x float>*), align 32, !nontemporal !4
  ret void

And it generates the following asm (stripped):
    vmovaps src(%rip), %xmm0
    vmovntps %xmm0, dest(%rip)
    retq

The second one is compiled to this IR:
  store <4 x float> zeroinitializer, <4 x float>* bitcast ([4 x float]* @dest to <4 x float>*), align 32, !nontemporal !4
  ret void

And it generates the following asm (stripped):
    vxorps %xmm0, %xmm0, %xmm0
    vmovaps %xmm0, dest(%rip)
    retq

Both stores in the IR carry the "!nontemporal" metadata, but instruction
selection (ISel) appears to pick a different instruction in each case.
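The discrepancy can be expressed as an automated check on the generated
assembly. The following is a minimal sketch, assuming the stripped AT&T-syntax
listings shown above as input; the helper has_nontemporal_store_to is
hypothetical. In LLVM's own regression tests this would normally be a
FileCheck "CHECK: movntps" line in an llc test; this C++ helper only mimics
that idea with simple text matching.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the listing contains a non-temporal
// packed-single store (movntps, or its VEX form vmovntps) whose destination
// operand mentions `target`. It only inspects mnemonics and operand text;
// it is not an assembly parser and not a replacement for FileCheck.
bool has_nontemporal_store_to(const std::string& asm_text,
                              const std::string& target) {
  std::istringstream lines(asm_text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string mnemonic;
    if (!(fields >> mnemonic)) continue;  // skip blank lines
    if (mnemonic != "movntps" && mnemonic != "vmovntps") continue;
    std::string operands;
    std::getline(fields, operands);  // rest of the line, e.g. " %xmm0, dest(%rip)"
    // AT&T syntax puts the destination operand last, after the final comma.
    std::string::size_type comma = operands.rfind(',');
    std::string dst =
        (comma == std::string::npos) ? operands : operands.substr(comma + 1);
    if (dst.find(target) != std::string::npos) return true;
  }
  return false;
}
```

Fed the two listings from this report, the helper returns true for the first
function (vmovntps to dest) and false for the second (plain vmovaps to dest),
which is exactly the mismatch being reported.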
