[llvm-bugs] [Bug 29222] New: Combining MMX with AVX suboptimal
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Aug 29 02:54:21 PDT 2016
https://llvm.org/bugs/show_bug.cgi?id=29222
Bug ID: 29222
Summary: Combining MMX with AVX suboptimal
Product: clang
Version: 3.8
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: -New Bugs
Assignee: unassignedclangbugs at nondot.org
Reporter: kobalicek.petr at gmail.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
The following code:
#include <mmintrin.h>
#include <immintrin.h>

int fn(int x) {
  __m64 mm = _mm_set1_pi32(x);
  mm = _mm_packs_pi16(mm, mm);
  __m128i xmm = _mm_movpi64_epi64(mm);
  xmm = _mm_packs_epi16(xmm, xmm);
  return _mm_cvtsi128_si32(xmm);
}
Compiled with '-O2 -Wall -mavx2 -m32 -fomit-frame-pointer' produces:
fn(int):
  sub esp, 20
  vbroadcastss xmm0, dword ptr [esp + 24]   # Cool idea, but not
  vmovlps qword ptr [esp + 8], xmm0         # in our context.
  movq mm0, qword ptr [esp + 8]             # !!!
  packsswb mm0, mm0
  movq qword ptr [esp], mm0                 # These moves are
  vmovq xmm0, qword ptr [esp]               # correct.
  vpacksswb xmm0, xmm0, xmm0
  vmovd eax, xmm0
  add esp, 20
  ret
I know that MMX is rarely used anymore, but I wonder why clang prefers a code
path that is one instruction longer and makes two more memory accesses than the
more straightforward 'punpckldq'.
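
For anyone checking a fix, here is a minimal scalar sketch of what fn() should return, assuming the usual signed-saturation semantics of packsswb. The names sat8 and fn_scalar_model are hypothetical and not part of the code above. The narrowing casts also assume two's-complement wraparound, which is how GCC and clang behave.

```c
#include <stdint.h>

/* Signed saturation of a 16-bit value to 8 bits, as packsswb does per lane. */
static int8_t sat8(int16_t v) {
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Hypothetical scalar model of fn(); an illustration, not the reported code.
 * Assumes narrowing conversions wrap (two's complement), as on GCC/clang. */
int fn_scalar_model(int x) {
    uint32_t ux = (uint32_t)x;

    /* _mm_set1_pi32 + _mm_packs_pi16: each 16-bit half of x saturates to a byte. */
    uint8_t lo = (uint8_t)sat8((int16_t)(ux & 0xFFFFu));
    uint8_t hi = (uint8_t)sat8((int16_t)(ux >> 16));

    /* Every 16-bit lane of the MMX result now holds hi:lo. */
    int16_t w = (int16_t)(uint16_t)((hi << 8) | lo);

    /* _mm_packs_epi16: that lane saturates again, so the low four bytes match. */
    uint8_t b = (uint8_t)sat8(w);

    /* _mm_cvtsi128_si32: return the low 32 bits. */
    return (int32_t)(b * 0x01010101u);
}
```

Comparing fn(x) against fn_scalar_model(x) over a range of inputs should show whether a different instruction selection preserves the result.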