[llvm-dev] X86 Intrinsics : _mm_storel_epi64/ _mm_loadl_epi64 with -m32
Bharathi Seshadri via llvm-dev
llvm-dev at lists.llvm.org
Thu May 24 11:06:03 PDT 2018
Hi,
I’m using _mm_storel_epi64/_mm_loadl_epi64 in the test case below
and compiling for 32-bit (using -m32 and -msse4.2). The 64-bit load
and 64-bit store operations are replaced with two 32-bit mov
instructions, presumably due to the use of uint64_t type. If I use
__m128i instead of uint64_t everywhere, then the read and write happen
as 64-bit operations using the xmm registers as expected.
void indvbl_write64(volatile void *p, uint64_t v)
{
    __m128i tmp = _mm_loadl_epi64((__m128i const *)&v);
    _mm_storel_epi64((__m128i *)p, tmp);
}

uint64_t indvbl_read64(volatile void *p)
{
    __m128i tmp = _mm_loadl_epi64((__m128i const *)p);
    return *(uint64_t *)&tmp;
}
Options used to compile: clang -O2 -c -msse4.2 -m32 test.c
Generated code:
00000000 <indvbl_write64>:
0: 8b 44 24 08 mov 0x8(%esp),%eax
4: 8b 54 24 04 mov 0x4(%esp),%edx
8: 8b 4c 24 0c mov 0xc(%esp),%ecx
c: 89 4a 04 mov %ecx,0x4(%edx)
f: 89 02 mov %eax,(%edx)
11: c3 ret
12: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%eax,%eax,1)
19: 00 00 00
1c: 0f 1f 40 00 nopl 0x0(%eax)
00000020 <indvbl_read64>:
20: 8b 4c 24 04 mov 0x4(%esp),%ecx
24: 8b 01 mov (%ecx),%eax
26: 8b 51 04 mov 0x4(%ecx),%edx
29: c3 ret
The front end generates insertelement <2 x i64> and extractelement <2
x i64> for the loads and stores as expected, the optimizer turns these
into load i64 and store i64, and instruction selection then lowers
those into the two 32-bit mov instructions shown above.
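For comparison, here is a sketch of an alternative I could fall back on, assuming C11 atomics are acceptable in this code base (function names are illustrative, and on some 32-bit targets these may lower to cmpxchg8b or a libatomic call rather than an SSE move):

```c
#include <stdatomic.h>
#include <stdint.h>

/* C11 atomics request a genuine single 64-bit access; with -m32 the
   compiler must not split it into two independent 32-bit moves. */
static void atomic_write64(_Atomic uint64_t *p, uint64_t v)
{
    atomic_store_explicit(p, v, memory_order_relaxed);
}

static uint64_t atomic_read64(_Atomic uint64_t *p)
{
    return atomic_load_explicit(p, memory_order_relaxed);
}
```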
Would it be possible and safe to generate a single 64-bit load/store
in this case with -m32? If so, could I have some pointers to the
parts of the code I should be looking at to make this improvement?
Thanks,
Bharathi