[PATCH] [IR] Make {extract, insert}element accept an index of any integer type.
Michael Spencer
bigcheesegs at gmail.com
Sat Apr 26 20:09:58 PDT 2014
On Sat, Apr 26, 2014 at 7:55 PM, Chris Lattner <clattner at apple.com> wrote:
>
> On Apr 25, 2014, at 7:17 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
>
>> Given the following C code llvm currently generates suboptimal code for
>> x86-64:
>>
>> __m128 bss4( const __m128 *ptr, size_t i, size_t j )
>> {
>>     float f = ptr[i][j];
>>     return (__m128) { f, f, f, f };
>> }
>>
>> =================================================
>>
>> define <4 x float> @_Z4bss4PKDv4_fmm(<4 x float>* nocapture readonly %ptr, i64 %i, i64 %j) #0 {
>> %a1 = getelementptr inbounds <4 x float>* %ptr, i64 %i
>> %a2 = load <4 x float>* %a1, align 16, !tbaa !1
>> %a3 = trunc i64 %j to i32
>> %a4 = extractelement <4 x float> %a2, i32 %a3
>> %a5 = insertelement <4 x float> undef, float %a4, i32 0
>> %a6 = insertelement <4 x float> %a5, float %a4, i32 1
>> %a7 = insertelement <4 x float> %a6, float %a4, i32 2
>> %a8 = insertelement <4 x float> %a7, float %a4, i32 3
>> ret <4 x float> %a8
>> }
>>
>> =================================================
>>
>> shlq $4, %rsi
>> addq %rdi, %rsi
>> movslq %edx, %rax
>> vbroadcastss (%rsi,%rax,4), %xmm0
>> retq
>>
>> =================================================
>>
>> The movslq is unneeded, but is present because of the trunc to i32 and then
>> sext back to i64 that the backend adds for vbroadcastss.
>>
>> We can't remove it because it changes the meaning.
>
> How does it change meaning? Only the low 2 bits of the index are meaningful; if any of the other ones are set, you get an undefined result.
>
> -Chris
If %j has any bit at position 32 or above set to 1, the trunc removes
it. At the IR/SDag level we don't know why the trunc is there (the
intent may be to mask out the top bits), so we can't remove it.
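
To make the point concrete, here is a minimal C sketch of what the
trunc followed by the backend's sign extension does to a 64-bit index.
The helper names (trunc_then_sext, round_trip_is_identity) are made up
for illustration and are not part of LLVM or of the patch; the code
only models the arithmetic, not the actual lowering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `trunc i64 %j to i32` followed by the
   sign extension back to 64 bits that movslq performs. */
static int64_t trunc_then_sext(uint64_t j)
{
    uint32_t low = (uint32_t)j;        /* trunc: keep bits 0..31 only */
    int64_t result = (int64_t)low;     /* zero-extended value, 0..2^32-1 */
    if (low & 0x80000000u)             /* bit 31 set: value is negative as i32 */
        result -= ((int64_t)1 << 32);  /* apply the sign extension by hand */
    return result;
}

/* Returns 1 when the trunc/sext pair leaves the index unchanged,
   i.e. when dropping the pair would not alter the value. */
static int round_trip_is_identity(uint64_t j)
{
    return (uint64_t)trunc_then_sext(j) == j;
}
```

For a small index such as 3 the round trip is the identity, so the
movslq is redundant. For an index such as (1ULL << 32) | 2 the round
trip yields 2, which is a different value from the original %j, and
that is the case that blocks removing the trunc without extra
information about the index.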
- Michael Spencer