[PATCH] [IR] Make {extract, insert}element accept an index of any integer type.
Michael Spencer
bigcheesegs at gmail.com
Fri Apr 25 19:21:30 PDT 2014
CCing Chris, who apparently doesn't have a phab account.
- Michael Spencer
On Fri, Apr 25, 2014 at 7:17 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> Given the following C code, LLVM currently generates suboptimal code for
> x86-64:
>
> __m128 bss4( const __m128 *ptr, size_t i, size_t j )
> {
>     float f = ptr[i][j];
>     return (__m128) { f, f, f, f };
> }
>
> =================================================
>
> define <4 x float> @_Z4bss4PKDv4_fmm(<4 x float>* nocapture readonly %ptr, i64 %i, i64 %j) #0 {
>   %a1 = getelementptr inbounds <4 x float>* %ptr, i64 %i
>   %a2 = load <4 x float>* %a1, align 16, !tbaa !1
>   %a3 = trunc i64 %j to i32
>   %a4 = extractelement <4 x float> %a2, i32 %a3
>   %a5 = insertelement <4 x float> undef, float %a4, i32 0
>   %a6 = insertelement <4 x float> %a5, float %a4, i32 1
>   %a7 = insertelement <4 x float> %a6, float %a4, i32 2
>   %a8 = insertelement <4 x float> %a7, float %a4, i32 3
>   ret <4 x float> %a8
> }
>
> =================================================
>
>   shlq $4, %rsi
>   addq %rdi, %rsi
>   movslq %edx, %rax
>   vbroadcastss (%rsi,%rax,4), %xmm0
>   retq
>
> =================================================
>
> The movslq is unneeded, but it is present because of the trunc to i32 in the
> IR and the sext back to i64 that the backend then adds for vbroadcastss.
>
> The backend can't remove the pair because doing so would change the meaning:
> the trunc discards the upper 32 bits of %j. The IR that clang generates is
> already suboptimal. What clang really should emit is:
>
> %a4 = extractelement <4 x float> %a2, i64 %j
>
> This patch makes that legal. A separate patch will teach clang to do it.
>
> http://reviews.llvm.org/D3519
>
> Files:
> docs/LangRef.rst
> lib/Bitcode/Reader/BitcodeReader.cpp
> lib/Bitcode/Writer/BitcodeWriter.cpp
> lib/IR/Constants.cpp
> lib/IR/Instructions.cpp
> test/CodeGen/X86/vec_splat.ll
> test/Feature/instructions.ll
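
As a rough illustration of why the trunc/sext pair in the quoted description
cannot simply be dropped, the following C sketch models the two steps on a
64-bit index. The function names (trunc_then_sext, index_survives,
check_examples) are hypothetical and are not part of the patch; the sketch
only assumes that size_t is 64 bits wide, as on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Models `trunc i64 %j to i32` followed by the sign extension that
   movslq performs. Written without implementation-defined casts. */
static int64_t trunc_then_sext(uint64_t j)
{
    uint32_t low = (uint32_t)j;  /* trunc: keep only the low 32 bits */
    if (low & 0x80000000u)       /* sext: bit 31 set means negative */
        return (int64_t)low - INT64_C(0x100000000);
    return (int64_t)low;
}

/* Returns 1 if the round trip reproduces the original 64-bit pattern. */
static int index_survives(uint64_t j)
{
    return (uint64_t)trunc_then_sext(j) == j;
}

/* A few sample indices; the values are chosen for illustration only. */
static void check_examples(void)
{
    assert(index_survives(3));                        /* small index: unchanged */
    assert(!index_survives(UINT64_C(0x100000001)));   /* upper bits are lost */
    assert(trunc_then_sext(UINT64_C(0x100000001)) == 1);
    assert(!index_survives(UINT64_C(0x80000000)));    /* becomes negative */
}
```

Because the round trip is not the identity for every 64-bit value, an
optimizer that only sees the trunc has to keep the extension. Passing the
i64 index directly to extractelement, as the patch allows, avoids creating
the pair in the first place.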