[PATCH] Added more insertps optimizations

Filipe Cabecinhas filcab+llvm.phabricator at gmail.com
Fri May 16 09:05:02 PDT 2014


Hi Andrea,

Wouldn't the tablegen patterns be problematic when we use the load result
several times?
If we use it several times, then we shouldn't generate additional loads. Or
do you think it doesn't matter since they should be close together, since
we're generating for one BB only?

That was the reasoning behind doing it with code and guarding them with
MayFoldLoad, which includes hasOneUse().

I'll add the AVX tests today.

  Filipe

On Friday, May 16, 2014, Andrea Di Biagio <Andrea_DiBiagio at sn.scee.net>
wrote:

> Hi Filipe,
>
> I think there is a better way to fix this bug.
>
> For example, you can simply add the following ISel patterns to
> X86InstrSSE.td instead of adding new target combine rules in
> X86ISelLowering.cpp.
>
>
>   let Predicates = [UseSSE41] in {
>     def : Pat<(v4f32 (X86insertps (v4f32 VR128:$src1), (loadv4f32
> addr:$src2),
>                   imm:$src3)),
>               (INSERTPSrm VR128:$src1, addr:$src2, imm:$src3)>;
>     def : Pat<(v4f32 (X86insertps (v4f32 VR128:$src1), (X86PShufd (v4f32
>                    (scalar_to_vector (loadf32 addr:$src2))), (i8 0)),
> imm:$src3)),
>               (INSERTPSrm VR128:$src1, addr:$src2, imm:$src3)>;
>   }
>
>   let Predicates = [UseAVX] in {
>     def : Pat<(v4f32 (X86insertps (v4f32 VR128:$src1), (loadv4f32
> addr:$src2),
>                   imm:$src3)),
>               (VINSERTPSrm VR128:$src1, addr:$src2, imm:$src3)>;
>     def : Pat<(v4f32 (X86insertps (v4f32 VR128:$src1),
>                   (X86VBroadcast (loadf32 addr:$src2)), imm:$src3)),
>               (VINSERTPSrm VR128:$src1, addr:$src2, imm:$src3)>;
>   }
>
> I am pretty sure that the four patterns above would cover all the
> interesting cases.
> I personally prefer to have tablegen patterns instead of complicated
> target combine rules that would only trigger on the already legalized and
> immediately before instruction selection.
>
> Also, you should add tests to verify the AVX codegen as well. Currently
> your new tests only verifies that we do the correct thing with SSE4.1.
> However, your patch also affects AVX (in fact, you specifically added
> combine rules for the case where the inserted element comes from a
> vbroadcast).
>
> -Andrea
>
> http://reviews.llvm.org/D3581
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140516/09730cb7/attachment.html>


More information about the llvm-commits mailing list