[patch] Simplify a vpermilps with constant mask

Mon Apr 21 15:18:39 PDT 2014

On 21 April 2014 15:26, Jim Grosbach <grosbach at apple.com> wrote:
>
> On Apr 21, 2014, at 11:26 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>
>> On 21 April 2014 13:47, Jim Grosbach <grosbach at apple.com> wrote:
>>> Why not change the clang IR codegen to just get this right in the first place?
>>
>> Because an intermediate optimization can make the mask constant.
>
>
> So I just did the digging myself. It’s actually a bit more subtle than that. There are two front-end intrinsics for these instructions: one for a variable input and one for a constant input (*permutevar_ps vs. *permute_ps, respectively). clang does indeed generate the shuffle directly for the latter, as it should, which is what I was asking about. It’s when the user writes code referencing the former that this can come up, which makes sense given inlining, especially in the context of LTO.
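
To illustrate the point above, here is a minimal sketch in plain C that models the two forms of the 128-bit instruction. It is not the patch itself. The function names (permilps_var, permilps_imm, fold_ctrl_to_imm8) are hypothetical and exist only for this example. The sketch assumes the usual lane selection: the variable form uses the low two bits of each 32-bit control element, and the immediate form uses two bits of the 8-bit immediate per lane.

```c
#include <stdint.h>

/* Models the variable-mask form (what *permutevar_ps maps to):
   result lane i takes src[ctrl[i] & 3]. */
void permilps_var(const float src[4], const uint32_t ctrl[4], float dst[4]) {
    for (int i = 0; i < 4; ++i)
        dst[i] = src[ctrl[i] & 3u];
}

/* Models the immediate form (what *permute_ps maps to):
   result lane i is selected by bits [2i+1 : 2i] of imm8. */
void permilps_imm(const float src[4], uint8_t imm8, float dst[4]) {
    for (int i = 0; i < 4; ++i)
        dst[i] = src[(imm8 >> (2 * i)) & 3u];
}

/* Hypothetical helper: once the control vector is known to be constant
   (for example after inlining), it can be folded into an equivalent
   immediate, which is the same thing as a fixed shuffle. */
uint8_t fold_ctrl_to_imm8(const uint32_t ctrl[4]) {
    uint8_t imm8 = 0;
    for (int i = 0; i < 4; ++i)
        imm8 |= (uint8_t)((ctrl[i] & 3u) << (2 * i));
    return imm8;
}
```

With a constant control vector both forms select the same lanes, which is why the variable form can be rewritten as a plain shuffle once the mask is known.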

Sorry about that, I misunderstood the question.

> This should handle the _pd variant, too, yes? LGTM with that addition.

Makes sense. I also added the _256 ones and committed as r206801.

Cheers,
Rafael