[patch] Simplify a vpermilps with constant mask
grosbach at apple.com
Mon Apr 21 12:26:37 PDT 2014
On Apr 21, 2014, at 11:26 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
> On 21 April 2014 13:47, Jim Grosbach <grosbach at apple.com> wrote:
>> Why not change the clang IR codegen to just get this right in the first place?
> Because an intermediate optimization can make the mask constant.
So I just did the digging myself. It’s actually a bit more subtle than that. There are two front end intrinsics for these instructions: one for a constant mask (*permute_ps) and one for a variable mask (*permutevar_ps). clang does indeed generate the shuffle directly for the constant case, as it should, which is what I was asking about. It’s when the user writes code referencing the variable form that this can come up, which makes sense given inlining, especially in the context of LTO.
This should handle the _pd variant, too, yes? LGTM with that addition.
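To illustrate the inlining case described above, here is a minimal scalar sketch in C. It is not the patch and not LLVM code: permilps_var and permilpd_var are hypothetical stand-ins that model the 128-bit variable-mask permutes element by element, and permute_with, reverse4 and reverse4_shuffle are made-up names. The point is only that once the caller's constant mask is visible through inlining, the variable form computes the same thing as a fixed shuffle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit variable-mask permute (_ps form):
   each result element is selected by the low two bits of the matching
   control element. */
static void permilps_var(float dst[4], const float src[4], const uint32_t ctrl[4])
{
    assert(dst != src); /* this model does not support in-place use */
    for (int i = 0; i < 4; ++i)
        dst[i] = src[ctrl[i] & 3u];
}

/* Hypothetical scalar model of the _pd form: bit 1 (not bit 0) of each
   64-bit control element selects between the two source elements. */
static void permilpd_var(double dst[2], const double src[2], const uint64_t ctrl[2])
{
    assert(dst != src); /* this model does not support in-place use */
    for (int i = 0; i < 2; ++i)
        dst[i] = src[(ctrl[i] >> 1) & 1u];
}

/* Made-up user helper: here the mask is an ordinary runtime parameter,
   so the variable form is the natural thing to write. */
static void permute_with(float dst[4], const float src[4], const uint32_t ctrl[4])
{
    permilps_var(dst, src, ctrl);
}

/* This caller passes a constant. After permute_with is inlined, the mask
   is a known constant even though the source used the variable form. */
static void reverse4(float dst[4], const float src[4])
{
    static const uint32_t reverse_mask[4] = {3, 2, 1, 0};
    permute_with(dst, src, reverse_mask);
}

/* The fixed shuffle that the constant-mask call is equivalent to. */
static void reverse4_shuffle(float dst[4], const float src[4])
{
    dst[0] = src[3];
    dst[1] = src[2];
    dst[2] = src[1];
    dst[3] = src[0];
}
```

Calling reverse4 and reverse4_shuffle on the same input should give identical results, which is the equivalence a simplification of the constant-mask case relies on. The _pd model is included only because its selector bit differs from the _ps case.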