[PATCH] [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets

Tue Dec 2 20:14:38 PST 2014

I only have one concern here, and it is just a very general concern:

On Sun, Nov 30, 2014 at 1:37 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:

> 4i32 shuffles for single insertions into zero vectors lower to X86vzmovl,
> which was being selected as (v)blendps, causing domain switch stalls. This
> patch avoids them by using (v)pblendw instead.
>
> The updated tests in test/CodeGen/X86/sse41.ll still contain a domain
> stall due to the use of insertps; I'm looking at fixing this in a future
> patch.
>

Until this is fixed, the test cases have actually regressed because they're
still using insertps. =/
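To make the equivalence concrete, here is a small Python model (an illustration only, not LLVM code) of the X86vzmovl insertion: keep element 0 of a v4i32 and zero the rest. The immediates (0b0001 for blendps, 0b00000011 for pblendw) and the operand order are assumptions made for this sketch, not necessarily what the backend emits. Both forms produce bit-identical results; the one thing that differs, the execution domain, is something a lane model cannot show.

```python
from typing import List

# A 128-bit XMM value modelled as four unsigned 32-bit lanes, element 0 first.
Vec4i32 = List[int]
ZERO: Vec4i32 = [0, 0, 0, 0]


def blendps(a: Vec4i32, b: Vec4i32, imm: int) -> Vec4i32:
    """BLENDPS a, b, imm: 32-bit lane i comes from b if bit i of imm is set, else from a."""
    return [b[i] if (imm >> i) & 1 else a[i] for i in range(4)]


def to_words(v: Vec4i32) -> List[int]:
    """Split each 32-bit lane into two 16-bit words, low word first (little-endian)."""
    words: List[int] = []
    for lane in v:
        words.append(lane & 0xFFFF)
        words.append((lane >> 16) & 0xFFFF)
    return words


def from_words(w: List[int]) -> Vec4i32:
    """Reassemble eight 16-bit words into four 32-bit lanes."""
    return [w[2 * i] | (w[2 * i + 1] << 16) for i in range(4)]


def pblendw(a: Vec4i32, b: Vec4i32, imm: int) -> Vec4i32:
    """PBLENDW a, b, imm: 16-bit word i comes from b if bit i of imm is set, else from a."""
    wa, wb = to_words(a), to_words(b)
    return from_words([wb[i] if (imm >> i) & 1 else wa[i] for i in range(8)])


def vzmovl_via_blendps(src: Vec4i32) -> Vec4i32:
    # Keep lane 0 of src and zero lanes 1-3 (floating-point domain instruction).
    return blendps(ZERO, src, 0b0001)


def vzmovl_via_pblendw(src: Vec4i32) -> Vec4i32:
    # Same result: words 0 and 1 together make up lane 0 (integer domain instruction).
    return pblendw(ZERO, src, 0b00000011)


src = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
print([hex(x) for x in vzmovl_via_blendps(src)])  # ['0x11111111', '0x0', '0x0', '0x0']
print([hex(x) for x in vzmovl_via_pblendw(src)])  # ['0x11111111', '0x0', '0x0', '0x0']
```

Selecting a 32-bit lane with pblendw just means setting both word bits that cover it, which is why the integer-domain blend can stand in for blendps here.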

>
> Pre-SSE4.1 targets are still affected by a similar domain stall from
> movss. We could fix this by using two (punpckldq XMM, zero) instructions
> in series; if people agree, I'll make a patch for this as well.
>

Yes, I think it's important to fix all of these together, so that we don't
see stray regressions where we improve the domain-crossing situation overall
but make the remaining domain crossings harder to hide behind the
processor's out-of-order execution, hyper-threading, etc.
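The proposed pre-SSE4.1 sequence can be checked the same way. This is again a Python lane model rather than backend code, and it assumes the current lowering is a register-form movss into a zeroed register. It shows that two punpckldq steps against a zero vector leave the same value as movss while using only integer-domain unpacks; the helper names are hypothetical.

```python
from typing import List

Vec4i32 = List[int]  # four unsigned 32-bit lanes, element 0 first
ZERO: Vec4i32 = [0, 0, 0, 0]


def movss(dst: Vec4i32, src: Vec4i32) -> Vec4i32:
    """Register form of MOVSS: lane 0 from src, lanes 1-3 kept from dst."""
    return [src[0], dst[1], dst[2], dst[3]]


def punpckldq(a: Vec4i32, b: Vec4i32) -> Vec4i32:
    """PUNPCKLDQ a, b: interleave the low two 32-bit lanes -> [a0, b0, a1, b1]."""
    return [a[0], b[0], a[1], b[1]]


def vzmovl_via_movss(src: Vec4i32) -> Vec4i32:
    # Assumed current lowering (float domain): move lane 0 into a zeroed register.
    return movss(ZERO, src)


def vzmovl_via_punpckldq(src: Vec4i32) -> Vec4i32:
    # Proposed integer-domain sequence:
    #   first unpack:  [s0, s1, s2, s3] -> [s0, 0, s1, 0]
    #   second unpack: [s0, 0, s1, 0]   -> [s0, 0, 0, 0]
    tmp = punpckldq(src, ZERO)
    return punpckldq(tmp, ZERO)


src = [0xAAAAAAAA, 0xBBBBBBBB, 0xCCCCCCCC, 0xDDDDDDDD]
print([hex(x) for x in vzmovl_via_punpckldq(src)])  # ['0xaaaaaaaa', '0x0', '0x0', '0x0']
```

Each unpack halves the number of surviving source lanes, so two are needed to clear lane 1; the model says nothing about whether the extra instruction is cheaper than the domain-crossing penalty it avoids.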