[LLVMdev] Generate scalar SSE instructions instead of packed instructions

Thu Feb 21 15:38:40 PST 2013

On Thu, Feb 21, 2013 at 12:14 PM, Nadav Rotem <nrotem at apple.com> wrote:

> You can change the input LLVM-IR.
>
> On Feb 21, 2013, at 7:16 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com>
> wrote:
>
> Hi,
>
> I am interested in evaluating the performance of packed vs scalar
> double-precision floating point instructions on x86-atom and I was
> wondering if anyone knows more precisely where to modify llvm to use one or
> the other. I know I probably need to change something in the type
> legalizer. Could anyone provide more details than that?

Hey Tyler,

Nadav is correct. Un-vectorizing is best done at the IR level or earlier.

If one split the vectors at the ISel level, one would incur unnecessary
extracts, which would skew the timing data.
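
As a rough illustration of comparing the two forms above the ISel level, here
is a minimal sketch in C, assuming SSE2 is available. The function names
add_packed and add_scalar are hypothetical, and the packed version assumes an
even element count. The scalar version never forms a vector, so there is
nothing to extract afterwards.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Packed form: two doubles per iteration via a 128-bit vector add.
 * Assumes n is even (hypothetical helper, not from the original mail). */
void add_packed(const double *a, const double *b, double *out, size_t n) {
    for (size_t i = 0; i + 1 < n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
}

/* Scalar form: one double per iteration. This stays scalar only if the
 * compiler's vectorizers are disabled for this translation unit. */
void add_scalar(const double *a, const double *b, double *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

With clang, building with -fno-vectorize -fno-slp-vectorize should keep the
second loop from being turned back into packed code; it is worth checking the
generated assembly either way before timing anything.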

To digress a bit, I've found that it's necessary to rewrite the scalar SSE
patterns to accept true scalar operands, not fake vector operands like the
GNU built-ins. This topic was discussed a while back, and the popular belief
is that partial register updates would cause a performance hit when
operating on true scalars. However, my empirical evidence suggests that the
extra memory traffic of stuffing vectors is more of a performance hit than
the partial register updates. Unfortunately, this runs counter to the
available documentation, and it may only be true for the benchmarks that
hold my interest.

For completeness, I'm mainly interested in Interlagos and Sandy Bridge, so
this conjecture may not hold for other processors such as Atom.
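
To make the "true scalar" versus "fake vector" distinction concrete, here is
a small sketch of how it looks from C, assuming SSE2. This is only an
illustration of the operand shapes, not the LLVM patterns themselves, and the
function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Built-in style: the scalar add takes full 128-bit vector operands even
 * though only lane 0 carries data, so each scalar is first placed into a
 * vector and the result is pulled back out of lane 0. */
double add_via_vector(double x, double y) {
    __m128d vx = _mm_set_sd(x);               /* lane 0 = x, lane 1 = 0.0 */
    __m128d vy = _mm_set_sd(y);               /* lane 0 = y, lane 1 = 0.0 */
    return _mm_cvtsd_f64(_mm_add_sd(vx, vy)); /* read lane 0 of the sum */
}

/* True scalar style: operands and result are plain doubles throughout. */
double add_true_scalar(double x, double y) {
    return x + y;
}
```

Both functions return the same value; the difference is only in whether the
operands are presented as vectors or as scalars.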

Hope this helps,
Cameron