[LLVMdev] Target intrinsics and translation

Tue Nov 15 00:52:51 PST 2011

Hi Dan,

On 15/11/11 00:01, Dan Gohman wrote:
> LLVM (via clang) currently translates target intrinsics to generic IR
> whenever it can. For example, on x86 it translates _mm_loadu_pd to a
> simple load instruction with an alignment of 1. The backend is then
> responsible for translating the load back to the corresponding
> machine instruction.
>
> The advantage of this is that it opens up such code to LLVM's
> optimizers, which can theoretically speed it up.
>
> The disadvantage is that it's pretty surprising when intrinsics
> designed for the sole purpose of giving programmers access to specific
> machine instructions are translated to something other than those
> instructions.

gcc only supports a limited set of vector expressions.  If you want to
shuffle a vector, how do you do that?  The only way (AFAIK) is to use
a target intrinsic.  Thus people can end up using target intrinsics
because it's the only way they have to express vector operations, not
because they absolutely want to have that particular instruction.
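
To make the _mm_loadu_pd case concrete, here is a minimal sketch of the
same unaligned load written both ways. It assumes an x86 target with SSE2
for the intrinsic path (guarded by __SSE2__); the names pair_f64,
load2_portable, load2_intrinsic and check_unaligned_load are hypothetical
and exist only for illustration. A compiler is free to lower both forms to
the same machine instruction, which is the behaviour under discussion.

```c
#include <assert.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_pd, _mm_storeu_pd */
#endif

/* Hypothetical container for two doubles; not part of any real API. */
typedef struct { double d[2]; } pair_f64;

/* Generic form: memcpy makes no alignment assumption, so the compiler
   sees this as a 16-byte load with alignment 1. */
static pair_f64 load2_portable(const void *p) {
    pair_f64 v;
    memcpy(v.d, p, sizeof v.d);
    return v;
}

#if defined(__SSE2__)
/* Intrinsic form: requests an unaligned 128-bit load by name. */
static pair_f64 load2_intrinsic(const double *p) {
    pair_f64 v;
    _mm_storeu_pd(v.d, _mm_loadu_pd(p));
    return v;
}
#endif

/* Returns 1 after checking that both forms read the same values. */
static int check_unaligned_load(void) {
    const double src[2] = { 1.5, -2.0 };
    unsigned char buf[1 + sizeof src];   /* buf + 1 is deliberately misaligned */
    pair_f64 a;

    memcpy(buf + 1, src, sizeof src);
    a = load2_portable(buf + 1);
    assert(a.d[0] == 1.5 && a.d[1] == -2.0);
#if defined(__SSE2__)
    {
        pair_f64 b = load2_intrinsic(src);
        assert(b.d[0] == a.d[0] && b.d[1] == a.d[1]);
    }
#endif
    return 1;
}
```

Someone who only wants "load two doubles from a possibly unaligned
address" can write either form; only the second names a particular
instruction.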

> LLVM's optimizers aren't perfect, and there are many
> aspects of performance which they don't understand, so they can also
> pessimize code.

Such cases should be improved.  They would never be noticed if everyone
was using target intrinsics rather than generic IR.

> If the user has gone through the trouble of using target-specific
> intrinsics to ask for a specific sequence of machine instructions,
> is it really appropriate for the compiler to emit different
> instructions, using its own heuristics?

This same question might come up in the future with inline asm.
Thanks to the MC project I guess it may become feasible to parse
people's inline asm and do optimizations on it.  Personally I'm
in favour of that, but indeed there are dangers.

Ciao, Duncan.