[LLVMdev] Target intrinsics and translation

Mon Nov 14 15:01:42 PST 2011

LLVM (via clang) currently translates target intrinsics to generic IR
whenever it can. For example, on x86 it translates _mm_loadu_pd to a
simple load instruction with an alignment of 1. The backend is then
responsible for translating the load back to the corresponding
machine instruction.
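To make the translation concrete at the C level, here is a minimal sketch contrasting the intrinsic with the generic operation it amounts to. It assumes only that an "alignment of 1" load means a 16-byte read of two doubles from an address with no alignment guarantee, which the sketch expresses with memcpy. The names pd2, loadu_pd_generic and loadu_pd_intrinsic are hypothetical and are not part of clang or LLVM. The SSE2 path is compiled only when __SSE2__ is defined. This illustrates the semantics and does not reproduce clang's actual IR translation.

```c
#include <assert.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_pd, _mm_storeu_pd */
#endif

/* Hypothetical stand-in for a pair of doubles (__m128d on x86). */
typedef struct { double lo, hi; } pd2;

/* Generic form: a plain 16-byte load that assumes no alignment,
   i.e. what "a load with an alignment of 1" means semantically. */
pd2 loadu_pd_generic(const double *p) {
    pd2 v;
    memcpy(&v, p, sizeof v);   /* memcpy makes no alignment assumption */
    return v;
}

/* Intrinsic form: explicitly requests the unaligned vector load.
   Without SSE2 it falls back to the generic form. */
pd2 loadu_pd_intrinsic(const double *p) {
#if defined(__SSE2__)
    double out[2];
    _mm_storeu_pd(out, _mm_loadu_pd(p));   /* unaligned load, then unaligned store */
    pd2 v = { out[0], out[1] };
    return v;
#else
    return loadu_pd_generic(p);
#endif
}
```

Called on `buf + 1` with `double buf[3] = {1.0, 2.0, 3.0}`, both functions return `{2.0, 3.0}`. Because the two forms are semantically identical, the optimizer is free to treat the intrinsic like any other load, which is exactly the behavior in question.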

The advantage of this is that it opens such code up to LLVM's
optimizers, which can, in principle, speed it up.

The disadvantage is that it's pretty surprising when intrinsics
designed for the sole purpose of giving programmers access to specific
machine instructions are translated to something other than those
instructions. LLVM's optimizers aren't perfect, and there are many
aspects of performance that they don't understand, so they can also
pessimize code.

If the user has gone through the trouble of using target-specific
intrinsics to ask for a specific sequence of machine instructions,
is it really appropriate for the compiler to emit different
instructions, using its own heuristics?

Dan