[LLVMdev] Target intrinsics and translation

Tue Nov 15 08:56:57 PST 2011

On Mon, 2011-11-14 at 15:41 -0800, Chris Lattner wrote:
> On Nov 14, 2011, at 3:01 PM, Dan Gohman wrote:
> > LLVM (via clang) currently translates target intrinsics to generic IR
> > whenever it can. For example, on x86 it translates _mm_loadu_pd to a
> > simple load instruction with an alignment of 1. The backend is then
> > responsible for translating the load back to the corresponding
> > machine instruction.
> > 
> > The advantage of this is that it opens up such code to LLVM's
> > optimizers, which can theoretically speed it up.
> > 
> > The disadvantage is that it's pretty surprising when intrinsics
> > designed for the sole purpose of giving programmers access to specific
> > machine instructions are translated to something other than those
> > instructions. LLVM's optimizers aren't perfect, and there are many
> > aspects of performance which they don't understand, so they can also
> > pessimize code.
> > 
> > If the user has gone through the trouble of using target-specific
> > intrinsics to ask for a specific sequence of machine instructions,
> > is it really appropriate for the compiler to emit different
> > instructions, using its own heuristics?
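
To make the case above concrete, here is a minimal sketch of the two forms
in question. The function names are hypothetical (they are not from clang
or LLVM), and the comments about lowering only restate what Dan describes
above: the intrinsic version becomes an ordinary vector load with an
alignment of 1, which the backend is then expected to match back to the
unaligned-load instruction. This assumes an x86 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86 only */
#include <string.h>

/* Intrinsic form: the programmer explicitly asks for an unaligned
 * 128-bit load. Per the description above, clang turns this into a
 * generic IR load of <2 x double> with an alignment of 1. */
static double sum2_intrinsic(const double *p) {
    __m128d v = _mm_loadu_pd(p);
    double out[2];
    _mm_storeu_pd(out, v);
    return out[0] + out[1];
}

/* Generic form: a plain copy that makes no 16-byte alignment
 * assumption. The optimizer is free to pick the instructions. */
static double sum2_generic(const double *p) {
    double tmp[2];
    memcpy(tmp, p, sizeof tmp);
    return tmp[0] + tmp[1];
}

/* Hypothetical self-check: buf + 1 is not guaranteed to be 16-byte
 * aligned, so both versions must tolerate an unaligned pointer. */
static void check_sum2(void) {
    double buf[3] = {1.0, 2.0, 3.0};
    assert(sum2_intrinsic(buf + 1) == 5.0);
    assert(sum2_generic(buf + 1) == 5.0);
}
```

Both functions compute the same value; the open question in this thread is
only whether the first one should be guaranteed to produce the instruction
the programmer named.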

In my personal opinion, this should be controlled via a compiler option.
The default should be to emit the instructions as specified. This should
at least be true at low optimization levels.

> 
> There are several benefits to doing it this way:
> 
> 1. Fewer intrinsics in the compiler, fewer patterns in the targets, less redundancy.

I don't view limiting the number of intrinsics in LLVM as a worthwhile
goal in itself. The fact that specifying intrinsics is currently a
fairly verbose procedure (requiring updates in several different files)
is something that we should fix via a more intelligent tablegen setup.

> 
> 2. The compiler should know better than the user, because code is often written and forgotten about.  The compiler can add value when building hand tuned and highly optimized SSE2 code for an SSE4 chip, for example.

This is a good use case for '-O4': at that level I accept that I've asked
for something specific and the compiler may do something else instead. I
think that '-O3' (and below) should make what I've specified as fast as
possible. Since specifying '-O3' is a fairly standard default choice, I
think it should provide the safer behavior.

> 
> 3. If the compiler is pessimizing (e.g.) unaligned loads, then it is a serious bug that should be fixed, not something that should be worked around by adding intrinsics.  Adding intrinsics just makes it much less likely that we'd find out about it and then be able to fix it.
> 
> 4. In practice, if we had intrinsics for everything, I strongly suspect that a lot of generic patterns wouldn't get written.  This would pessimize "portable" code using standard IR constructs.
> 

We should work on providing a comprehensive set of generic vector
builtins (similar to __builtin_shuffle) to cover other cases that can be
represented in the IR directly (either as-is, or suitably extended).
This has the benefit of working over many different architectures. And
we could make sure that patterns will be written to support these
generic builtins.
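
As a rough sketch of what I mean by generic vector builtins, the following
uses the GCC/Clang vector extensions instead of any target intrinsic. The
type and function names are hypothetical. GCC spells the shuffle
__builtin_shuffle, while clang spells it __builtin_shufflevector, so the
sketch switches on the compiler; nothing here names an x86 instruction.

```c
#include <assert.h>

/* Generic 128-bit vector types from the GCC/Clang vector extensions. */
typedef double v2df __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* Swap the two lanes using the compiler's generic shuffle builtin. */
static v2df swap_lanes(v2df a) {
#if defined(__clang__)
    return __builtin_shufflevector(a, a, 1, 0);
#else
    return __builtin_shuffle(a, (v2di){1, 0});
#endif
}

/* Horizontal add written with generic operations only: the backend
 * chooses the machine instructions for the target at hand. */
static double horizontal_sum(v2df a) {
    v2df s = a + swap_lanes(a);
    return s[0];
}

/* Hypothetical self-check of the two helpers above. */
static void check_generic_vectors(void) {
    v2df v = {1.5, 2.5};
    v2df w = swap_lanes(v);
    assert(w[0] == 2.5 && w[1] == 1.5);
    assert(horizontal_sum(v) == 4.0);
}
```

Code written this way is not tied to one architecture, which is the
benefit I am after; it also gives backend authors a fixed set of generic
operations to write patterns for.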

 -Hal

> -Chris

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory