[LLVMdev] Questions on the llvm 'vector' types and resulting SIMD instructions

Wed Sep 3 12:50:15 PDT 2014

Hi Larry,

I'll try to answer a few of your questions, but other folks will know more...

On 3 September 2014 19:39, Larry Gritz <lg at larrygritz.com> wrote:
> * Will it really autodetect and use the best SIMD available on my machine? (For example, SSE4.2 vs SSE2, etc.?) Is there anything I need to tell the JIT or the ExecutionEngine to make it use a particular instruction set? (The only case I care about is to generate the best code for the host it's currently running on.)

If you don't specify -mfpu/-mcpu, LLVM will try to guess as best it
can. Some archs (x86) are better than others (ARM) at that, but it
should never generate bad code (i.e. AVX on an SSE-only machine). At
worst, it'll guess conservatively and maybe generate SSE code on AVX
machines, but not the other way around.
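
To make the "conservative guess" concrete, here is a toy model in
Python. It is not LLVM's detection code: the level names, their
ordering and the function pick_simd_level are all hypothetical and
only illustrate the guarantee described above, namely that the chosen
level is always one the host was actually detected to support.

```python
# Toy model only: this is NOT how LLVM detects the host CPU.
# Hypothetical ordering of x86 SIMD levels, oldest to newest.
SIMD_LEVELS = ["sse2", "sse4.2", "avx", "avx2"]


def pick_simd_level(detected):
    """Return the newest level present in `detected`, or None for scalar code.

    `detected` is the set of features the (hypothetical) detection step
    found. A feature that was missed can only lower the result, never
    raise it above what the machine supports.
    """
    best = None
    for level in SIMD_LEVELS:
        if level in detected:
            best = level
    return best
```

If detection misses AVX on an AVX machine, the model falls back to an
SSE level, which is slower but still runs. It can never pick a level
that was not detected.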

> * Is there any difference in vector functionality of old JIT versus MCJIT?  (Yes, I know that starting in 3.6, it'll be only MCJIT.)

I don't think so. Both use the same passes and back-ends, so I'd be
surprised if they differed.

As obvious as it sounds, I'd strongly encourage you not to use the old
JIT. Not only have we deleted it for good, but it was never that good
on all architectures, so you'd be stuck with an ageing, unsupported
and possibly broken JIT technology.

> * What happens if it runs on a machine without SSE? Is using vectors an error, or will it just generate the equivalent scalar code automatically? If it generates scalar code, what is the function signature, as it would appear to be called from a C function, on a machine without __m128?
> * What happens to vector types of length not equal to the machine's SIMD length? If I defined a <3 x float>, would it always generate scalar code, or would it pad to a 4xfloat and generate SSE instructions? Or is it not even allowed?

The answer to both questions is: it depends.

Obviously, <3 x float> is not a legal type on any machine, so LLVM
tends to either widen it to a larger vector or split it into multiple
vectors, etc. There are some IR passes that do all that, including
scalarization of vector code, but your mileage may vary, since not
every back-end supports everything. Since you're fiddling with IR and
JIT, you should make your choices based on what each one supports.
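
As a rough illustration of the widening and splitting mentioned
above, here is a toy Python model. It is a sketch, not LLVM code: the
helpers widen, split and add_widened are hypothetical names, the legal
width of 4 lanes is an assumption (think 4 x float in an SSE
register), and the padding lane is filled with 0.0 only to keep the
model deterministic. As far as I know, real legalization leaves such
lanes undefined.

```python
def widen(vec, legal_lanes, pad=0.0):
    """Pad a short vector up to one legal register width."""
    if len(vec) > legal_lanes:
        raise ValueError("vector is wider than the legal width; split it instead")
    return list(vec) + [pad] * (legal_lanes - len(vec))


def split(vec, legal_lanes):
    """Break a long vector into legal-width chunks, widening a short tail."""
    return [
        widen(vec[i:i + legal_lanes], legal_lanes)
        for i in range(0, len(vec), legal_lanes)
    ]


def add_widened(a, b, legal_lanes=4):
    """Add two short vectors using one full-width add, then drop the padding."""
    wa = widen(a, legal_lanes)
    wb = widen(b, legal_lanes)
    full = [x + y for x, y in zip(wa, wb)]  # models a single 4-lane add
    return full[:len(a)]                    # padding lanes are discarded
```

For example, add_widened([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]) does a
4-lane add internally and hands back only the three meaningful lanes,
while split on a 6-element vector yields two 4-lane chunks, the second
one padded.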

Back-ends have a late legalization phase, where they scan the DAG
(after IR lowering) and legalize types (e.g. i64 into i32+i32 on
32-bit archs), so depending on the IR you give the back-end, it may
know how to legalize some types, but not others. Be careful. And, as
usual, if you find any odd behaviour, please report it to the list or
in Bugzilla.
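
The i64 into i32+i32 case can be modelled in a few lines of Python. It
is only a sketch of the idea, not what any back-end emits: the names
split_i64, join_i64 and add_i64_via_i32 are hypothetical, and inputs
are assumed to be unsigned 64-bit values.

```python
MASK32 = 0xFFFFFFFF


def split_i64(x):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return x & MASK32, (x >> 32) & MASK32


def join_i64(lo, hi):
    """Rebuild a 64-bit value from its 32-bit halves."""
    return (hi << 32) | lo


def add_i64_via_i32(a, b):
    """64-bit add done as two 32-bit adds with a carry between them."""
    a_lo, a_hi = split_i64(a)
    b_lo, b_hi = split_i64(b)
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                 # 1 if the low add overflowed 32 bits
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32  # wraps around like a real i64 add
    return join_i64(lo, hi)
```

Adding 1 to 0x00000000FFFFFFFF overflows the low half, so the carry
bumps the high half and the result is 0x0000000100000000.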

cheers,
--renato