[LLVMdev] ARM NEON intrinsics in clang

Tim Northover t.p.northover at gmail.com
Thu Sep 26 08:07:31 PDT 2013


Hi Stan,

> I spent the last three days trying to compile a version of LLVM that would
> allow me to compile sources that contain these intrinsics, but with no success.

Ok. This we can probably help with. Did you manage to build a version
of Clang (preferably from git/subversion)?

If so, you're probably having problems cross-compiling. Renato's
recently worked on some documentation in this area:
http://clang.llvm.org/docs/CrossCompilation.html.

But for a quick hack, you could try:

$ cat > neon.c
#include <arm_neon.h>

float32x4_t my_func(float32x4_t lhs, float32x4_t rhs) {
  return vaddq_f32(lhs, rhs);
}
$ clang --target=arm-linux-gnueabihf -mcpu=cortex-a15 -ffreestanding \
    -O3 -S -o - neon.c

("-ffreestanding" will dodge any issues with your supporting toolchain,
but won't work for larger tests. You'll have to actually solve those
issues before you can start running code.)

> In the process I found out that clang doesn't support NEON (as per
> http://blog.llvm.org/2010/04/arm-advanced-simd-neon-intrinsics-and.html),

That's rather out of date, I'm afraid. 32-bit ARM does support both
NEON intrinsics and a reasonable amount of LLVM's own
auto-vectorisation (which is in its early stages, but we have some
kind of loop and SLP vectorisation going on).
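
As a rough illustration of the auto-vectorisation point: a plain loop
like the one below is the kind of code the loop vectoriser can often
turn into NEON instructions without any intrinsics. The function name
add_arrays is made up for this example, and whether the loop really
gets vectorised depends on the Clang version and the flags you use.

```c
#include <stddef.h>

/* Hypothetical example function (not from the original thread).
 * There are no NEON intrinsics here: at -O3 the loop vectoriser may
 * turn this into NEON vector adds on its own. "restrict" tells the
 * compiler the arrays don't overlap, which makes that easier. */
void add_arrays(float *restrict dst, const float *restrict lhs,
                const float *restrict rhs, size_t n) {
  for (size_t i = 0; i < n; ++i)
    dst[i] = lhs[i] + rhs[i];
}
```

Feeding this to the same clang command as above and looking for
vadd.f32 on q registers in the output should show whether the
vectoriser kicked in.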

> but there has been at least some effort in adding it
> (https://www.codeaurora.org/patches/quic/llvm/32040/clang-Initial-Neon-support.patch).

That patch is part of the effort to implement NEON (instructions and
intrinsics) on the 64-bit ARM architecture (AArch64).

> I also tried compiling LLVM 2.9 + llvm-gcc but that failed too many times
> and I gave up.

Yep. llvm-gcc is long dead, and LLVM 2.9 isn't much healthier.

> current plan is to implement the ARM NEON intrinsics as a shared library,
> using attributes as in:

That would probably be possible, but very bad from a performance
perspective. The whole point of NEON intrinsics is to speed up vector
code; if you've got the overhead of a call/return for each intrinsic,
with arguments and results pinned to fixed registers around every
call, you'll be in for a world of pain.

> Ideally, I want to be able to compile C code that includes ARM NEON
> intrinsics to other targets (TI processors, e.g.).

Now that's going to be harder. LLVM itself doesn't support any TI
processors, for a start. And many of the NEON intrinsics (those with
more complex semantics) compile to LLVM IR that uses target-specific
LLVM-level intrinsics, which are only supported in the ARM backend.

Your shared library idea would work semantically, of course. But I'm
not sure what useful information could be extracted from it.
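
For what it's worth, here is a minimal sketch of what one such library
function might look like in plain C. The emu_ names and the struct
layout are assumptions made up for illustration, not anything from
arm_neon.h; only the lane-wise semantics of vaddq_f32 are modelled.

```c
/* Hypothetical stand-in for float32x4_t: four 32-bit float lanes. */
typedef struct {
  float lane[4];
} emu_float32x4_t;

/* Hypothetical emulation of vaddq_f32: adds the two vectors lane by
 * lane. As an out-of-line library function this pays a call/return
 * for every use, which is the performance problem mentioned above. */
emu_float32x4_t emu_vaddq_f32(emu_float32x4_t lhs, emu_float32x4_t rhs) {
  emu_float32x4_t result;
  for (int i = 0; i < 4; ++i)
    result.lane[i] = lhs.lane[i] + rhs.lane[i];
  return result;
}
```

This builds on any target with a C compiler, so it gives the right
answers, but it says nothing about how the code would perform on real
NEON hardware.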

Cheers.

Tim.


