[llvm-dev] Sub-optimal register allocation
Mohamed Aly via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 7 17:06:56 PDT 2018
I am using Halide, and trying to generate a simplified version of the
inner kernel in a GEMM operation, similar to this
Basically, it multiplies a 12x1 column vector by a 1x4 row vector and adds
the result to a 12x4 accumulator block. I am targeting 32-bit ARM NEON.
Ideally, all the accumulators and operands should fit in the q registers
without spilling to the stack. However, the generated ARM assembly uses the
registers sub-optimally: it repeatedly spills registers onto the stack and
reloads them.
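A minimal C++ sketch of what the kernel computes and of the register arithmetic
behind the "should fit" expectation. It assumes 32-bit float elements (the post
does not state the type) and assumes the 1x4 row is kept in a single q register
and applied with lane-indexed multiply-accumulate (VMLA.F32 by scalar). 32-bit
ARM NEON has 16 q registers (q0-q15) of 128 bits, i.e. four floats each. The
names rank1_update and q_regs_for are hypothetical helpers; this is a scalar
reference, not the Halide schedule or the generated kernel.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>

// Tile sizes from the post; float element type is an ASSUMPTION.
constexpr std::size_t kRows = 12;  // 12x1 column of A
constexpr std::size_t kCols = 4;   // 1x4 row of B

using Column = std::array<float, kRows>;
using Row = std::array<float, kCols>;
using Tile = std::array<std::array<float, kCols>, kRows>;  // 12x4 accumulator

// Hypothetical helper: one step of the microkernel, C += a * b^T
// (an outer-product, i.e. rank-1, update of the accumulator tile).
void rank1_update(Tile& c, const Column& a, const Row& b) {
    for (std::size_t i = 0; i < kRows; ++i)
        for (std::size_t j = 0; j < kCols; ++j)
            c[i][j] += a[i] * b[j];
}

// 32-bit ARM NEON: 16 q registers, 128 bits each.
constexpr std::size_t kQRegs = 16;
constexpr std::size_t kLanesPerQ = 128 / (CHAR_BIT * sizeof(float));  // 4 floats

// Hypothetical helper: q registers needed to hold n floats (rounded up).
constexpr std::size_t q_regs_for(std::size_t n) {
    return (n + kLanesPerQ - 1) / kLanesPerQ;
}

constexpr std::size_t kAccRegs = q_regs_for(kRows * kCols);  // 48 floats -> 12
constexpr std::size_t kColRegs = q_regs_for(kRows);          // 12 floats -> 3
constexpr std::size_t kRowRegs = q_regs_for(kCols);          // 4 floats -> 1 (ASSUMES lane-indexed use)
constexpr std::size_t kTotalRegs = kAccRegs + kColRegs + kRowRegs;

// Under these assumptions the tile uses every q register, with none spare.
static_assert(kTotalRegs == kQRegs, "tile exactly fills q0-q15");
```

Under these assumptions the tile needs 12 + 3 + 1 = 16 q registers, so there is
no spare q register. One plausible reason for the spills is that any extra
vector temporary the backend introduces (for example, a separate broadcast copy
of a row element instead of a lane-indexed operand) no longer fits. A natural
layout keeps accumulator column j in three q registers, each updated by one
multiply-accumulate against the matching q register of the column and lane j of
the row.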
The relevant part of the LLVM IR is here
and the corresponding arm32 assembly is here
Any help on how to solve this, or on what might be causing it, will be
appreciated.
Thanks a lot,