[PATCH] D47239: [InstCombine] Combine XOR and AES instructions on ARM/ARM64
Michael Brase via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 23 14:11:22 PDT 2018
mbrase added a comment.
Hi, I can try to give a little more background on the motivation behind this patch, and why the key might be zero.
I’m trying to cross-compile some code that uses x86 AES intrinsics to AArch64 with LLVM. However, the x86 AES instructions have slightly different semantics than the ARM instructions. One important difference is that x86 performs an XOR at the end of its AESENC/AESENCLAST instructions, whereas ARM performs the XOR at the beginning of its AESE instruction.
To emulate the x86 AES intrinsics on AArch64, I defined some inline functions as a drop-in replacement, to avoid rewriting the algorithm by hand. Here is what they look like:
#include <stdint.h>
#include <arm_neon.h>
typedef uint8x16_t __m128i; // __m128i is an x86 type, map it to ARM neon type
// https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_aesenc_si128&expand=221
static inline __m128i _mm_aesenc_si128 (__m128i a, __m128i RoundKey)
{
// AESE with an all-zero key skips ARM's built-in key XOR;
// XORing the real key at the end matches the x86 semantics.
return vaesmcq_u8(vaeseq_u8(a, (__m128i){})) ^ RoundKey;
}
// https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_aesenclast_si128&expand=222
static inline __m128i _mm_aesenclast_si128 (__m128i a, __m128i RoundKey)
{
return vaeseq_u8(a, (__m128i){}) ^ RoundKey;
}
To skip the XOR from the ARM AESE instruction, I pass in a key value of zero, and then manually XOR the real key at the end of the function. The only downside to this approach is that when the inline functions are substituted into the algorithm, they leave an unnecessary EOR instruction whenever multiple rounds of AES encryption are done back-to-back. Consider the following test code:
// Performs 3 rounds of AES encryption
// Note: In proper 128-bit AES encryption, 10 rounds are used
__m128i aes_block(__m128i data, __m128i k0, __m128i k1, __m128i k2, __m128i k3)
{
data = data ^ k0;
data = _mm_aesenc_si128(data, k1);
data = _mm_aesenc_si128(data, k2);
data = _mm_aesenclast_si128(data, k3);
return data;
}
This compiles down to:
0000000000000024 <aes_block>:
24: 6e201c20 eor v0.16b, v1.16b, v0.16b ; EOR can be combined with AESE
28: 6f00e401 movi v1.2d, #0x0
2c: 4e284820 aese v0.16b, v1.16b
30: 4e286800 aesmc v0.16b, v0.16b
34: 6e221c00 eor v0.16b, v0.16b, v2.16b ; EOR can be combined with AESE
38: 4e284820 aese v0.16b, v1.16b
3c: 4e286800 aesmc v0.16b, v0.16b
40: 6e231c00 eor v0.16b, v0.16b, v3.16b ; EOR can be combined with AESE
44: 4e284820 aese v0.16b, v1.16b
48: 6e241c00 eor v0.16b, v0.16b, v4.16b
4c: d65f03c0 ret
My goal with this patch is to teach LLVM how to eliminate these extra EOR instructions. With my patch, LLVM produces this:
0000000000000024 <aes_block>:
24: 4e284820 aese v0.16b, v1.16b
28: 4e286800 aesmc v0.16b, v0.16b
2c: 4e284840 aese v0.16b, v2.16b
30: 4e286800 aesmc v0.16b, v0.16b
34: 4e284860 aese v0.16b, v3.16b
38: 6e241c00 eor v0.16b, v0.16b, v4.16b
3c: d65f03c0 ret
Repository:
rL LLVM
https://reviews.llvm.org/D47239