[PATCH] D94098: [Clang][AArch64] Inline assembly support for the ACLE type 'data512_t'.

Alexandros Lamprineas via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 19 15:54:51 PDT 2021


labrinea added a comment.

> struct foo { unsigned long long x[8]; };
> void store(int *in, void *addr)
> {
>   struct foo x = { in[0], in[1], in[4], in[16], in[25], in[36], in[49], in[64] };
>   __asm__ volatile ("st64b %0,[%1]" : : "r" (x), "r" (addr) : "memory" );
> }

For this particular example, if we pass the asm operands as i512, the compiler generates the following, which doesn't look bad:

  ldpsw	x2, x3, [x0]
  ldrsw	x4, [x0, #16]
  ldrsw	x5, [x0, #64]
  ldrsw	x6, [x0, #100]
  ldrsw	x7, [x0, #144]
  ldrsw	x8, [x0, #196]
  ldrsw	x9, [x0, #256]
  //APP
  st64b	x2, [x1]
  //NO_APP

Looking at the IR, it seems that SROA gets in the way: the optimized code loads all eight i32 values and constructs the i512 operand by performing bitwise operations on them. So I was wrong to say that the load of an i512 value won't get optimized.
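
To make the mapping between the C source and the loads above easier to follow, here is a rough, portable model of what ends up in x2..x9. It is only a sketch: gather_lanes, lane_byte_offset and kIndex are hypothetical names that are not part of the patch, and it assumes a 4-byte int. Standard C has no 512-bit integer, so the eight 64-bit lanes are kept in an array instead of being shifted and OR-ed into a single i512 value as the IR does.

```c
#include <stddef.h>
#include <stdint.h>

/* Indices read by the quoted example: in[0], in[1], in[4], ..., in[64]. */
static const size_t kIndex[8] = { 0, 1, 4, 16, 25, 36, 49, 64 };

/* Hypothetical helper: fills eight 64-bit lanes the way x2..x9 are
 * filled in the listing above. */
static void gather_lanes(const int *in, uint64_t lanes[8])
{
    for (size_t i = 0; i < 8; ++i) {
        /* Converting int to a 64-bit type sign-extends the value,
         * which is what ldrsw/ldpsw do. */
        lanes[i] = (uint64_t)(int64_t)in[kIndex[i]];
    }
}

/* Byte offset from the base pointer for lane i (assumes 4-byte int). */
static size_t lane_byte_offset(size_t i)
{
    return kIndex[i] * sizeof(int);
}
```

With a 4-byte int, the offsets come out as 0, 4, 16, 64, 100, 144, 196 and 256, which matches the immediates in the ldpsw/ldrsw instructions.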


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94098/new/

https://reviews.llvm.org/D94098


