[PATCH] [AArch64] Improve codegen of store lane instructions by avoiding GPR usage

Ahmed Bougacha ahmed.bougacha at gmail.com
Mon Nov 10 15:09:39 PST 2014


Hi t.p.northover,

We used to generate code like:
	umov.b	w8, v0[2]
	strb	w8, [x0, x1]
because the STR*ro* patterns were preferred over the ST1* lane patterns.
Instead, we now generate:
	add	x8, x0, x1
	st1.b	{ v0 }[2], [x8]

This patch increases the ST1* AddedComplexity to achieve that.
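
For reference, here's a minimal IR reproducer for that case (my own example, in the style of test/CodeGen/AArch64/arm64-st1.ll and written against the IR syntax of the day; the function name, RUN line, and CHECK lines are illustrative, not taken from the patch):

	; RUN: llc < %s -march=arm64 -aarch64-neon-syntax=apple | FileCheck %s
	define void @store_lane_2(<16 x i8> %v, i8* %base, i64 %off) {
	; CHECK: add x8, x0, x1
	; CHECK: st1.b { v0 }[2], [x8]
	  %addr = getelementptr i8* %base, i64 %off
	  %elt = extractelement <16 x i8> %v, i32 2
	  store i8 %elt, i8* %addr
	  ret void
	}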

However, that unearthed a problem with 0 indices: for a lane-0 extract-and-store, we used to generate:
	fmov w8, s0
	str w8, [x0, x1, lsl #2]
instead of:
	str s0, [x0, x1, lsl #2]

To correct that:
- only match non-zero vector indices to the ST1 patterns
- for index 0, match directly to STR <subreg>0 (see the reproducer below)
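
Here's the lane-0 case, along the same illustrative lines (an f32 element makes the scaled reg+reg addressing visible):

	define void @store_lane_0(<4 x float> %v, float* %base, i64 %off) {
	; The lane-0 extract should fold into a plain FP store:
	; no fmov round-trip through a GPR, and no add + st1 either.
	; CHECK: str s0, [x0, x1, lsl #2]
	  %addr = getelementptr float* %base, i64 %off
	  %elt = extractelement <4 x float> %v, i32 0
	  store float %elt, float* %addr
	  ret void
	}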

Byte-sized instructions don't get the 0-index special case: FPR8s are defined to have untyped content, so there's no typed pattern to match a byte store of the b subregister against.

I don't really like adding this kind of hack in the otherwise impressively clean backend, so alternatives are welcome.

Thanks!
-Ahmed

http://reviews.llvm.org/D6202

Files:
  lib/Target/AArch64/AArch64InstrFormats.td
  lib/Target/AArch64/AArch64InstrInfo.td
  test/CodeGen/AArch64/arm64-neon-simd-ldst-one.ll
  test/CodeGen/AArch64/arm64-st1.ll
Patch attachment: D6202.16011.patch (text/x-patch, 15629 bytes)
<http://lists.llvm.org/pipermail/llvm-commits/attachments/20141110/7f307c5d/attachment.bin>

