[PATCH] Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2)
Sanjay Patel
spatel at rotateright.com
Tue Sep 16 08:50:26 PDT 2014
>>! In D5347#10, @delena wrote:
> I just suggest to add this pattern to X86InstrSSE.td:
>
> def : Pat<(v2i64 (X86VBroadcast (loadi64 addr:$src))),
>           (v2i64 (EXTRACT_SUBREG (v4i64 (VBROADCASTSDYrm addr:$src)), sub_xmm))>;
I tried this, but it's not producing the codegen that I want. Specifically, we want to use movddup when possible, and we don't want to alter codegen at all when not optimizing for size. (Apologies for pattern ignorance - I haven't used these yet.)
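To make the size-gating concrete: if a pattern is the way to go, I imagine it would need an OptForSize-style predicate so it can't fire in the default case, and it should select movddup rather than the ymm broadcast. A rough sketch of what I have in mind (untested - I haven't checked that the OptForSize predicate composes with these broadcast patterns, so treat this as pseudocode):

// Sketch only: splat a 64-bit constant load with movddup instead of
// vbroadcastsd, but only under optsize. Assumes the existing X86
// OptForSize predicate can be used here.
def : Pat<(v2i64 (X86VBroadcast (loadi64 addr:$src))),
          (VMOVDDUPrm addr:$src)>,
      Requires<[HasAVX, OptForSize]>;

That would keep the splat in an xmm register (so no vzeroupper) and use the smaller movddup encoding, but only when optimizing for size.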
1. In the testcase for v2f64, no splat is generated (movddup expected).
2. In the testcase for v2i64 with AVX, we get:
vbroadcastsd LCPI4_0(%rip), %ymm1
vpaddq %xmm1, %xmm0, %xmm0
vzeroupper <--- can the pattern be rewritten to avoid this? Even if it can, movddup is still smaller than vbroadcastsd.
This is worse in size than what my patch produces:
vmovddup LCPI4_0(%rip), %xmm1
vpaddq %xmm1, %xmm0, %xmm0
3. In the testcase for v4i64 with AVX, we would again generate vbroadcastsd:
vbroadcastsd LCPI5_0(%rip), %ymm1
vextractf128 $1, %ymm0, %xmm2
vpaddq %xmm1, %xmm2, %xmm2
vpaddq %xmm1, %xmm0, %xmm0
vinsertf128 $1, %xmm2, %ymm0, %ymm0
But movddup is still better because it is one byte smaller than vbroadcastsd (byte-count breakdown after this list).
4. Using the pattern also caused a failure in test/CodeGen/X86/exedepsfix-broadcast.ll because a broadcast is generated even when not optimizing for size. I don't think we want to use a broadcast in that case?
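For reference, here is the byte math behind the size claim for the RIP-relative loads (my reading of the VEX encoding rules - please double-check me):

  vmovddup (mem), %xmm     : 0F map, so 2-byte VEX + opcode + modrm + disp32 = 2+1+1+4 = 8 bytes
  vbroadcastsd (mem), %ymm : 0F38 map, which forces 3-byte VEX + opcode + modrm + disp32 = 3+1+1+4 = 9 bytes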
http://reviews.llvm.org/D5347