[PATCH] Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2)

Wed Sep 17 14:41:46 PDT 2014

>>! In D5347#16, @delena wrote:
> Ok, if you want to use VMOVDDUP, you still can do it via pattern in td file. This pattern works perfect:
> +  def : Pat<(v2f64 (X86VBroadcast (loadf64 addr:$src))),
> +            (VMOVDDUPrm addr:$src)>;
> +  def : Pat<(v2i64 (X86VBroadcast (loadi64 addr:$src))),
> +            (VMOVDDUPrm addr:$src)>;
> +

Thanks! That does solve all of the test cases in my splat-for-size test file. It even replaces the vpbroadcastq for v2i64 on AVX2 with a vmovddup, which is even better for size: vmovddup can use the 2-byte VEX prefix, while vpbroadcastq needs the 3-byte form.

But that's a problem. According to Intel's optimization guides, when optimizing for speed we don't want to use vmovddup for v2i64 when AVX2 is available. vmovddup is an FP-domain instruction, so feeding its result to integer instructions crosses between the FP and integer domains and can incur a bypass delay. It also causes the "Q64" test in test/CodeGen/X86/avx2-vbroadcast.ll to fail, because that test expects vpbroadcastq.

Is there a way to use patterns but still distinguish between the conflicting goals of speed and size in that one case? Or should we just accept that vpbroadcastq is one byte longer and always use it for v2i64 with AVX2? (That's what my patch was doing anyway.)
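To make the question concrete, this is roughly the split I have in mind. It is an untested sketch. It assumes the HasAVX, HasAVX1Only, HasAVX2 and OptForSize predicates from the X86 .td files, which should be verified against X86InstrInfo.td. The AddedComplexity value is an arbitrary choice, meant only to make the size-only pattern outrank the existing vpbroadcastq pattern where both match.

```
// Untested sketch; predicate names assumed from the X86 .td files.

// v2f64: vmovddup is an FP-domain instruction, so it is a natural fit on any AVX target.
let Predicates = [HasAVX] in
def : Pat<(v2f64 (X86VBroadcast (loadf64 addr:$src))),
          (VMOVDDUPrm addr:$src)>;

// v2i64 without AVX2: there is no integer broadcast instruction, so use vmovddup.
let Predicates = [HasAVX1Only] in
def : Pat<(v2i64 (X86VBroadcast (loadi64 addr:$src))),
          (VMOVDDUPrm addr:$src)>;

// v2i64 with AVX2, only when optimizing for size: accept the FP/int domain
// crossing to save one byte (2-byte VEX prefix instead of 3-byte).
// AddedComplexity (arbitrary value) is intended to make this pattern win over
// the vpbroadcastq pattern, which also matches under HasAVX2.
let Predicates = [HasAVX2, OptForSize], AddedComplexity = 10 in
def : Pat<(v2i64 (X86VBroadcast (loadi64 addr:$src))),
          (VMOVDDUPrm addr:$src)>;

// With HasAVX2 and no OptForSize, none of the v2i64 patterns above apply,
// so selection should fall back to the existing vpbroadcastq pattern.
```

If guarding a pattern on OptForSize this way is acceptable, vpbroadcastq would stay the choice when optimizing for speed, so the "Q64" test should be unaffected. vmovddup would then be used for v2i64 on AVX2 only when size is the goal.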

http://reviews.llvm.org/D5347