[PATCH] Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2)

Sanjay Patel spatel at rotateright.com
Sun Sep 14 14:05:37 PDT 2014


Hi nadav, chandlerc, andreadb,

Currently, we generate broadcast instructions to load some constant splat vectors on CPUs with AVX2.
This patch should preserve all existing behavior at regular optimization levels, but also use broadcasts whenever possible when optimizing for *size* on any CPU with AVX or AVX2.

The tradeoff: the broadcast instruction costs up to 5 extra instruction bytes, but it saves at least 8 bytes (and up to 31 bytes) of constant pool data.
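
To make the byte counts above concrete, here is a rough sketch of the arithmetic. It is not code from the patch. It assumes an idealized model in which a full-width constant pool entry is replaced by a single element that is then broadcast; the function names are hypothetical, and the only figure taken from this mail is the "up to 5" instruction bytes. The actual per-case savings depend on which broadcast instruction is available for a given element type, so the totals reported for the testcase need not match this model.

```cpp
#include <cassert>

// Idealized model (an assumption, not taken from the patch): the constant
// pool shrinks from one full vector to one element of that vector.
constexpr int dataBytesSaved(int vectorBits, int elementBits) {
    return vectorBits / 8 - elementBits / 8;
}

// Net size change in bytes; a positive result means the broadcast form is
// smaller overall. extraInstBytes is the added encoding cost of the
// broadcast instruction ("up to 5" in the text above).
constexpr int netBytesSaved(int vectorBits, int elementBits, int extraInstBytes) {
    return dataBytesSaved(vectorBits, elementBits) - extraInstBytes;
}

// Checks that the model reproduces the bounds quoted above.
inline void checkStatedBounds() {
    // Smallest saving: 64-bit element in a 128-bit vector (16 - 8 bytes).
    assert(dataBytesSaved(128, 64) == 8);
    // Largest saving: 8-bit element in a 256-bit vector (32 - 1 bytes).
    assert(dataBytesSaved(256, 8) == 31);
    // Even with the worst-case 5 extra instruction bytes, the smallest
    // case still comes out ahead.
    assert(netBytesSaved(128, 64, 5) > 0);
}
```

Under this model every one of the 12 type combinations is a net win when optimizing for size, since the worst case still saves 8 - 5 = 3 bytes.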

With -Os (function attribute "optsize"), the change in size for the included testcase file, which covers all 12 AVX2 vector data type cases (f32, f64, i8, i16, i32, i64 for 128-bit and 256-bit vectors), is:
   AVX:  +29 instruction bytes, -112 data bytes = 83 bytes saved
   AVX2: +29 instruction bytes, -106 data bytes = 77 bytes saved

Note: Is there any optimization pass in LLVM that merges constant pool data from different functions? Could this also be done at link time? If such merging exists, it might change the criteria for generating a broadcast, because we might not want to generate extra instructions if the same constant data is loaded multiple times.

http://reviews.llvm.org/D5347

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/splat-for-size.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D5347.13686.patch
Type: text/x-patch
Size: 8784 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140914/c2743999/attachment.bin>

