Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2)
Demikhovsky, Elena
elena.demikhovsky at intel.com
Mon Sep 15 01:00:37 PDT 2014
One more:
> // The v2[f/i]64 case is a mess because there is no VBROADCAST to handle it.
You can use the broadcast to ymm here as well:
VBROADCASTSD ymm1, m64
Just add a pattern to the .td file.
- Elena
-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Demikhovsky, Elena
Sent: Monday, September 15, 2014 10:31
To: reviews+D5347+public+acd2dacc06d4a03a at reviews.llvm.org; nrotem at apple.com; chandlerc at gmail.com; Andrea_DiBiagio at sn.scee.net
Cc: llvm-commits at cs.uiuc.edu
Subject: RE: Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2)
  if (ConstSplatVal && (Subtarget->hasAVX2() || OptForSize)) {
    EVT CVT = Ld.getValueType();
    assert(!CVT.isVector() && "Must not broadcast a vector type");
    unsigned Opcode = X86ISD::VBROADCAST; // This only changes for v2[f|i]64.
You can't generate VBROADCAST for SSE-only targets. You should check the target here.
- Elena
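To illustrate the review comment above: with the quoted condition, OptForSize alone is enough to enter the broadcast path, even on a subtarget that has only SSE. The following is a minimal standalone sketch of a stricter guard, not the actual patch. The SubtargetFeatures struct and the shouldBroadcastConstSplat function are hypothetical names that model the check outside of LLVM.

```cpp
#include <cassert>

// Hypothetical stand-in for the X86 subtarget feature queries
// (models Subtarget->hasAVX() / Subtarget->hasAVX2()).
struct SubtargetFeatures {
  bool hasAVX;
  bool hasAVX2;
};

// Decide whether a constant splat load may be lowered to a broadcast.
// Assumption: VBROADCAST requires at least AVX, so SSE-only targets
// must never take this path, regardless of OptForSize.
bool shouldBroadcastConstSplat(const SubtargetFeatures &ST, bool optForSize) {
  assert((!ST.hasAVX2 || ST.hasAVX) && "AVX2 implies AVX");
  if (!ST.hasAVX)
    return false;                   // SSE only: no VBROADCAST available
  return ST.hasAVX2 || optForSize;  // AVX2 always, plain AVX only for size
}
```

Under this sketch, an SSE-only target with optForSize set returns false, whereas the quoted condition would have proceeded to build a VBROADCAST node.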
-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Sanjay Patel
Sent: Monday, September 15, 2014 00:06
To: spatel at rotateright.com; nrotem at apple.com; chandlerc at gmail.com; Andrea_DiBiagio at sn.scee.net
Cc: llvm-commits at cs.uiuc.edu
Subject: [PATCH] Use broadcasts to optimize overall size when loading constant splat vectors (x86-64 with AVX or AVX2)
Hi nadav, chandlerc, andreadb,
Currently, we generate broadcast instructions on CPUs with AVX2 to load some constant splat vectors.
This patch should preserve all existing behavior with regular optimization levels, but also use splats whenever possible when optimizing for *size* on any CPU with AVX or AVX2.
The tradeoff is up to 5 extra instruction bytes for the broadcast instruction to save at least 8 bytes (up to 31 bytes) of constant pool data.
The change using -Os (function attribute "optsize") for the included testcase file with all 12 AVX2 vector data type cases (f32, f64, i8, i16, i32, i64 for 128-bit and 256-bit vectors) is:
AVX: +29 inst -112 data = 83 bytes saved
AVX2: +29 inst -106 data = 77 bytes saved
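The arithmetic behind these figures can be sketched as follows. This is only an illustration of the bookkeeping, with hypothetical function names. The per-case formula assumes that a full-width vector constant in the constant pool is replaced by a single scalar element, which matches the "at least 8 bytes (up to 31 bytes)" bounds quoted above but is not claimed to reproduce the measured totals.

```cpp
#include <cassert>

// Net bytes saved: constant-pool bytes removed minus instruction bytes added.
int netBytesSaved(int extraInstBytes, int dataBytesRemoved) {
  assert(extraInstBytes >= 0 && dataBytesRemoved >= 0);
  return dataBytesRemoved - extraInstBytes;
}

// Constant-pool bytes saved for one splat, assuming the full vector constant
// is replaced by one scalar element (an assumption for illustration).
int splatDataBytesSaved(int vectorBits, int elementBits) {
  assert(vectorBits % 8 == 0 && elementBits % 8 == 0);
  assert(elementBits <= vectorBits);
  return vectorBits / 8 - elementBits / 8;
}
```

For example, netBytesSaved(29, 112) gives the 83 bytes reported for AVX, and netBytesSaved(29, 106) gives the 77 bytes reported for AVX2. Under the stated assumption, splatDataBytesSaved(128, 64) is the 8-byte lower bound (v2f64 or v2i64) and splatDataBytesSaved(256, 8) is the 31-byte upper bound (v32i8).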
Note: Is there any optimization pass in LLVM that merges constant pool data from different functions? Could this also be done at link time? If such merging exists, it might change the criteria for generating a broadcast, because we might not want to generate extra instructions if the same constant data is loaded multiple times.
http://reviews.llvm.org/D5347
Files:
lib/Target/X86/X86ISelLowering.cpp
test/CodeGen/X86/splat-for-size.ll
---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits