[PATCH] Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores.
Chandler Carruth
chandlerc at gmail.com
Fri Dec 5 03:56:59 PST 2014
Hi hfinkel, majnemer,
Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and tsores which remain, it uses integer math to recombine the values
into a large integer to load or store.
All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaching InstCombine to use this
representation.
With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.
For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
integer loads and stores. SSA values are tremendously more powerful
than "copy" intrinsics. Not doing this regresses massive amounts of
LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
SROA or every memcpy of a trivially copyable struct will prevent SSA
formation of the members of that struct. It essentially turns off
SROA.
- The closest alternative is to actually split the loads and stores when
partitioning with SROA, but this has all of the downsides historically
discussed of splitting up loads and stores -- the wide-store
information is fundamentally lost. We would also see performance
regressions for bitfield-heavy code and other places where the
integers aren't really intended to be split.
- We *can* effectively fix this in instcombine, so it isn't that hard of
a choice to make IMO.
http://reviews.llvm.org/D6548
Files:
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
test/Transforms/InstCombine/loadstore-vector.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D6548.16983.patch
Type: text/x-patch
Size: 23306 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141205/34f68efa/attachment.bin>
More information about the llvm-commits
mailing list