[PATCH] D14762: X86-FMA3: Memory folding for scalar loads + FMA3
Vyacheslav Klochkov via llvm-commits
llvm-commits at lists.llvm.org
Tue Nov 17 15:15:19 PST 2015
v_klochkov created this revision.
v_klochkov added a reviewer: DavidKreitzer.
v_klochkov added subscribers: llvm-commits, qcolombet.
Hello,
Please review this patch, which enables the memory folding optimization for
sequences like this:
  #include <immintrin.h>

  double mem;

  __m128d func(__m128d a, __m128d b) {
    __m128d m = _mm_load_sd(&mem);
    return _mm_fmadd_sd(a, b, m);
  }
Code without the patch (clang -O3 -S):
  func:                                   # @func
          .cfi_startproc
  # BB#0:                                 # %entry
          movsd mem(%rip), %xmm2          # xmm2 = mem[0],zero
          vfmadd213sd %xmm2, %xmm1, %xmm0
          retq
Code with the patch:
  func:                                   # @func
          .cfi_startproc
  # BB#0:                                 # %entry
          vfmadd213sd mem(%rip), %xmm1, %xmm0
          retq
The load can be folded into the 2nd or 3rd operand of an FMA*_Int instruction.
The newly added test fma-scalar-memfold.ll checks memory folding for both operands.
lib/Target/X86/X86InstrFMA.td:
Removed the redundant register-to-register moves.
Memory folding does not work with those moves.
// TODO: perhaps the register-to-register moves can simply be stripped in some such cases,
// but that is a separate optimization/change-set.
lib/Target/X86/X86InstrInfo.cpp:
Added the FMA*_Int opcodes to the routine
isNonFoldablePartialRegisterLoad().
test/CodeGen/X86/fma-scalar-memfold.ll:
New test. Checks that the result of _mm_load_s{s,d}() can be folded into the 2nd or 3rd operand of FMA*_Int.
Thank you,
Slava
http://reviews.llvm.org/D14762
Files:
llvm/lib/Target/X86/X86InstrFMA.td
llvm/lib/Target/X86/X86InstrInfo.cpp
llvm/test/CodeGen/X86/fma-scalar-memfold.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D14762.40441.patch
Type: text/x-patch
Size: 18565 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151117/31d063b5/attachment.bin>